Closed SrirachaHorse closed 1 month ago
@tefra I've considered a few solutions to this problem and would like to discuss a possible fix so that I can contribute. A few ideas I've thought of:

- Change the `ruff_code()` method to run Ruff on smaller batches of filenames, rather than trying to run on every file at once (run Ruff on the first 50 files, then the next 50, etc.).
- Change the `DataclassGenerator.render()` method to instead build the `file_names` list from module names rather than from individual files. This could be done by appending (e.g.) `package_path.parent` to the list instead of the `package_path` filename. Ruff will run formatting/checking recursively when given a directory, rather than needing to be given a large list of files.
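The first idea could be sketched roughly like this; `ruff_format` and the batch size of 50 are illustrative choices, not the actual xsdata implementation:

```python
import subprocess
from itertools import islice


def batched(iterable, n):
    """Yield successive lists of at most n items.

    (Python 3.12+ has itertools.batched; this is a small backport.)
    """
    it = iter(iterable)
    while chunk := list(islice(it, n)):
        yield chunk


def ruff_format(file_names, batch_size=50):
    # Invoke Ruff once per batch so no single command line
    # grows past the platform's argument-length limit.
    for batch in batched(file_names, batch_size):
        subprocess.run(["ruff", "format", *batch], check=True)
```

The trade-off is that Ruff's startup cost is paid once per batch, and picking a batch size that is safe on every platform is awkward to test.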
One step forward, two steps back 🤦
The reason I went with the specific filenames is that people often add their own files to the output folder, which isn't a great practice in my opinion.
The first option with the batches could work for everyone, but it would be tricky to test. I am wondering if we should go the hard way and run Ruff on the entire output package, add a warning in the docs prohibiting people from adding code to the output package, and be done with it.
That's my current inclination.
That sounds good to me @tefra, I think running on the whole output package is my preferred solution too. I'll work on a fix.
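The agreed direction, running Ruff on the whole output package directory, could be sketched as follows; the names `ruff_command` and `format_package` are hypothetical, not the actual xsdata API:

```python
import subprocess
from pathlib import Path


def ruff_command(package_dir: Path) -> list[str]:
    # A single directory argument keeps argv short no matter
    # how many modules were generated; Ruff discovers the
    # Python files under the directory recursively.
    return ["ruff", "format", str(package_dir)]


def format_package(package_dir: Path) -> None:
    subprocess.run(ruff_command(package_dir), check=True)
```

This sidesteps the command-length limit entirely, at the cost that any user-added files inside the output package get formatted too, hence the docs warning discussed above.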
For sufficiently large XSD schemas that are split into multiple schemas or documents, code generation often produces a large number of corresponding dataclass files, particularly when using configuration options such as `namespace-clusters`. In such cases, `DataclassGenerator.ruff_code()` will be called on all of these files at once.

On Windows there is a limit of approximately 32k characters on the length of a CLI command. If this limit is exceeded by passing a large number of files to Ruff via `subprocess.run()`, Windows raises an error and dataclass generation fails.

None of the input filenames individually exceeds the Windows MAX_PATH limit, so this is an issue with the number of files being processed at once, rather than with any particular file.
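A back-of-the-envelope calculation shows how easily a moderately large schema hits the limit; the path pattern and file count below are illustrative, not taken from an actual report:

```python
# Estimate the command-line length when every generated file is
# passed to Ruff individually, as ruff_code() currently does.
paths = [f"generated/models/module_{i:04d}.py" for i in range(1200)]
cmd = ["ruff", "format", *paths]

# +1 per argument for the separating space between arguments.
total = sum(len(part) + 1 for part in cmd)
# With ~31-character relative paths, 1200 files already exceed the
# ~32,767-character limit Windows imposes on CreateProcess command lines.
```

Absolute paths are typically much longer than the relative ones used here, so the limit is reached with far fewer files in practice.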
The regression appears to be a result of #1043, which introduced the "all files at once" approach in place of running Ruff on each file separately.