metawards / MetaWards

MetaWards disease metapopulation analysis and modelling software. Professional geographical SIR model with a flexible plugin architecture to support complex scenario modelling.
https://metawards.org
GNU General Public License v3.0

Path size limits breached with text fingerprints #125

Closed: fentonscode closed this issue 4 years ago

fentonscode commented 4 years ago

Description: For large input parameter dimensions, the path created by the OutputFiles object for each run can exceed the OS or file-system path length limits, which prevents those runs. I do not have a current workaround, because MetaWards appears to create the directories automatically before handing control to the plugin modules, so a plugin cannot prevent it. Ideally, I would turn this off completely and index each run with a UUID or a sequential run index, with the parameters listed in a separate log (or scraped from the console stdout). Alternatively, since I store results on a database server, I would like to avoid touching the local file system, apart from the main console logs, if possible.

To Reproduce: Create a scanning.csv file with 20 columns, each value given to at least 13 decimal places, then run a 'standard' MetaWards progression with the default parameters and extractors.
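A minimal sketch of generating such a file, assuming a comma-separated scan file whose header row names the adjusted parameters. The column names p1 to p20 are hypothetical placeholders; a real run needs MetaWards' own adjustable parameter names.

```python
import csv
import io
import random


def make_scan_csv(n_columns=20, n_rows=5, decimals=13, seed=42):
    """Return CSV text with n_columns random values per row, each rounded
    to `decimals` places. Column names are hypothetical placeholders."""
    rng = random.Random(seed)  # fixed seed so the file is reproducible
    header = [f"p{i + 1}" for i in range(n_columns)]
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(header)
    for _ in range(n_rows):
        writer.writerow([round(rng.random(), decimals) for _ in range(n_columns)])
    return buffer.getvalue()


if __name__ == "__main__":
    print(make_scan_csv())
```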

Error from a 20-dimensional example (with much lower precision):

FAILED:  [Errno 22] The filename, directory name, or volume label syntax is incorrect: 'C:\\output\\46i0v0i9462315970481183v0i16613424770388374v0i5381536455v0i16613424770388374v0i22411260716145964v0i7242907355v0i2431532653066649v0i08458675876493316v0i595872888v0i22924741957637923v0i1048347928916109v0i83998416v0i956333655v0i95818368v0i827222239v0i612153042v0i8364329675v7500i0x001'

Environment:

chryswoods commented 4 years ago

The directory name is used by metawards-plot to decode the values used in a scan. It is also very convenient when using file operations to extract subsets, e.g. metawards-plot -i output/*x001/trajectory.csv.bz2, so I will keep this as the default option, but add a check that raises an appropriate error if the name is too long.
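A minimal sketch of the kind of length check described here, raising a descriptive error before any directory is created. The exception class, function name and both limits are hypothetical choices for illustration: 255 is a common per-component limit (e.g. ext4, NTFS) and 260 is the legacy Windows MAX_PATH.

```python
import os


class OutputDirNameTooLong(ValueError):
    """Hypothetical error type for an over-long output directory name."""


def check_output_dir(root, name, max_component=255, max_path=260):
    """Return the joined path, or raise if `name` or the full path is too long.

    The default limits are assumptions to be adjusted per system."""
    path = os.path.join(root, name)
    if len(name) > max_component or len(path) > max_path:
        raise OutputDirNameTooLong(
            f"Output directory name is {len(name)} characters "
            f"(full path {len(path)}); set an 'output' column in the design "
            "file or use a shorter naming scheme.")
    return path
```

Checking before creating anything means the run fails with an explanation instead of an OS error partway through.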

There is already a way to avoid this problem: set the output directory name directly in the design file via the output column (https://metawards.org/fileformats/design.html#special-columns). You could simply generate your own UUID, run number or parameter for this column when you generate your design file, and MetaWards would use that instead of the fingerprint.
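A minimal sketch of that workaround, assuming a comma-separated design file with a header row. The parameter names param_a and param_b are hypothetical placeholders; output is the special column from the linked documentation. Each row gets either a sequential run number or a UUID, and keeping the generated CSV also gives the separate parameter log asked for in the original report.

```python
import csv
import io
import uuid


def write_design(rows, names, scheme="sequential"):
    """Return design-file CSV text with an extra 'output' column.

    rows: list of value lists; names: parameter column names (placeholders).
    scheme: 'sequential' (001, 002, ...) or 'uuid' (random 32-char hex)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    writer.writerow(list(names) + ["output"])
    for i, values in enumerate(rows, start=1):
        if scheme == "sequential":
            label = f"{i:03d}"
        elif scheme == "uuid":
            label = uuid.uuid4().hex
        else:
            raise ValueError(f"unknown scheme: {scheme!r}")
        writer.writerow(list(values) + [label])
    return buffer.getvalue()


if __name__ == "__main__":
    print(write_design([[0.1, 0.2], [0.3, 0.4]], ["param_a", "param_b"]))
```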

However, I can see that this would be straightforward to add as a command-line option, e.g. --outdir-scheme, with the options “fingerprint” for the current behaviour, “sequential” for counting up from 1, or “uid” for auto-generating a UID. I’ll see what I can do next week.
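A minimal sketch of dispatching between the three proposed schemes. The function name and every name format here are hypothetical: the fingerprint branch reuses the pattern inferred from the error message above, not MetaWards' code. Appending an x001-style repeat suffix to every scheme is a design choice that keeps globs such as output/*x001 working whichever scheme is used.

```python
import uuid


def outdir_name(scheme, run_index, values, repeat=1):
    """Hypothetical dispatcher for the three proposed naming schemes.

    'fingerprint' approximates the value-encoded name (an assumption),
    'sequential' counts up from 1, 'uid' uses part of a random UUID."""
    suffix = f"x{repeat:03d}"  # same repeat suffix for every scheme
    if scheme == "fingerprint":
        return "v".join(repr(float(v)).replace(".", "i") for v in values) + suffix
    if scheme == "sequential":
        return f"{run_index}{suffix}"
    if scheme == "uid":
        return uuid.uuid4().hex[:12] + suffix
    raise ValueError(
        f"unknown scheme {scheme!r}; expected fingerprint, sequential or uid")
```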

MetaWards does need this directory, even when using a database, because it needs somewhere to place the stdout for the job. The directory is set at a high level in the code (where the OutputFiles object is created and passed to Network.run), and at that point the code has no way to know whether a custom user extractor plugin needs the directory or not.

fentonscode commented 4 years ago

Just to add: with the "output" column I get inconsistent naming, as the "x001" suffix is never written.

chryswoods commented 4 years ago

I've added in the code to resolve everything in this issue (in the attached pull request).

You can now use "--outdir-format" to set the directory naming format to "fingerprint" (default), "sequential" or "uid". I've also now made the data for the adjustment available to your extractor as the actual VariableSet object, via network.params.adjustments. This is a list of all of the adjustments that have been made to the parameters. In your case, there will be just a single adjustment, so a single value.

Thus network.params.adjustments[0] will contain a VariableSet (call it var), on which you can call var.output_dir() to get the local name of the output directory (so just the sequential number or UID if those were used). You can also call var.repeat_index() and var.variables() to get the repeat index and the individual adjusted variables.
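A minimal sketch of using these calls to record each run in a database instead of the file system. Only network.params.adjustments, output_dir(), repeat_index() and variables() come from the comment above. The stub class stands in for MetaWards' VariableSet so the example runs on its own, variables() is assumed to return a JSON-serialisable name-to-value mapping, and the real extractor function signature is not shown here.

```python
import json
import sqlite3
from types import SimpleNamespace


class StubVariableSet:
    """Stand-in for MetaWards' VariableSet, for illustration only."""

    def __init__(self, output_dir, repeat_index, variables):
        self._output_dir = output_dir
        self._repeat_index = repeat_index
        self._variables = variables

    def output_dir(self):
        return self._output_dir

    def repeat_index(self):
        return self._repeat_index

    def variables(self):
        return self._variables  # assumed: mapping of name -> value


def record_run(network, conn):
    """Insert one row describing this run's single adjustment into `conn`."""
    var = network.params.adjustments[0]  # one adjustment per run, as above
    conn.execute(
        "INSERT INTO runs (outdir, repeat_index, params) VALUES (?, ?, ?)",
        (var.output_dir(), var.repeat_index(), json.dumps(var.variables())),
    )
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE runs (outdir TEXT, repeat_index INTEGER, params TEXT)")
network = SimpleNamespace(params=SimpleNamespace(
    adjustments=[StubVariableSet("7", 1, {"param_a": 0.5})]))
record_run(network, conn)
print(conn.execute("SELECT * FROM runs").fetchall())  # [('7', 1, '{"param_a": 0.5}')]
```

Storing the output directory name alongside the parameters means a sequential or UID directory can always be traced back to the values that produced it.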

This will be available in devel once I merge the attached pull request (which will close this issue). Please feel free to reopen if this doesn't do what you want.

Also, I've fixed the x001 problem too ;-)