Data outputs should be optional

mfripp commented 6 years ago

Various standard modules now have post_solve() functions with built-in report-writing behavior. There are no command-line options to control this, so users have no easy way to turn this off.

This can be a problem when running iterative models on an HPC system. For example, I am currently running a model that solves for 2 years of hourly data in each of 6 study periods, which produces about 1 GB of output per iteration. None of that is needed, because I only use a few kB of diagnostic statistics from each iteration, and those are written by a different module. But all this output burdens the HPC's network file system and could slow down the iterations -- they only take a couple of minutes each when running a lot of solutions in parallel, but may take much longer when the file system is backlogged.

I am able to turn off a lot of output by leaving out switch_model.reporting, but it's not so easy to turn off the post_solve() functions in the standard modules. I am currently doing that from one of my custom modules via monkey-patching, as follows:

```python
# suppress standard reporting to minimize disk access (ugh)
from importlib import import_module
for module in [
    'balancing.load_zones', 'generators.core.build', 'generators.core.dispatch',
    'generators.extensions.storage'
]:
    imported_module = import_module('switch_model.' + module)
    del imported_module.post_solve
```

But this is not a good long-term solution.

I would recommend moving these standard outputs into the reporting module, and having it 'duck type' the outputs, i.e., only generate outputs for components that are present in the model, and not worry about the others. Then these outputs can be suppressed simply by omitting the reporting module. This would also move reporting to a higher level ("report this element if present, regardless of what module created it"), which would make the output more standardized (roughly the same outputs whether you use the standard modules or some alternative replacements) and avoid repetition of reporting code in alternative models.
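To make the 'duck typing' idea concrete, here is a minimal sketch, not Switch's actual reporting code. It assumes a central reporter that holds a list of component names and writes a file only for those found on the model. The function name `write_available_components`, the name list, and the dict-based stand-in for a solved model are all hypothetical; a real Pyomo model would need its own value-extraction logic.

```python
import csv
import os
from types import SimpleNamespace

# Hypothetical list of component names a central reporter might look for.
REPORTABLE_COMPONENTS = ["BuildGen", "DispatchGen", "ChargeStorage"]


def write_available_components(model, outdir, names=REPORTABLE_COMPONENTS):
    """Write one CSV per named component found on `model`; skip the others."""
    os.makedirs(outdir, exist_ok=True)
    written = []
    for name in names:
        component = getattr(model, name, None)
        if component is None:
            continue  # no loaded module defined this component
        path = os.path.join(outdir, name + ".csv")
        with open(path, "w", newline="") as f:
            writer = csv.writer(f)
            writer.writerow(["index", "value"])
            for index, value in component.items():
                writer.writerow([index, value])
        written.append(name)
    return written


# Stand-in for a solved model: plain dicts instead of Pyomo components.
demo_model = SimpleNamespace(
    BuildGen={("wind", 2030): 120.0},
    DispatchGen={("wind", 1): 45.5, ("wind", 2): 60.0},
)
```

Because the reporter only asks whether a component with a given name exists, an alternative module that defines the same component would get the same output file without carrying its own reporting code.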

I would also recommend adding some command-line flags to control the level of reporting -- per-variable outputs, per-expression outputs, only certain variables and expressions, horizontal-table output or only certain horizontal-table output.
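As a rough illustration of what such flags could look like, the sketch below uses `argparse` from the standard library. The flag names (`--no-variable-outputs`, `--no-expression-outputs`, `--report-components`) and the helper `should_report` are made up for this example and are not existing Switch options.

```python
import argparse


def build_reporting_parser():
    """Build a parser with hypothetical flags for controlling report detail."""
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("--no-variable-outputs", action="store_true",
                        help="skip per-variable output files")
    parser.add_argument("--no-expression-outputs", action="store_true",
                        help="skip per-expression output files")
    parser.add_argument("--report-components", default=None,
                        help="comma-separated names; if given, report only these")
    return parser


def should_report(name, kind, options):
    """Return True if the component `name` of the given kind should be written."""
    if options.report_components is not None:
        allowed = {n.strip() for n in options.report_components.split(",") if n.strip()}
        if name not in allowed:
            return False
    if kind == "variable" and options.no_variable_outputs:
        return False
    if kind == "expression" and options.no_expression_outputs:
        return False
    return True
```

A reporting routine could call `should_report(name, kind, options)` once per component before writing anything, so the default behavior stays unchanged when no flags are passed.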

josiahjohnston commented 5 years ago

I like the idea of a flag to make switch_model.reporting.save_generic_results() optional, and possibly moving those variable dumps into a subdirectory. As the export process is getting more fleshed out, the raw dumps are getting less useful and cluttering up the outputs directory. Excluding switch_model.reporting from modules.txt might work, but I haven't tested it for unintended consequences.

I like the goal of having a clean way of disabling module export code - especially for your use case. I'd prefer to have a few options to control which modules run their post_solve/export routines rather than moving module-specific export code into separate files. Something like: --only-export-modules module1,module2,... and --skip-export-modules moduleA,moduleB,....
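One way the filtering behind these two proposed options might work is sketched below. The option names come from the suggestion above; the helper functions `parse_module_list` and `select_export_modules` are hypothetical, and this is not code from the Switch repository.

```python
def parse_module_list(text):
    """Split a comma-separated option value into a set of module names."""
    if not text:
        return set()
    return {name.strip() for name in text.split(",") if name.strip()}


def select_export_modules(loaded, only=None, skip=None):
    """Return the loaded modules whose post_solve/export code should run.

    `only` mimics --only-export-modules and `skip` mimics
    --skip-export-modules; both are comma-separated strings or None.
    """
    only_set = parse_module_list(only)
    skip_set = parse_module_list(skip)
    selected = []
    for name in loaded:
        if only_set and name not in only_set:
            continue  # an allow-list was given and this module is not on it
        if name in skip_set:
            continue  # explicitly excluded
        selected.append(name)
    return selected


loaded_modules = [
    "switch_model.balancing.load_zones",
    "switch_model.generators.core.build",
    "switch_model.generators.core.dispatch",
]
```

With neither option set, every loaded module is selected, which preserves the current default of running all export routines.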

Re: use cases of closely related modules that have components with the same names and indexes, but different formulas. Two patterns come to mind:
1) Merge them into a single module and have a command-line switch to flip between different formulations, similar to how spinning_reserves switches between different rules for provisioning requirements.
2) Move the shared reporting code into a separate module, and add documentation that it is a post-requisite.

The first seems cleaner for stable production code, but possibly more cumbersome during rapid prototyping.

mfripp commented 4 years ago

At some point I added a --no-post-solve flag that prevents all post-solve code as discussed here. Between that and #127 we've probably gone far enough, so I'll close this issue.

switch-model / switch, issue #104