sparcians / map

Modeling Architectural Platform
Apache License 2.0

Option to produce a simdb file w/o a report dest_file #94

Open MichaelSchoenfelder-SiV opened 4 years ago

MichaelSchoenfelder-SiV commented 4 years ago

I believe one intention of a simdb file (sqlite or hdf5) was to be a repository of data from which a traditional report (json, csv, html, txt) could be generated. I can't tell whether we can cause a simdb file to be generated without also asking for a traditional report.

Terminology: Sparta lacks a name for the yaml file that describes a report. Some of the documentation calls it a "report definition" file, but that term more properly refers to the file that describes which stats are to be included in the report. The file that describes the output file name and triggering mechanisms needs its own name. I propose "report configuration file". One parameter in the report configuration file is the def_file, which describes the contents of the report. I assume "def_file" stands for "definition file", so we shouldn't overload that term. Hence "configuration file". (After typing up this paragraph I see that the sparta help uses the term "description file". Perhaps that is the same concept as what I describe here. I like "configuration" better than "description", but as long as it is not "definition", I will be happy.)

The configuration file has parameters 'format' and 'dest_file'. I believe the format has some significance in that 'csv' may imply time series, although the time-seriesness may instead be implied by the update triggers. Those are just details. I believe 'dest_file' is used as a handle (key) for the data in the simdb file.

The ask: provide a mechanism for the report configuration file to cause a simdb file to be written, but not write the actual target reports.

The reason is that we don't want duplicate data on the disk. If post-processing tools are going to slurp data from the simdb file, then we don't need the text-based files for production.

However, the human-readable text-based files are useful for human consumption during development and debugging, and also for feeding data to statistical analysis tools (that won't be able to grok the sqlite format). We need an easy way to turn on text-based output. I think there is a sparta "feature" for legacy reports, but I believe that generates reports the old-fashioned way and doesn't use simdb as an intermediary. That might be OK, but I think it means supporting two different code paths. We may want to eventually deprecate the legacy report generation. Another option is to write a set of tools that will generate the desired reports from the simdb files. However, if we want to do that for an entire study, then the orchestration tools would have to know how to invoke hundreds of post-processing commands. Ideally it would be yet another flag. It could be an option in the report configuration/description file, but it would need to be a boolean so that the study flow could keyword substitute a value.

Note: The post-processing tools will need the handle to the data in the simdb file. However, the study orchestration tools will be able to supply that using the yaml keyword replacements in --report-yaml-replacements or in the report options (see sparta help).
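
To make the keyword-substitution idea concrete, here is a minimal Python sketch of how a study flow might fill in a report configuration template before launching a run. It is only an illustration under assumptions: the `%NAME%` placeholder syntax, the `write_text_report` boolean, the file layout, and the `render_config` helper are all hypothetical and are not confirmed Sparta behavior. Only the keys `def_file`, `dest_file`, and `format` come from the discussion above.

```python
import re

# Hypothetical report configuration template. def_file, dest_file and format
# are the keys discussed above; write_text_report is an invented boolean and
# the %NAME% placeholder syntax is an assumption, not documented Sparta syntax.
TEMPLATE = """\
report:
  def_file: core_stats.yaml
  dest_file: %RUN_ID%.csv
  format: csv
  write_text_report: %WRITE_TEXT%
"""


def render_config(template: str, replacements: dict) -> str:
    """Replace each %KEY% placeholder; fail loudly if a value is missing."""

    def substitute(match):
        key = match.group(1)
        if key not in replacements:
            raise KeyError(f"no replacement supplied for %{key}%")
        return str(replacements[key])

    return re.sub(r"%([A-Z_][A-Z0-9_]*)%", substitute, template)


# The study flow supplies one set of values per run.
rendered = render_config(
    TEMPLATE, {"RUN_ID": "study42_run007", "WRITE_TEXT": "false"}
)
print(rendered)
```

Because the same `RUN_ID` value ends up in `dest_file`, the orchestration tool already knows the handle that a post-processing tool would later use to find the data in the simdb file.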

ghost commented 4 years ago

Couple of things I'll fix:

  1. I've updated the documentation and the command line option to say it's a configuration file and not a definition file. Changing the source code internally is a bit of an undertaking, and I think I'll leave it be.
  2. Update triggers are definitely associated with time-series reporting. Not sure what happens if you provide them with a standard JSON or HTML format. These issues are best addressed with documentation, which I have but need to put into a readable format.
  3. Need to add the format 'simdb' to the configuration file and remove the feature concept. I think it's good to go.
  4. I'll look into providing database "dumping" tools that can convert the simdb to JSON, HTML, CSV, etc.
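
As a rough illustration of what a "dumping" tool along the lines of item 4 could look like, here is a minimal Python sketch that pulls the rows stored under one dest_file handle out of a sqlite database and writes them as CSV. The table name `report_values`, its columns, and the `dump_report_csv` helper are hypothetical; the real simdb schema is not described in this thread, so an in-memory database stands in for an actual simdb file.

```python
import csv
import io
import sqlite3


def dump_report_csv(conn, dest_file_key, out):
    """Write all rows stored under one dest_file handle as CSV.

    Assumes a hypothetical table report_values(dest_file, stat_name, value);
    this is not the actual simdb schema. Returns the number of data rows.
    """
    rows = conn.execute(
        "SELECT stat_name, value FROM report_values "
        "WHERE dest_file = ? ORDER BY stat_name",
        (dest_file_key,),
    ).fetchall()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["stat_name", "value"])
    writer.writerows(rows)
    return len(rows)


# Demo: an in-memory database stands in for a simdb file on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE report_values (dest_file TEXT, stat_name TEXT, value REAL)"
)
conn.executemany(
    "INSERT INTO report_values VALUES (?, ?, ?)",
    [
        ("out.csv", "ipc", 1.5),
        ("out.csv", "cycles", 1000.0),
        ("other.json", "ipc", 0.9),
    ],
)

buffer = io.StringIO()
count = dump_report_csv(conn, "out.csv", buffer)
print(buffer.getvalue())
```

Keying the query on the dest_file handle matches the idea in the original request that 'dest_file' identifies a report's data inside the simdb file; JSON or HTML output would only change the writer, not the lookup.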