Improve consistency of model output files

rjplevin commented 1 month ago

While adding an output file for total energy output, I noticed that several of the output files were incorrect when running in MCS mode: they didn't include the trial number, so the data for a field were repeated with nothing to distinguish one trial from another. I corrected this in a few of the files.

I also noticed that the files are pretty inconsistent in how they represent data and units:

- In carbon_intensity.csv, the units are included with the values, e.g., "9.064797952522019 gram / MJ".
- In streams.csv and the new file energy_output.csv, there is a separate "units" column.
- In energy.csv [to be renamed energy_use.csv] and emissions.csv, the units are part of the column name, e.g., "Field 1 (metric_ton / day)", columns represent fields, and trial numbers are missing.
- In gases.csv, units appear as a second header row beneath the gas category names.

They also differ as to whether process or field names are column headers or data. Most of the files are in "long" format, which is more verbose, but more flexible.
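To illustrate the first case, here is a minimal sketch of splitting a combined value-and-units cell into separate "value" and "units" entries. The helper name `split_value_units` is hypothetical and not part of OPGEE; it assumes the number always comes first and is separated from the units by a space, as in the carbon_intensity.csv example above.

```python
from __future__ import annotations


def split_value_units(cell: str) -> tuple[float, str]:
    """Split a cell like '9.06 gram / MJ' into a float and a units string.

    Hypothetical helper: assumes the numeric value is the first
    whitespace-delimited token and everything after it is the units.
    """
    number, _, units = cell.strip().partition(" ")
    return float(number), units.strip()


value, units = split_value_units("9.064797952522019 gram / MJ")
print(value, "|", units)
```

Keeping the value numeric and the units in their own column means downstream tools can read the column as floats without any string parsing.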

I propose to modify output files as follows:

1. Convert the remaining output files to long format (primarily an issue for energy.csv and emissions.csv), so data (process or field names) are not used as column names.
2. Have a "units" column in all cases.
3. Include trial numbers when running in MCS mode.
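The conversion proposed above could look roughly like the following sketch, which turns "wide" rows with headers such as "Field 1 (metric_ton / day)" into long rows with separate "field", "value", and "units" entries. This is not the code in the "add-energy-output" branch. The function name `wide_to_long`, the "process" identifier column, and the output column names are all assumptions made for illustration.

```python
import re

# Matches headers of the form "<name> (<units>)", e.g. "Field 1 (metric_ton / day)".
HEADER = re.compile(r"^(?P<name>.+?)\s*\((?P<units>[^()]+)\)$")


def wide_to_long(rows, id_columns=("trial", "process")):
    """Convert wide rows (one column per field) to long rows.

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    `id_columns` are hypothetical identifier columns copied to every output row.
    """
    long_rows = []
    for row in rows:
        ids = {key: row[key] for key in id_columns if key in row}
        for column, value in row.items():
            if column in id_columns:
                continue
            match = HEADER.match(column)
            if match is None:
                continue  # skip columns that are not "name (units)"
            long_rows.append({
                **ids,
                "field": match["name"],
                "value": float(value),
                "units": match["units"].strip(),
            })
    return long_rows


wide = [{
    "trial": "0",
    "process": "Drilling",
    "Field 1 (metric_ton / day)": "1.5",
    "Field 2 (metric_ton / day)": "2.0",
}]
for record in wide_to_long(wide):
    print(record)
```

With this layout, adding a field or a trial adds rows rather than columns, so the file schema stays the same across runs.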

I've made several of these changes in the branch I'm working on ("add-energy-output"), but I wanted to verify that they are acceptable and won't cause much trouble downstream (particularly converting the energy and emissions data to long format).

lschmeisser-rmi commented 1 month ago

@rjplevin Proposed modifications #1 and #2 above look great. I approve converting output files to long format and having a units column in all cases.

As for the trial numbers in MCS mode, I don't know if any users will need the actual outputs from each realization. More important, I think, are the statistics (mean, median, percentiles) of the variables when run in MCS mode. If we think there is a use case for seeing all trial outputs in MCS mode, then including trial numbers of course makes sense. But I think having at least an option to see only the statistics from each MCS run is much more useful.

rjplevin commented 1 month ago

The output files were already being generated for MCS; it was just that some were missing trial numbers. I see no reason to prevent their generation, which happens only if --results detailed is specified.

For the statistics, I assume you mean just for overall carbon intensity at the designated system boundary? Yes, we can produce statistics for that, or for whatever else you think is useful (total emissions? total produced energy?) @lschmeisser-rmi

lschmeisser-rmi commented 1 month ago

@rjplevin Great, if the trial output was already generated, I agree there's no reason to stop that if the work is already done! So in that case, agreed to add trial numbers.

And yes, if we can get statistics at the designated system boundary across all trials for the variables of interest (CI, gases), that would be great. This is important for us because it helps us quantify the uncertainty of the model estimates.
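As a rough sketch of what such per-variable statistics might look like once trial numbers are present in long-format output, the snippet below groups rows by field and computes the mean, median, and two percentiles across trials. The function name `summarize_trials`, the row keys ("trial", "field", "value", "units"), and the choice of the 5th and 95th percentiles are assumptions for illustration, not existing OPGEE behavior. Note that `statistics.quantiles` generally needs at least two trials per group.

```python
import statistics
from collections import defaultdict


def summarize_trials(rows, percentiles=(5, 95)):
    """Summarize long-format MCS rows across trials, one entry per field.

    `rows` is an iterable of dicts with hypothetical keys
    "field", "value", and "units"; each row is one trial's result.
    """
    grouped = defaultdict(list)
    for row in rows:
        grouped[(row["field"], row["units"])].append(float(row["value"]))

    summary = []
    for (field, units), values in grouped.items():
        # 99 cut points; cuts[p - 1] is the p-th percentile.
        cuts = statistics.quantiles(values, n=100, method="inclusive")
        entry = {
            "field": field,
            "units": units,
            "trials": len(values),
            "mean": statistics.fmean(values),
            "median": statistics.median(values),
        }
        for p in percentiles:
            entry[f"p{p}"] = cuts[p - 1]
        summary.append(entry)
    return summary


# Made-up carbon intensity values for five trials of one field.
trial_rows = [
    {"trial": t, "field": "Field 1", "value": v, "units": "gram / MJ"}
    for t, v in enumerate([1.0, 2.0, 3.0, 4.0, 5.0])
]
for entry in summarize_trials(trial_rows):
    print(entry)
```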

msmasnadi / OPGEEv4

Improve consistency of model output files #21