iiasa / message-ix-models

Tools for the MESSAGEix-GLOBIOM family of models
https://docs.messageix.org/models
Apache License 2.0

Revert MESSAGEix-Materials output directory change #241

Open macflo8 opened 2 weeks ago

macflo8 commented 2 weeks ago

I sense this change has been made because we haven't provided good enough documentation and examples on how to ensure output files are placed in a desired location.

In general, code should write to directories within the package itself only if the generated files are meant to be committed to the repo. Otherwise, the user should be able to choose and configure a local directory for output files, or the code should pick a sensible default.

Let's not hold up this PR to revert, but have a conversation to address the usability issue and then revert later.

_Originally posted by @khaeru in https://github.com/iiasa/message-ix-models/pull/218#discussion_r1799403807_
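One way to read "configurable, with a sensible default" is a small resolver with a fixed precedence: an explicit argument, then an environment variable, then a directory below the current working directory. The following is a minimal sketch under those assumptions; `resolve_output_dir`, the `MATERIALS_OUTPUT_DIR` variable, and the `output/material` default are hypothetical names, not existing message-ix-models settings.

```python
import os
from pathlib import Path
from typing import Optional

# Hypothetical name: not an existing message-ix-models setting.
ENV_VAR = "MATERIALS_OUTPUT_DIR"


def resolve_output_dir(explicit: Optional[Path] = None) -> Path:
    """Return the directory for generated files, creating it if needed.

    Precedence: explicit argument, then environment variable, then a
    default below the current working directory (never inside the package).
    """
    if explicit is not None:
        out_dir = Path(explicit)
    elif os.environ.get(ENV_VAR):
        out_dir = Path(os.environ[ENV_VAR])
    else:
        # Assumed default; any location outside the package tree would do.
        out_dir = Path.cwd() / "output" / "material"
    out_dir.mkdir(parents=True, exist_ok=True)
    return out_dir
```

With a resolver like this, generated files stay outside the package tree unless the user deliberately points the output there.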

khaeru commented 1 week ago

@macflo8 if you could provide some description of how you currently arrange input and output files for materials workflows, and why (e.g. what does that allow you to do, or how does it make your work easier), then we can think about alternatives that both meet those needs and keep data and code separate.

macflo8 commented 1 week ago

All input data is stored in data/material, except for proprietary data (e.g. IEA EWEB), whose location can be set for each MESSAGEix-Materials build with the --iea_data_path click option of the material-ix build command. We currently also store files there that are not purely materials-related but are usually needed for the default MESSAGEix-Materials workflow (e.g. MACRO calibration input in material/macro, MESSAGEix-GLOBIOM constraint calibration input in material/UE_dynamic_constraints, other miscellaneous files in material/other).

The material-ix report command runs the Materials-specific reporting in model/material/report/reporting.py in conjunction with the legacy reporting. Each creates an .xlsx file at the end. Since the Materials-specific output is also uploaded to the corresponding scenario instance (via scenario.add_timeseries()), the .xlsx output of the legacy reporting contains these time series as well. The .xlsx output of model/material/report/reporting.py is thus a subset of the legacy output .xlsx.

Maybe we can add click options to material-ix report to control whether the results are written to an .xlsx file and where that file is stored?
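As a rough sketch of what such options might look like with click: the option names `--xlsx/--no-xlsx` and `--output-dir`, the file name `materials_report.xlsx`, and the fallback to the current working directory are all assumptions for illustration. The `report` function here is a stand-in, not the actual material-ix report implementation.

```python
from pathlib import Path
from typing import Optional

import click


@click.command("report")
@click.option(
    "--xlsx/--no-xlsx",
    default=True,
    show_default=True,
    help="Write the reported time series to an .xlsx file.",
)
@click.option(
    "--output-dir",
    type=click.Path(file_okay=False, path_type=Path),
    default=None,
    help="Directory for the .xlsx file; falls back to the working directory.",
)
def report(xlsx: bool, output_dir: Optional[Path]) -> None:
    """Hypothetical stand-in for the ``material-ix report`` command."""
    if not xlsx:
        click.echo("Skipping .xlsx output")
        return
    out_dir = output_dir if output_dir is not None else Path.cwd()
    # The real command would run the reporting and write the file here;
    # this sketch only shows where the file would go.
    click.echo(f"Would write {out_dir / 'materials_report.xlsx'}")
```

A paired flag keeps the current behaviour (writing the file) as the default, so existing workflows would not change unless `--no-xlsx` or `--output-dir` is given.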

khaeru commented 5 days ago

Great, thanks. A couple follow-up questions before any suggestions:

> Proprietary data (e.g. IEA EWEB), whose location can be set for each MESSAGEix-Materials build with the --iea_data_path click option of the material-ix build command.

And where exactly is this data usually kept? What format is it in, i.e. are these the original files from the IEA website? In the documentation I don't see any other options (or even --iea_data_path) mentioned.

> We currently also store files there that are not purely materials-related but are usually needed for the default MESSAGEix-Materials workflow (e.g. MACRO calibration input…

This is good practice 👍🏾

> Each creates an .xlsx file at the end.

Where is each created? Is there a need to have both/all in a similar/same location?

macflo8 commented 5 days ago

> And where exactly is this data usually kept? What format is it in, i.e. are these the original files from the IEA website? In the documentation I don't see any other options (or even --iea_data_path) mentioned.

Ah, good catch. I see that a few click options are missing from the documentation of the commands. That is on me. At the moment the option's default value is a path on a shared IIASA drive, so users on the internal IIASA network do not need to specify it if they are happy with that version of the IEA EWEB data. More generally, users also have the option to use the "old" calibration workflow, which does not require the IEA EWEB file but is less accurate and uses outdated time series. This is currently controlled with the --old_calib boolean option, which is also missing from the documentation.
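To illustrate how the two options could document themselves via `--help`, here is a minimal click sketch. Only the option names `--iea_data_path` and `--old_calib` come from this thread. The help strings, the `None` placeholder default (the real default is the shared drive path, not reproduced here), the usage error, and the `build` stand-in function are assumptions.

```python
from pathlib import Path
from typing import Optional

import click


@click.command("build")
@click.option(
    "--iea_data_path",
    type=click.Path(path_type=Path),
    default=None,  # Placeholder only; the actual default is a shared drive path.
    help="Location of the proprietary IEA EWEB data.",
)
@click.option(
    "--old_calib",
    is_flag=True,
    help="Use the older calibration workflow, which needs no IEA EWEB data.",
)
def build(iea_data_path: Optional[Path], old_calib: bool) -> None:
    """Hypothetical stand-in for the ``material-ix build`` command."""
    if old_calib:
        click.echo("Old calibration: IEA EWEB data not required")
    elif iea_data_path is None:
        # Assumed behaviour for this sketch: fail early with a clear message.
        raise click.UsageError("Give --iea_data_path or use --old_calib")
    else:
        click.echo(f"Calibrating with IEA EWEB data from {iea_data_path}")
```

Help strings declared this way appear in the `--help` output, which could also be a source for the command documentation.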

> Each creates an .xlsx file at the end.

> Where is each created? Is there a need to have both/all in a similar/same location?

The material-ix report command calls the legacy reporting without specifying the out_dir argument, so its output is stored in data/report/legacy/reporting_output. The directory for the Materials-specific reporting was set to data/material/reporting_output in #218, which was the starting point of this conversation.

Hope this helps. Happy to receive suggestions for improvements.