iiasa / message_ix

The integrated assessment and energy systems model MESSAGEix
https://docs.messageix.org
Apache License 2.0

Improve reporting features #149

Closed gidden closed 5 years ago

gidden commented 5 years ago

It should be possible to “report” or “post-process” a message_ix.Scenario (given a sufficient amount of configuration metadata) to generate output (IAMC-compliant pd.DataFrame or file) that can be directly submitted to IIASA databases for either single- or multi-model assessments.

Tasks

OFR-IIASA commented 5 years ago

The first document I would like to add was originally intended as part of the message_ix documentation. This document explains the mathematical operations carried out for currently reported variables. https://iiasahub.sharepoint.com/:u:/s/ene/MESSAGEix/Ea0QmEl4TB5BqxeIyzERqFIBkq54eBAeFDwq92xR9nTe7A?e=eRCodt

OFR-IIASA commented 5 years ago

The second document, which may not reflect the most current version of the reporting, shows for every variable calculated in the model the exact calculation process and therefore provides a detailed overview of required operations. https://iiasahub.sharepoint.com/:x:/s/ene/MESSAGEix/ERxvb_nkeTZGkD6iXpgznfoBTBxiqui1L50Gs4WqqyjFxQ?e=Maf3HA

OFR-IIASA commented 5 years ago

There are several important features required for the current reporting. Please feel free to add features.

khaeru commented 5 years ago

[Note for future readers that there is a separate, non-public Google Doc containing requirements discussion.]

khaeru commented 5 years ago

I left a comment on #150, but this comment also responds to the discussion in #151. Hopefully this is the right place for it :man_shrugging:

Other software efforts (dask (detailed example), TensorFlow, many others) use the pattern of a graph in which nodes are tasks (operations) and edges are data.

#150 and #151 invert these, so that nodes are data and edges are (sort of) tasks. I don't see that it's necessary to invent a new pattern, and in the process cut ourselves off from libraries that would simplify the codebase and slow the accumulation of technical debt. Everything discussed so far can be expressed in the common pattern as tasks:

In both the dask and tf semantics, even the basic action of yielding a fixed value (of 0 or more dimensions) is a task/operation/node. In the present discussion, that covers:

Non-mathematical manipulations of data are also tasks, e.g.:

Using the common pattern, almost all of the requirements can be met by defining an exhaustive collection of tasks, and then by composing and manipulating graphs. We would provide both low- and high-level shorthands for such manipulation, e.g.:

Note in particular the print_and_return task in the dask example linked above. We would define reports as tasks that each take specific other data as input, then format it to an expected return value (e.g. pyam.IamDataFrame or something else). Requesting the report triggers the computation of the data it depends on. The user can then write the return value to the file formats of their choice.
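To make the pattern concrete, here is a minimal sketch in plain Python, not dask itself and not the eventual message_ix API. It borrows the dask convention that a graph is a dict mapping keys to either literal values or `(callable, *args)` tuples. The helper `get` and the tasks `load_activity`, `select`, `total` and `as_report`, along with the technology names and numbers, are all hypothetical and made up for illustration.

```python
def get(graph, key):
    """Compute `key`, recursively computing every task it depends on."""
    task = graph[key]
    if isinstance(task, tuple) and task and callable(task[0]):
        func, *args = task
        # Arguments that name another key are replaced by that key's value;
        # anything else is passed through as a literal.
        resolved = [
            get(graph, a) if isinstance(a, str) and a in graph else a
            for a in args
        ]
        return func(*resolved)
    return task  # a literal stored directly in the graph


def load_activity():
    # Yielding a fixed value is itself a task (made-up numbers).
    return {"coal_ppl": 3.0, "gas_ppl": 2.0, "wind_ppl": 1.5}


def select(values, names):
    # Non-mathematical manipulation: keep only some entries.
    return {name: values[name] for name in names}


def total(values):
    # Mathematical operation: sum over all entries.
    return sum(values.values())


def as_report(variable, unit, value):
    # A "report" is a task that formats its inputs into an expected shape.
    return {"variable": variable, "unit": unit, "value": value}


graph = {
    "activity": (load_activity,),
    "fossil": (select, "activity", ["coal_ppl", "gas_ppl"]),
    "fossil_total": (total, "fossil"),
    "report": (as_report, "Fossil activity", "GWa", "fossil_total"),
}

# Requesting the report triggers load_activity, select and total in turn.
result = get(graph, "report")
print(result)
```

This sketch recomputes shared dependencies on every request and does no caching or parallel scheduling; those are the kinds of features an existing library such as dask would supply.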

khaeru commented 5 years ago

I updated the description here to match the results of today's (2019-03-01) discussion. Further details are on the MESSAGEix OneNote folder for this date.

khaeru commented 5 years ago

I've set the milestone for this issue to 1.2.0. Once #142 is merged, the milestone for this issue can be switched to 1.3.0 or 2.0 (whichever we'll target for the dask-based reporting).

khaeru commented 5 years ago

Closing this as resolved by the experimental reporting modules in the ixmp 0.2 release (just now, including iiasa/ixmp#150) and the message_ix 1.2 release (later today, including #206).

We can use separate, smaller issues to iterate on these features as needed.