Integrate R scripts - Githubissues

iiasa / message-ix-models

Tools for the MESSAGEix-GLOBIOM family of models

https://docs.messageix.org/models

Apache License 2.0


Integrate R scripts #40

Open khaeru opened 2 years ago

khaeru commented 2 years ago

message_data contains some modules that are partly or entirely in R. For instance:

The DLE code is entirely in R.
The Nexus (water) code is partly in Python, but with data processing scripts in R.

This issue is to discuss approaches so that this code (a) can be integrated into complete workflows that run unsupervised, and (b) can be more reusable, in a standard way. The implementation should be in message-ix-models.

Some ideas:

1. Provide a CLI command like mix-models r-script foo/bar/baz that simply invokes a file at (e.g.) message_data/foo/bar/baz.R, while providing some standard environment variables that the script can use to locate data paths, etc.
2. Use rpy2 (with documentation & demo code) in Python code to call functions directly from R code in particular files, and retrieve their output for further processing.
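A minimal sketch of the first idea, using only the Python standard library. The function names and the environment variable names (MESSAGE_DATA_PATH, MESSAGE_LOCAL_DATA) are hypothetical, chosen for illustration; they are not existing message-ix-models features. It assumes the Rscript executable is on the PATH.

```python
import os
import subprocess
from pathlib import Path


def r_script_path(name: str, base: Path) -> Path:
    """Map a name like 'foo/bar/baz' to '<base>/foo/bar/baz.R'."""
    return base / f"{name}.R"


def r_script_env(base: Path, local_data: Path) -> dict:
    """Copy the current environment and add path variables for the R script.

    The variable names are assumptions made for this sketch.
    """
    env = dict(os.environ)
    env["MESSAGE_DATA_PATH"] = str(base)
    env["MESSAGE_LOCAL_DATA"] = str(local_data)
    return env


def run_r_script(name: str, base: Path, local_data: Path) -> subprocess.CompletedProcess:
    """Run the R script with Rscript; raise if it is missing or exits non-zero."""
    path = r_script_path(name, base)
    if not path.is_file():
        raise FileNotFoundError(path)
    return subprocess.run(
        ["Rscript", str(path)],
        env=r_script_env(base, local_data),
        check=True,
    )
```

Inside the R script, such variables could be read with Sys.getenv("MESSAGE_DATA_PATH"), so the script would not need hard-coded paths.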

khaeru commented 2 years ago

@jkikstra @Jihoon @adrivinca @awais307 I would appreciate some comments here on things like:

How you use R code as part of workflows (e.g. do you run an entire script? Interactively? Call it from somewhere?)
Which scripts are used most heavily, or are most likely to be of interest to others to (re)use.
Whether the above ideas would be helpful/usable in your workflows.

jkikstra commented 2 years ago

@khaeru, thanks for creating this issue to start discussions. From a first look, I think at least option 2 could be really useful. Option 1 maybe too, but I don't know enough to give a good judgement there.

Detailed input from my side will have to wait a bit until after I return from holidays (probably will only get to it around 14 January).

In general, I was thinking that, until the data becomes public, I would work in message_data: select the most useful R scripts from the DLE packages and integrate them into a DLE workflow that uses rpy2. For instance, I'm imagining that a "build" command could first run basic MESSAGE, then use R scripts to create a DLE scenario based on that, and then do a MESSAGE-DLE run after that.
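The three-stage "build" sequence described above could be sketched roughly as follows. This is an illustration under assumptions, not an existing implementation: call_r_function and run_workflow are hypothetical helpers, and the R file and function names in the usage note are placeholders. The rpy2 calls used (rpy2.robjects.r and rpy2.robjects.globalenv) are part of rpy2's public interface; rpy2 is imported lazily so the rest of the module can be loaded without R installed.

```python
from pathlib import Path
from typing import Any, Callable, Iterable, List, Tuple


def call_r_function(r_file: Path, func_name: str, *args: Any) -> Any:
    """Source an R file with rpy2 and call one function defined in it."""
    if not r_file.is_file():
        raise FileNotFoundError(r_file)
    # Lazy import: only needed when an R step actually runs.
    import rpy2.robjects as ro

    ro.r["source"](str(r_file))  # evaluate the file in R's global environment
    return ro.globalenv[func_name](*args)  # look up the R function and call it


def run_workflow(
    steps: Iterable[Tuple[str, Callable[[Any], Any]]]
) -> Tuple[Any, List[str]]:
    """Run (name, step) pairs in order, passing each result to the next step."""
    result = None
    completed = []
    for name, step in steps:
        result = step(result)
        completed.append(name)
    return result, completed
```

A workflow could then be assembled as a list such as [("base", run_base), ("dle-input", lambda prev: call_r_function(Path("dle/build.R"), "build_scenario", prev)), ("dle-run", run_dle)], where run_base, run_dle, dle/build.R and build_scenario are all placeholder names.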

More to follow in the new year.

adrivinca commented 2 years ago

In the nexus work, we have some R scripts that need to be run to process raw data into the data that other Python scripts then use. Since these R scripts need to be run just once (and sometimes link to large spatial data on the P drive), we do not include or call them from any Python script. A user could run them to generate new scenario configurations (SSP, SDGs), but otherwise all the output data of those scripts are already included in the message_data/data folder.