reanahub / reana-db

REANA database utilities
http://reana-db.readthedocs.org
MIT License

models: rethink `reana_specification` storage philosophy #162

Open tiborsimko opened 2 years ago

tiborsimko commented 2 years ago

Currently, the `__reana.workflow` table stores the REANA workflow specification in the `reana_specification` field, whose value is the JSON representation of the workflow, for example:

 {"inputs": {"directories": ["workflow/yadage"], "files": ["code/worldpopulation.ipynb", "data/World_historical_and_predicted_populations_in_percentage.csv"], ...}

The accompanying columns are `input_parameters` and `operational_options`, holding for example `{}` and `{"accept_metadir": true}` respectively.

All workflow types (CWL, Serial, Snakemake, Yadage) are stored in this way.

This storage technique has several inconveniences:

In connection with the Run-on-REANA sprint, we may want to rethink this philosophy. The single source of truth (SSOT) is the reana.yaml that we fetch from external sources, and it may be interesting to both (i) preserve it as such and (ii) populate the workspace automatically with it.
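To make point (ii) concrete, here is a minimal sketch of writing the fetched reana.yaml text into a workspace without modifying it. The function name `populate_workspace`, the target file name inside the workspace, and the sample YAML content are all assumptions made for illustration; this is not how reana-server currently prepares workspaces.

```python
import tempfile
from pathlib import Path


def populate_workspace(workspace: Path, raw_reana_yaml: str) -> Path:
    """Write the fetched reana.yaml text into the workspace unchanged.

    Hypothetical helper: the name and the target path are assumptions.
    """
    workspace.mkdir(parents=True, exist_ok=True)
    target = workspace / "reana.yaml"
    # Store the text exactly as received, so that comments and key order
    # in the original file are preserved.
    target.write_text(raw_reana_yaml, encoding="utf-8")
    return target


# Hypothetical reana.yaml content, for illustration only.
raw = "# my analysis\ninputs:\n  files:\n    - code/worldpopulation.ipynb\n"

with tempfile.TemporaryDirectory() as tmp:
    written = populate_workspace(Path(tmp) / "workspace", raw)
    # The file read back from the workspace matches what was fetched.
    roundtrip_ok = written.read_text(encoding="utf-8") == raw
```

Keeping the text verbatim, rather than regenerating YAML from the stored JSON, is what would let the workspace copy act as the preserved original.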

The goal of this ticket is to investigate:

Note that changing the storage policy for the REANA specification would also require retesting the handling of `reana-client run -p myparam=myvalue` etc. Storing both the original reana.yaml and the current JSON representation may therefore be the simpler solution.
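A rough sketch of what "storing both" could look like, assuming the original text is kept in a hypothetical `raw_reana_yaml` field next to the existing ones. `StoredWorkflow`, `parse_overrides` and `store_both` are illustrative names, not reana-db or reana-client code, and routing `-p` values into `input_parameters` is an assumption about one possible design, not a description of the current behaviour.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StoredWorkflow:
    """Hypothetical row layout for illustration.

    reana_specification, input_parameters and operational_options mirror
    the existing columns; raw_reana_yaml is an assumed new optional field.
    """
    raw_reana_yaml: Optional[str]  # None for workflows created before the change
    reana_specification: Dict[str, Any]
    input_parameters: Dict[str, str] = field(default_factory=dict)
    operational_options: Dict[str, Any] = field(default_factory=dict)


def parse_overrides(pairs: List[str]) -> Dict[str, str]:
    """Turn ["myparam=myvalue", ...] into a dict, splitting on the first '='."""
    overrides: Dict[str, str] = {}
    for pair in pairs:
        name, sep, value = pair.partition("=")
        if not sep or not name:
            raise ValueError(f"expected NAME=VALUE, got {pair!r}")
        overrides[name] = value
    return overrides


def store_both(raw_yaml: str, parsed_spec: Dict[str, Any],
               cli_params: List[str]) -> StoredWorkflow:
    """Keep the original text untouched and record overrides separately."""
    return StoredWorkflow(
        raw_reana_yaml=raw_yaml,
        reana_specification=parsed_spec,
        input_parameters=parse_overrides(cli_params),
    )


# Illustrative input; the parsed dict is assumed to come from a YAML parser.
raw = "inputs:\n  parameters:\n    myparam: default\n"
parsed = {"inputs": {"parameters": {"myparam": "default"}}}
row = store_both(raw, parsed, ["myparam=myvalue"])
```

In this layout the command-line override never touches the preserved text, so the existing parameter handling could stay as it is while the original file remains available.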

Note that if we decide to change the internal storage, an Alembic recipe will have to be written so that users can consult both their old and new workflows.

Special care should then be given to compatibility considerations for any incompatible change. Adding a new optional column in the DB, or creating new mandatory files in the workspace, could be a good, simple solution that does not break compatibility.
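The "new optional column" idea can be illustrated with a small self-contained sketch. It uses an in-memory SQLite database purely as a stand-in for the real database, and the column name `raw_reana_yaml` is hypothetical; in reana-db the equivalent change would presumably be shipped as an Alembic migration rather than a raw `ALTER TABLE`.

```python
import json
import sqlite3

# SQLite stands in for the real database; the table is reduced to the
# columns needed for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE workflow ("
    "id INTEGER PRIMARY KEY, reana_specification TEXT NOT NULL)"
)

# An "old" workflow, stored before the change.
old_spec = {"inputs": {"files": ["code/worldpopulation.ipynb"]}}
conn.execute(
    "INSERT INTO workflow (reana_specification) VALUES (?)",
    (json.dumps(old_spec),),
)

# Additive change: a nullable column (hypothetical name). Existing rows
# simply get NULL, so nothing has to be rewritten.
conn.execute("ALTER TABLE workflow ADD COLUMN raw_reana_yaml TEXT")

# A "new" workflow, stored with both representations.
raw = "inputs:\n  files:\n    - code/worldpopulation.ipynb\n"
conn.execute(
    "INSERT INTO workflow (reana_specification, raw_reana_yaml) VALUES (?, ?)",
    (json.dumps(old_spec), raw),
)

rows = conn.execute(
    "SELECT id, reana_specification, raw_reana_yaml FROM workflow ORDER BY id"
).fetchall()
old_row, new_row = rows
conn.close()
```

Code that reads workflows only needs to treat a missing value in the new column as "original file not preserved", which is why an additive column of this kind keeps old workflows consultable.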

tiborsimko commented 2 years ago

Note: If we decide on a change, we need to update reana-client (r-client) and reana-server (r-server) accordingly. See also https://github.com/reanahub/reana-server/issues/440