Currently, the __reana.workflow table stores the REANA workflow specification in the reana_specification field, whose value is the JSON representation of the workflow.
The accompanying columns are input_parameters and operational_options, holding for example {} and {"accept_metadir": true} respectively.
All workflow types (CWL, Serial, Snakemake, Yadage) are stored in this way.
This storage technique has several inconveniences:
We cannot easily modify and rerun workflows between steps. This was addressed by requiring workflow/steps.yaml and friends to be listed in reana.yaml's inputs.directories in all the examples, so that the workspace has an exact copy of the original files.
We cannot easily refer to the original reana.yaml, i.e. the "single source of truth" (SSOT). If we only have access to the JSON representation, and a bug affected how this representation or its parameters were created, then we cannot recover exactly what the user submitted.
Ditto, if one day we would like to change the specification storage format, we would have to be able to convert back to the original reana.yaml representation, which may be lost.
For Serial workflows, users cannot easily know which reana.yaml caused which run, unless they add it explicitly as well. (For CWL, Snakemake, and Yadage we require this; see above.)
In connection to the Run-on-REANA sprint, we may want to rethink this philosophy. The SSOT is the reana.yaml that we fetch from external sources, and it may be interesting to both (i) preserve it as such and (ii) populate the workspace automatically with it.
The goal of this ticket is to investigate:
whether we can easily modify reana-client to always upload the workflow specification as the SSOT to the workspace, without users having to add the files by hand;
whether we can change the DB models and store the workflow specification in the original format, and/or whether we can always revert from the stored JSON to the submitted reana.yaml and friends.
If the format is not nicely reversible, let's store the originals by default instead of, or in parallel to, the JSON representation.
If the format is nicely reversible, let's provide convenience functions to recreate reana.yaml out of stored reana_specification and friends.
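To make the "store the originals in parallel" option concrete, here is a minimal sketch in Python. It is not REANA code: the record type, the field names original_text and original_sha256, and the sample reana.yaml content are all assumptions made for illustration. The parsed dictionary is written out by hand so that the example needs only the standard library.

```python
import hashlib
import json
from dataclasses import dataclass

# Illustrative reana.yaml content (an assumption, not taken from a real run).
ORIGINAL = """\
# analysis v2, tuned by hand
inputs:
  parameters:
    myparam: myvalue
workflow:
  type: serial
  specification:
    steps:
      - commands:
          - echo hello
"""

# What a YAML loader would produce for ORIGINAL, written by hand here.
PARSED = {
    "inputs": {"parameters": {"myparam": "myvalue"}},
    "workflow": {
        "type": "serial",
        "specification": {"steps": [{"commands": ["echo hello"]}]},
    },
}


@dataclass(frozen=True)
class StoredSpecification:
    """Hypothetical record pairing the parsed spec with the submitted file."""

    reana_specification: str  # JSON text, as stored today
    original_text: str        # verbatim reana.yaml (hypothetical new field)
    original_sha256: str      # checksum of the verbatim content (hypothetical)


def store_specification(original_text: str, parsed: dict) -> StoredSpecification:
    """Keep the submitted text untouched next to its JSON representation."""
    digest = hashlib.sha256(original_text.encode("utf-8")).hexdigest()
    return StoredSpecification(
        reana_specification=json.dumps(parsed, sort_keys=True),
        original_text=original_text,
        original_sha256=digest,
    )


def original_is_intact(record: StoredSpecification) -> bool:
    """Check that the stored original still matches its checksum."""
    digest = hashlib.sha256(record.original_text.encode("utf-8")).hexdigest()
    return digest == record.original_sha256


record = store_specification(ORIGINAL, PARSED)
```

In this sketch the YAML comment survives only in original_text. The JSON representation has no place for it, which is one reason the conversion may not be "nicely reversible" in the byte-for-byte sense.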
Note that changing the storage policy for the REANA specification would also require retesting the handling of reana-client run -p myparam=myvalue etc. It may therefore be interesting to store both as a simpler solution.
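The interaction with -p overrides can be pictured with the following sketch. It assumes, purely for illustration, that overrides arrive as NAME=VALUE strings and are merged into a copy of the submitted parameters. The helper names parse_override and effective_parameters are hypothetical and do not describe how reana-client is actually implemented.

```python
import copy


def parse_override(arg: str) -> tuple[str, str]:
    """Split a NAME=VALUE string at the first '=' sign."""
    key, sep, value = arg.partition("=")
    if not sep or not key:
        raise ValueError(f"expected NAME=VALUE, got {arg!r}")
    return key, value


def effective_parameters(submitted: dict, overrides: list[str]) -> dict:
    """Return the parameters of a run without mutating what was submitted."""
    merged = copy.deepcopy(submitted)
    for arg in overrides:
        key, value = parse_override(arg)
        merged[key] = value
    return merged


submitted = {"myparam": "default"}
effective = effective_parameters(submitted, ["myparam=myvalue"])
```

Keeping the submitted parameters and the effective parameters as two separate values means the original reana.yaml content never has to be rewritten when a run is started with overrides.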
Note that if we decide to change the internal storage, an Alembic recipe will have to be written so that users can consult both their old and new workflows.
Special care should then be given to compatibility considerations for any incompatible change. Adding a new optional column in the DB, or new mandatory files created in the workspace, could be a good simple solution that does not break compatibility.
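The "new optional column" idea can be illustrated with a small standalone sketch. It uses an in-memory SQLite database from the Python standard library as a stand-in. A real change would be an Alembic migration against the REANA database, and the column name original_specification is a hypothetical choice, not an existing field.

```python
import sqlite3


def create_legacy_schema(conn: sqlite3.Connection) -> None:
    """Simplified stand-in for the current workflow table."""
    conn.execute(
        "CREATE TABLE workflow ("
        "id_ INTEGER PRIMARY KEY, "
        "reana_specification TEXT NOT NULL)"
    )


def add_original_column(conn: sqlite3.Connection) -> None:
    """Add a nullable column, so existing rows remain valid as they are."""
    conn.execute("ALTER TABLE workflow ADD COLUMN original_specification TEXT")


conn = sqlite3.connect(":memory:")
create_legacy_schema(conn)

# A workflow stored before the change: JSON representation only.
conn.execute(
    "INSERT INTO workflow (reana_specification) VALUES (?)",
    ('{"workflow": {"type": "serial"}}',),
)

add_original_column(conn)

# A workflow stored after the change: JSON plus the verbatim original.
conn.execute(
    "INSERT INTO workflow (reana_specification, original_specification) "
    "VALUES (?, ?)",
    ('{"workflow": {"type": "serial"}}', "workflow:\n  type: serial\n"),
)

rows = conn.execute(
    "SELECT id_, original_specification FROM workflow ORDER BY id_"
).fetchall()
```

Old rows simply carry NULL in the new column, so code that reads them needs a fallback to the JSON representation but no data rewrite.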