Closed alintulu closed 4 years ago
Both CWL and Yadage can have inputs specified as separate files. Example for CWL:
$ cat reana.yaml
inputs:
  parameters:
    input: workflow/input.yml
workflow:
  type: cwl
  file: workflow/workflow.cwl
$ cat workflow/input.yml
library:
  class: File
  path: src/PhysicsObjectsHistos.cc
build_file:
  class: File
  path: BuildFile.xml
validation_script:
  class: File
  path: demoanalyzer_cfg.py
So you could use this technique: create a big input.yml that lists all the cross-section values, or all the dataset ROOT files, etc., and this should work.
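Writing such a big input.yml by hand would be tedious, so it could be generated. The following is only a minimal sketch: the helper name render_cwl_file_array, the parameter name datasets, and the dataset paths are hypothetical, and it assumes the paths contain no characters that need YAML quoting.

```python
from typing import Iterable


def render_cwl_file_array(param_name: str, paths: Iterable[str]) -> str:
    """Render one YAML key holding a list of CWL File objects.

    Assumption: paths are plain strings that need no YAML quoting.
    """
    lines = [f"{param_name}:"]
    for path in paths:
        # Each entry has the same shape as the File objects in workflow/input.yml.
        lines.append("  - class: File")
        lines.append(f"    path: {path}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical dataset names, for illustration only.
    dataset_paths = [f"data/dataset_{i:04d}.root" for i in range(3)]
    print(render_cwl_file_array("datasets", dataset_paths), end="")
```

The output can be redirected into workflow/input.yml; the CWL workflow would then need a matching input of type File[].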
For Yadage it is also possible to do something like yadage-run workflow.yaml inputs.yaml, but I'm not sure we have any concrete example tested on REANA yet. So the "passing of input files" may need to be added to r-w-e-yadage, perhaps.
For CWL, we do have many examples, so this should work out of the box already.
Can you try to create a vanilla cwltool or yadage-run example using such an input file? Once you have an example ready, we can see how best to convert it to reana.yaml.
P.S. See e.g. the reana-demo-worldpopulation CWL example, which has 4-5 parameters.
Simple example of Yadage containing
can be found here. Workflow runs with
yadage-run workdir workflow.yaml input.yaml
where the input is read from input.yaml. The next step is figuring out how best to pass the input parameters from input.yaml when the file is declared in reana.yaml. As mentioned, this already works for CWL :)
In yadage, initdata appears to be a JSON object of key-value pairs mapping parameter names to parameter values. It can be set in two ways:
- initfiles: YAML files, either passed on the command line or picked up by default if named input.yml
- parameters: parameters passed on the command line, like -p pname=pvalue
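The two sources above can be pictured as one merge step. This is a sketch only, not yadage's actual implementation: the helper merge_initdata is hypothetical, it takes the init files as already-parsed dictionaries, it assumes command-line parameters override file values, and it keeps command-line values as plain strings.

```python
from typing import Any, Dict, Iterable, Mapping


def merge_initdata(
    file_contents: Iterable[Mapping[str, Any]],
    cli_params: Iterable[str],
) -> Dict[str, Any]:
    """Merge parsed init files and 'pname=pvalue' strings into one dict."""
    initdata: Dict[str, Any] = {}

    # Later files override earlier ones (assumption).
    for content in file_contents:
        initdata.update(content)

    # Command-line parameters are applied last, so they win (assumption).
    for item in cli_params:
        name, sep, value = item.partition("=")
        if not sep or not name:
            raise ValueError(f"expected 'pname=pvalue', got {item!r}")
        initdata[name] = value  # kept as a string in this sketch

    return initdata


if __name__ == "__main__":
    from_files = [{"events": 1000, "mode": "fast"}]
    from_cli = ["mode=full"]
    print(merge_initdata(from_files, from_cli))
```

With these hypothetical inputs, mode from the command line replaces the value read from the file, while events is carried over unchanged.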
In REANA, initdata is set to workflow_parameters in reana_workflow_engine_yadage/cli.py and reana_workflow_controller/workflow_run_manager.py, which in turn is set to parameters in reana_db/models.py.
parameters are read from the inputs: parameters: field of reana.yaml:
inputs:
  parameters:
i.e. currently the initdata passed to yadage can only be set by defining the parameters in reana.yaml.
It also seems that initfiles cannot be passed directly to yadage, since only initdata is specified in steering_ctx.
Hence we cannot just create an input.yml file and hand it to yadage as initfiles. Instead we have to create a method in REANA that sets initdata by:
- from the input.yml file, creating a JSON with key-values as specified in the YAML file
- using that JSON to set parameters, which in turn sets initdata
Regarding the user interface, we should introduce a new option initfiles that people can use in their reana.yaml, similarly to the recently added options initdir and toplevel. In this way the analysis will have explicitly documented its input files and/or parameters.
Regarding implementation, the r-w-e-yadage would have to do something like the following to merge the input file parameters and command-line parameters:
from yadage.utils import getinit_data
initdata = getinit_data(initfiles, parameter)
in order to pass the resulting merged initdata to the yadage steering. (See Yadage sources.)
Both CWL and Yadage provide a “scatter-gather” paradigm. The workflow takes the input as an array and runs the specified steps on each element of the array as if it were a single input (Yadage also allows a batch size to be specified).
The array can be declared in reana.yaml under inputs: parameters:, like in the example from the Awesome Workshop. Currently the parameter array has to be declared explicitly, with each element written on its own line in reana.yaml.
This is okay when you have 2-10 entries, but it is not realistic to enter 1500 entries, as may be the case (for example, names of dataset files).
To be added:
Allow specifying a file to read the entries from. Each line in the file would be taken as one entry of the array.
Instead of adding 1500 lines to reana.yaml, those lines could be read from index.txt. The parameter array cross_sections would then be provided to CWL or Yadage, which would use it as input for their “scatter-gather” paradigm.
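The proposed behaviour could look roughly like the sketch below. It is an illustration under assumptions, not existing REANA code: the helper names entries_from_lines and read_array_parameter are hypothetical, blank lines are skipped, and every entry is kept as a string (numeric cross sections would need an explicit conversion).

```python
from pathlib import Path
from typing import Iterable, List


def entries_from_lines(lines: Iterable[str]) -> List[str]:
    """Turn raw lines into array entries, one entry per non-blank line."""
    entries = []
    for line in lines:
        entry = line.strip()
        if entry:  # skip blank lines (assumption)
            entries.append(entry)
    return entries


def read_array_parameter(index_file: str) -> List[str]:
    """Read an index file such as index.txt into a list of entries."""
    text = Path(index_file).read_text(encoding="utf-8")
    return entries_from_lines(text.splitlines())


if __name__ == "__main__":
    # Inline sample standing in for the contents of index.txt.
    sample = ["1.2\n", "3.4\n", "\n", "5.6\n"]
    parameters = {"cross_sections": entries_from_lines(sample)}
    print(parameters)
```

The resulting parameters dictionary has the same shape as an array declared line by line under inputs: parameters: in reana.yaml, so the workflow engine would not need to know the entries came from a file.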