Closed tiborsimko closed 5 years ago
just to connect threads: there is also make_workflow.py, which we use in the recast-workflow repo (https://github.com/recast-hep/recast-workflow/tree/master) that @AlexSchuy is working on, to generate workflows from a bunch of combinatorial options
Following successful tests in #8, #9, and #10, we know that REANA is able to run CMS reconstruction for a variety of RAW samples (e.g. dataset SingleMu) and data-taking years (e.g. 2011).
(1) Design a first simple "workflow factory" script that will produce a REANA workflow for given parameters. Example:
The command should generate the workflow in a given output directory, ready to run on REANA, with any necessary input file information, configuration files, Python code snippets, and whatnot.
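To make the idea of generating a ready-to-run directory concrete, here is a minimal sketch using plain Python string templates, one of the technology options considered in this issue. The function name `make_workflow`, the container image name, and the configuration and output file names are all hypothetical placeholders, not the values used by the actual reconstruction examples; only the general `workflow` / `outputs` layout of `reana.yaml` is taken from REANA.

```python
from pathlib import Path
from string import Template

# Hypothetical template: image, config and output names are placeholders.
REANA_YAML = Template("""\
workflow:
  type: serial
  specification:
    steps:
      - environment: '${image}'
        commands:
          - cmsRun ${config}
outputs:
  files:
    - ${output}
""")


def make_workflow(dataset: str, year: int, output_dir: str) -> Path:
    """Write a reana.yaml for the given dataset and year into output_dir."""
    target = Path(output_dir)
    target.mkdir(parents=True, exist_ok=True)
    spec = REANA_YAML.substitute(
        image=f"example/cmssw-{year}",         # placeholder image name
        config=f"reco_{dataset}_{year}.py",    # placeholder CMSSW config name
        output=f"reco_{dataset}_{year}.root",  # placeholder output file name
    )
    path = target / "reana.yaml"
    path.write_text(spec)
    return path
```

Calling `make_workflow("SingleMu", 2011, "out")` would create `out/reana.yaml`. A real factory would also need to copy the matching configuration files into the same directory.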
For example, people could then do:
(2) In the future, the necessary CMSSW release version and the configuration files will be read fully from CERN Open Data records using `cernopendata-client`. Until the client is fully ready, the first implementation could have the snippets committed here and/or read from CMS's RAWtoAODValidation repository.

(3) The implementation should be extensible so that we can easily add further arguments in the future, for example:

- `--files` to specify whether we run on a random file, the smallest file, or all files (this would require parallelism)
- `--workflow-engine` to specify whether we generate CWL, Serial, Yadage, or perhaps Argo, DAGMAN, etc.
- `--compute-backend` to specify whether we run on Kubernetes (default) or HTCondorCERN (e.g. when processing many files in parallel)

Note that (2) and (3) do not need to be implemented as part of this ticket; it is sufficient to think about them now in order to choose the underlying technology (e.g. Jinja templating, cookiecutter templating, or simply generating everything from Python via string templates).
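As an illustration of how the command line could stay extensible, the following sketch declares the three future options with `argparse`. The `--dataset`, `--year`, and `--output-dir` option names, the lowercase spellings of the choices, and every default except Kubernetes are assumptions made for the example, not decisions taken in this issue.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a CLI parser; adding a future option is one more add_argument call."""
    parser = argparse.ArgumentParser(
        description="Generate a REANA workflow (illustrative sketch)."
    )
    # Hypothetical core parameters.
    parser.add_argument("--dataset", default="SingleMu")
    parser.add_argument("--year", type=int, default=2011)
    parser.add_argument("--output-dir", default="output")
    # Future options from point (3); choice spellings are assumptions.
    parser.add_argument(
        "--files", choices=["random", "smallest", "all"], default="random"
    )
    parser.add_argument(
        "--workflow-engine", choices=["cwl", "serial", "yadage"], default="serial"
    )
    parser.add_argument(
        "--compute-backend",
        choices=["kubernetes", "htcondorcern"],
        default="kubernetes",  # Kubernetes is the default named in the issue
    )
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args)
```

Restricting each option with `choices` means unsupported engines or backends are rejected at parse time, before any files are generated.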
See also musings in https://github.com/reanahub/reana/issues/189