This task is the same as #359 but for Yadage instead of Serial workflows.
Here are some Yadage-specific musings.
(1) Note that Yadage can launch many stages in parallel. We should parse the Yadage definitions and look for stages that depend only on "init", such as:
    stages:
      - name: gendata
        dependencies: [init]
These will run initially.
We should not consider later workflow stages that depend on previous ones, such as:
      - name: fitdata
        dependencies: [gendata]
as these will come into play only later.
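A minimal sketch of the selection described in (1), assuming the workflow definition has already been loaded into plain Python dicts and that `dependencies` is always a plain list of stage names, as in the snippets above. The function name `initial_stages` is hypothetical, and real Yadage definitions may express dependencies in other forms that this sketch does not handle.

```python
def initial_stages(workflow):
    """Return the names of stages whose only dependency is "init".

    Assumption: `workflow` is a dict with a "stages" list, and each stage
    lists its dependencies as a plain list of stage names.
    """
    return [
        stage["name"]
        for stage in workflow.get("stages", [])
        if stage.get("dependencies") == ["init"]
    ]


# The two stages quoted in this issue, written as already-parsed data.
workflow = {
    "stages": [
        {"name": "gendata", "dependencies": ["init"]},
        {"name": "fitdata", "dependencies": ["gendata"]},
    ]
}

initial = initial_stages(workflow)
print(initial)
```

Only `gendata` is selected here; `fitdata` is skipped because it waits for `gendata`.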
(2) Note that we should also take care of all subworkflows; see the BSM example, where workflow/databkgmc.yml can launch many stages in parallel.
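One possible way to cover subworkflows is to recurse into them. The sketch below assumes that references to external files have already been resolved, so that a stage carrying a subworkflow holds it as a nested dict under `scheduler.workflow`; that layout, the function name `collect_initial_stages`, and the stage names in the example are all assumptions for illustration and are not taken from the BSM example.

```python
def collect_initial_stages(workflow, prefix=""):
    """Collect stages that can start right away, descending into subworkflows.

    Assumptions: dependencies are plain lists of stage names, and a stage
    with a subworkflow stores it (already resolved) under
    stage["scheduler"]["workflow"].
    """
    found = []
    for stage in workflow.get("stages", []):
        if stage.get("dependencies") != ["init"]:
            continue  # this stage waits for an earlier one
        path = prefix + stage["name"]
        sub = stage.get("scheduler", {}).get("workflow")
        if sub:
            # The subworkflow starts immediately, so its own init-only
            # stages are the ones that actually launch jobs.
            found.extend(collect_initial_stages(sub, prefix=path + "/"))
        else:
            found.append(path)
    return found


# Hypothetical workflow: "prepare" wraps a subworkflow with two parallel stages.
workflow = {
    "stages": [
        {
            "name": "prepare",
            "dependencies": ["init"],
            "scheduler": {
                "workflow": {
                    "stages": [
                        {"name": "sub_a", "dependencies": ["init"]},
                        {"name": "sub_b", "dependencies": ["init"]},
                        {"name": "merge", "dependencies": ["sub_a", "sub_b"]},
                    ]
                }
            },
        },
        {"name": "fit", "dependencies": ["prepare"]},
    ]
}

result = collect_initial_stages(workflow)
print(result)
```

Inside a subworkflow, "init" is taken to mean the subworkflow's own starting point, which is why the recursion only descends into parent stages that are themselves initial.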
(3) For each stage that can run, we should extract the number of jobs and the memory they require. However, one step may launch multiple jobs; for example, a skimming stage may lead to running, say, 9 jobs, depending on its input.
The task should therefore statically analyse the workflow, extract the stages that depend only on init, look for the "scatter" paradigm, and count the input array items. The result will be (9, 8 GiB) in this case, where skimming launches 9 parallel jobs initially.
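The counting step could look roughly like the sketch below. It assumes a stage shape in which `scheduler.scatter.parameters` names the scattered parameters and each of them points at an init input via `{"step": "init", "output": ...}`; these field names, the helper `count_initial_jobs`, and the file names are assumptions for illustration. The sketch only covers zip-style scattering, and the 8 GiB figure is simply copied from the text above, since extracting memory from the definition is not shown here.

```python
def count_initial_jobs(stage, init_inputs):
    """Estimate how many jobs a stage launches when it starts.

    Assumptions: scattered parameters are listed under
    stage["scheduler"]["scatter"]["parameters"], and each one either holds
    a list directly or refers to an init input as
    {"step": "init", "output": <input name>}.
    """
    scheduler = stage.get("scheduler", {})
    scatter = scheduler.get("scatter")
    if not scatter:
        return 1  # no scatter: the stage runs as a single job
    params = scheduler.get("parameters", {})
    lengths = []
    for name in scatter.get("parameters", []):
        ref = params.get(name)
        if isinstance(ref, dict) and ref.get("step") == "init":
            value = init_inputs.get(ref.get("output"))
        else:
            value = ref
        if isinstance(value, list):
            lengths.append(len(value))
    # Zip-style pairing assumed: one job per position in the shortest array.
    return min(lengths) if lengths else 1


# Hypothetical skimming stage scattering over 9 input files.
skimming = {
    "name": "skimming",
    "dependencies": ["init"],
    "scheduler": {
        "parameters": {"input_file": {"step": "init", "output": "input_files"}},
        "scatter": {"method": "zip", "parameters": ["input_file"]},
    },
}
init_inputs = {"input_files": [f"file_{i}.root" for i in range(9)]}

jobs = count_initial_jobs(skimming, init_inputs)
estimate = (jobs, 8)  # memory in GiB, copied from the issue text
print(estimate)
```

A stage without a scatter block counts as one job, so the same helper can be applied to every initial stage and the results summed.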