vatlab / sos

SoS workflow system for daily data analysis
http://vatlab.github.io/sos-docs
BSD 3-Clause "New" or "Revised" License

A summary of files generated by a step #1508

Closed: gaow closed 1 year ago

gaow commented 1 year ago

We often find ourselves using pipeline steps such as the ones shown below to keep a record of the output files generated by a pipeline step:

image

where step 2 contains concurrent substeps, each generating a file, and step 3 simply saves these file names to a list. I wonder if there is a built-in feature in SoS to save such a list, so that we don't have to add a separate step 3 every time?

BoPeng commented 1 year ago

I do not think there is a way to do that: the task statement in step 2 is executed in parallel, and as tasks, so there is no well-defined step_output until every substep is done.

The only possible scenario for adding a "summary action" to a step is when the substeps are executed sequentially and not as tasks, using something like

input: ..., concurrent=False

python:
    # do something for each substep

python: active=-1
    # only active for the last substep, after everything else is done
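
To make the idea concrete, here is a minimal plain-Python analogy of that pattern. It is not SoS syntax: `run_substeps` and the file names are hypothetical, and the `if` check merely stands in for what `active=-1` is described as doing above, namely running an action only for the last substep of a sequential run.

```python
import tempfile
from pathlib import Path


def run_substeps(names, workdir):
    """Run substeps one after another and summarize on the last one only."""
    workdir = Path(workdir)
    produced = []
    for index, name in enumerate(names):
        # Regular action: runs for every substep and writes one output file.
        out = workdir / f"{name}.txt"
        out.write_text(f"result for {name}\n")
        produced.append(out.name)

        # Summary action: stands in for active=-1, so it only runs for the
        # last substep, when every earlier output is already known.
        if index == len(names) - 1:
            summary = workdir / "summary.txt"
            summary.write_text("\n".join(produced) + "\n")
    return produced


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        run_substeps(["a", "b", "c"], tmp)
        print((Path(tmp) / "summary.txt").read_text(), end="")
```

This only works because the loop is sequential: with concurrent substeps there is no single "last" iteration that can see all of the other outputs.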

BoPeng commented 1 year ago

But maybe you can use an auxiliary step that depends on the output of these steps? Something like

[step 2]
output2

[step 3]
output3

[summarize: output pattern]
output: summary

[default]
input: require output2, output3, summary of output2, summary of output3

In this way the summary step will be executed after step2 and step3. The exact syntax will vary but this in theory should work.
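
As a rough illustration of why this layout is generic, the following plain-Python sketch models the same dependency order. It is an analogy only, not SoS syntax: `step_2`, `step_3`, `summarize`, `default`, and the `.summary` file suffix are all hypothetical names chosen for this example.

```python
import tempfile
from pathlib import Path


def step_2(workdir):
    """Hypothetical step producing two output files."""
    paths = [workdir / "step2_a.txt", workdir / "step2_b.txt"]
    for path in paths:
        path.write_text("data\n")
    return paths


def step_3(workdir):
    """Hypothetical step producing one output file."""
    paths = [workdir / "step3_a.txt"]
    for path in paths:
        path.write_text("data\n")
    return paths


def summarize(step_name, outputs, workdir):
    """Auxiliary step: a single rule that lists the outputs of any step."""
    summary = workdir / f"{step_name}.summary"
    summary.write_text("".join(f"{path.name}\n" for path in outputs))
    return summary


def default(workdir):
    """Require each step's outputs first, then the summary of those outputs."""
    workdir = Path(workdir)
    summaries = []
    for step in (step_2, step_3):
        outputs = step(workdir)  # the step must finish before it is summarized
        summaries.append(summarize(step.__name__, outputs, workdir))
    return summaries


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for summary in default(tmp):
            print(summary.name, summary.read_text().split())
```

The point of the design is that the summary logic is written once and reused for every step, instead of adding a dedicated bookkeeping step after each one.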

gaow commented 1 year ago

I see. That should make the syntax a bit clearer and somewhat generic. Let me try the auxiliary step idea. Thank you!