biokcb opened this issue 6 years ago
@biokcb, currently cwlexec does not support scattering a step at the subflow level. This is because cwlexec submits all of the jobs in a flow to LSF at the beginning, which lets the jobs be queued better. In other words, cwlexec expands all of the jobs in a flow up front. A scattered subflow makes this more complex: for example, if the subflow depends on some other steps, we must wait until those steps are done before we can expand it. Since there is always a workaround that bypasses the scattered subflow, we decided to treat this as a low priority, but I think we will support it in the future.
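To make the "expand everything up front" constraint concrete, here is a minimal Python sketch of the idea. It is not cwlexec's actual code (cwlexec is a Java project); it assumes a toy model in which a step is either a single job, a scattered step, or a scattered subflow with inner steps, and the names `Step`, `expand_up_front`, `align` and `call` are hypothetical. A scatter can only be expanded at submission time if the length of the scattered array is already known; if the array is produced by an upstream step, expansion has to be deferred.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class Step:
    """Toy workflow step (hypothetical model, not cwlexec's internals)."""
    name: str
    scatter_over: Optional[str] = None  # array input to scatter over, if any
    inner_steps: List[str] = field(default_factory=list)  # non-empty => subflow


def job_names(step: Step, index: Optional[int]) -> List[str]:
    """Concrete job names for one (possibly scattered) copy of a step."""
    prefix = step.name if index is None else f"{step.name}[{index}]"
    if not step.inner_steps:
        return [prefix]
    # A subflow copy turns into one job per inner step.
    return [f"{prefix}/{inner}" for inner in step.inner_steps]


def expand_up_front(
    steps: List[Step], known_lengths: Dict[str, int]
) -> Tuple[List[str], List[str]]:
    """Expand every step into jobs before anything runs.

    `known_lengths` holds the lengths of arrays that exist at submission
    time (e.g. workflow inputs). A scattered step whose array is produced
    by an upstream step cannot be expanded yet and is reported as deferred.
    """
    jobs: List[str] = []
    deferred: List[str] = []
    for step in steps:
        if step.scatter_over is None:
            jobs.extend(job_names(step, None))
        elif step.scatter_over in known_lengths:
            for i in range(known_lengths[step.scatter_over]):
                jobs.extend(job_names(step, i))
        else:
            deferred.append(step.name)
    return jobs, deferred


steps = [
    Step("prepare"),
    Step("per_sample", scatter_over="samples", inner_steps=["align", "call"]),
]
# "samples" is a workflow input with 2 elements: everything expands up front.
print(expand_up_front(steps, {"samples": 2}))
# "samples" is produced by "prepare": the scattered subflow must wait.
print(expand_up_front(steps, {}))  # (['prepare'], ['per_sample'])
```

In the deferred case an engine that submits everything at the start has nothing to submit for `per_sample`, which is the extra complexity described above.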
For your case, you can scatter the echocat.cwl step inside subworkflow.cwl instead of scattering the subworkflow itself.
SubworkflowArrayScatterWorkaround.zip
@skeeey Thanks for the update! I can definitely implement the workaround for now, but the example I gave was a minimal one-step subworkflow that reproduces the error. Some of my workflows have multiple steps that I'd like to group into a subworkflow so that each sample can proceed through the steps independently. If I scatter each command-line tool step separately, each step expects an array and must wait until all samples have been processed by the previous step. If the other samples don't have to wait on one particularly time-intensive sample, our overall time spent processing samples can be reduced. I believe this will be a useful feature for us, so if you are able to support it in the future, that would be great. Thanks!
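The difference between the two approaches can be worked through with a small Python sketch. It assumes there are enough LSF slots to run every sample concurrently and ignores queueing delay; the sample names and step durations are made up for illustration. With per-step scatter (the workaround), each step boundary acts as a barrier across all samples; with a scattered subworkflow, each sample runs its own steps back to back.

```python
from typing import Dict, List

Durations = Dict[str, List[float]]  # sample name -> runtime of each step, in order


def finish_times_per_step_scatter(durations: Durations) -> Dict[str, float]:
    """Workaround: every step is scattered on its own.

    Step k+1 cannot start for any sample until step k has finished for
    all samples, so each step boundary acts as a barrier.
    """
    n_steps = len(next(iter(durations.values())))
    # Start time of the last step: sum of the slowest sample in each earlier step.
    last_start = sum(
        max(d[k] for d in durations.values()) for k in range(n_steps - 1)
    )
    return {name: last_start + d[-1] for name, d in durations.items()}


def finish_times_subflow_scatter(durations: Durations) -> Dict[str, float]:
    """Desired behaviour: the whole subworkflow is scattered, so each
    sample runs its steps back to back without waiting for the others."""
    return {name: sum(d) for name, d in durations.items()}


durations = {"fast": [1, 1, 1], "slow": [6, 1, 1]}
print(finish_times_per_step_scatter(durations))  # {'fast': 8, 'slow': 8}
print(finish_times_subflow_scatter(durations))   # {'fast': 3, 'slow': 8}
```

In this toy case the slow sample finishes at the same time either way, but the fast sample is done at t=3 instead of t=8 once it no longer waits at each barrier.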
@skeeey Can you explain this workaround a bit more? I don't quite see how it helps our situation, but first I want to understand what you mean by:
"there is always a workaround can bypass the scattered subflow, so we finally decide to put this as a low priority, I think we will support this in future."
@drjrm3 The workaround is the approach @biokcb described: instead of scattering the whole subflow, we scatter every step inside the subflow. As @biokcb said, it does have that drawback. Currently we are focusing on implementing ExpressionTool support; once that is finished, I think we can solve this problem.
Also need to test #33
Hi @skeeey, cwlexec is a very convenient CWL engine for dispatching jobs to IBM LSF, but it's a pity that it can't scatter a subworkflow. Is there any plan to support this? Thank you.
Hi,
We have a simple example workflow that seems to pass array inputs to lower-level scripts without scattering them. The call chain is:

top_workflow.cwl
  calls -> subworkflow.cwl
    calls -> echocat.cwl
      calls -> echocat.sh

echocat.sh takes 3 inputs (string, file, file). subworkflow.cwl has just a single step, which takes a string input and a File[] input and passes them to the command-line tool. This works fine with cwlexec. However, when I use top_workflow.cwl to scatter over an array of strings or an array of File arrays, the inputs are not scattered; instead they are passed directly to the command-line tool, which fails because the shell script cannot use them that way: the string array is passed as a single string, and the array of File arrays as a single array. The example is attached; in output.txt, line 646 shows the incorrectly built command.
SubworkflowArrayScatterError.tar.gz
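For reference, here is a minimal Python sketch of what scattering is expected to do with these inputs. It assumes the top-level step scatters both inputs with CWL's `dotproduct` scatter method (the attached files are not reproduced here, so this is an assumption), and the input names `sample` and `files` and the file paths are hypothetical. Each scattered array is split element by element, so every subworkflow invocation should receive one string and one File[].

```python
from typing import Any, Dict, List


def dotproduct_scatter(inputs: Dict[str, Any], scatter: List[str]) -> List[Dict[str, Any]]:
    """Split `inputs` into one job per element, zipping the scattered
    arrays together (the semantics of CWL's `dotproduct` scatterMethod)."""
    lengths = {len(inputs[name]) for name in scatter}
    if len(lengths) != 1:
        raise ValueError("dotproduct scatter requires arrays of equal length")
    (n,) = lengths
    jobs = []
    for i in range(n):
        job = dict(inputs)  # non-scattered inputs are passed through unchanged
        for name in scatter:
            job[name] = inputs[name][i]  # scattered inputs get their i-th element
        jobs.append(job)
    return jobs


inputs = {
    "sample": ["s1", "s2"],                             # string[]
    "files": [["a.txt", "b.txt"], ["c.txt", "d.txt"]],  # File[][] (paths as str)
}
for job in dotproduct_scatter(inputs, ["sample", "files"]):
    print(job)
# {'sample': 's1', 'files': ['a.txt', 'b.txt']}
# {'sample': 's2', 'files': ['c.txt', 'd.txt']}
```

The behaviour reported above corresponds to skipping this split and handing `inputs` to the tool unchanged as a single job, which is why the command line ends up with whole arrays.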