nipype / pydra

Pydra Dataflow Engine
https://nipype.github.io/pydra/

Comparisons between Pydra, Nipype v1 and CWL #357

Open tclose opened 3 years ago

tclose commented 3 years ago

This discussion follows on from https://github.com/nipy/nipype/issues/3245#issuecomment-698512224.

Thanks for the explanation @djarecka. I get the benefits of caching now.

Looking at the generated input/output interfaces, am I right in saying that the main benefit you have over the CWL tool spec is the specification of available flags/parameters in the tool interface, and which are required for the other? I didn't quite follow where the specification fits in with this, and how it will replace the _list_outputs functionality in Nipype v1.

While we are on the topic of input specs, one of the things I find a bit unintuitive/verbose in Nipype v1 interfaces is having to define output files in the input spec as well as the output spec. I understand that this is because for most tools you need to supply the output filename in the command line args. However, I was thinking that you could just generate the command line str from a combination of the input and output specs. Would this work?
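To make the question above concrete, here is a minimal sketch of building the command line from the input and output specs together. It does not use the Nipype or Pydra APIs: `Field` and `build_cmdline` are hypothetical names made up for illustration, and the positions given to the `bet` arguments are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Field:
    """Hypothetical spec entry: a name, a command-line position, an optional flag."""
    name: str
    position: int
    flag: Optional[str] = None


def build_cmdline(executable, input_fields, output_fields, values):
    """Merge both specs, order the fields by position, and render the command."""
    fields = sorted(input_fields + output_fields, key=lambda f: f.position)
    args = [executable]
    for field in fields:
        if field.name not in values:
            continue  # optional field that was not set
        if field.flag:
            args.append(field.flag)
        args.append(str(values[field.name]))
    return " ".join(args)


# The output file is declared only in the output spec, yet still reaches the command line.
inputs = [Field("in_file", 1), Field("frac", 3, flag="-f")]
outputs = [Field("out_file", 2)]
cmd = build_cmdline(
    "bet",
    inputs,
    outputs,
    {"in_file": "T1.nii.gz", "out_file": "T1_brain.nii.gz", "frac": 0.5},
)
print(cmd)
```

The point of the sketch is only that the output filename does not need to be declared twice if the command builder looks at both specs.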

djarecka commented 3 years ago

So the specification is only used by the converter to create a specific input/output_spec, e.g. for bet. The specification together with pydra syntax is able to replace _list_outputs, or at least replace it in most cases. I'll try to have a better description and more examples next week, and will point you to them once they are ready. Hopefully it will be clearer then.

In pydra you can put output names in the input_spec only: if a field has output_file_template, pydra will automatically add that field to the output_spec. You can see an example here.
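As a rough illustration of the idea (not Pydra's actual implementation), the sketch below derives output fields from input fields that carry an `output_file_template` entry. The function `resolve_output_templates` and the plain-dict spec layout are assumptions made for this example; only the `output_file_template` key name comes from the comment above.

```python
def resolve_output_templates(input_fields, values):
    """Return output fields derived from inputs that have an output_file_template.

    input_fields maps field name -> metadata dict (hypothetical layout).
    values maps field name -> value supplied by the user.
    """
    outputs = {}
    for name, metadata in input_fields.items():
        template = metadata.get("output_file_template")
        if template is None:
            continue  # ordinary input, not an output file
        # An explicit value wins; otherwise fill the template from the other inputs.
        outputs[name] = values.get(name) or template.format(**values)
    return outputs


input_fields = {
    "in_file": {},
    "out_file": {"output_file_template": "{in_file}_brain"},
}

derived = resolve_output_templates(input_fields, {"in_file": "T1"})
explicit = resolve_output_templates(input_fields, {"in_file": "T1", "out_file": "custom"})
print(derived, explicit)
```

Here `out_file` is declared once, on the input side, and shows up as an output without a separate output spec entry.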

dafrose commented 3 years ago

Mhh... I did not think about this earlier, but I guess this discussion aims at something similar to the feature request that I just opened: #367

I don't fully understand the entire scope of CWL, but would it be possible/helpful to do something close to CWL without the full requirements of CWL and still easy enough to read? I guess, this is what I suggested in the issue mentioned above.

djarecka commented 3 years ago

We want to be able to use CWL and boutiques with pydra, but implementing something that reads a YAML file with the pydra-specific fields, as you suggested in #367, should be much faster.

tclose commented 3 years ago

Just to throw another comparison in the mix, have you looked closely at Airflow (something that came up in a presentation I just sat through)? I have only had a quick look but it seems fairly similar to the aims of Pydra, at least in as much as it has a Python API for creating DAG workflows. FWIW I like Pydra's syntax much better but I was wondering what the main points of difference are.

Back on the mapping between Pydra and CWL workflows: am I right that in order to map a Pydra workflow with a split/combine to CWL, I would need to put the nodes between the split and the combine into a separate workflow and then "scatter" over it? If so, would that mean that it wouldn't be possible to map Pydra workflows with multiple splits before a merge onto the CWL workflow model in a general way?
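To show the shape of the mapping I have in mind, here is a plain-Python sketch with no Pydra or CWL code involved. The nodes between the split and the combine are wrapped in one function, which stands in for the separate sub-workflow, and `scatter` stands in for a CWL-style scatter. All names (`preprocess`, `scale`, `inner_workflow`, `scatter`) are hypothetical.

```python
def preprocess(x):
    return x + 1


def scale(x):
    return x * 10


def inner_workflow(x):
    """The nodes that sit between the split and the combine, wrapped as one unit."""
    return scale(preprocess(x))


def scatter(workflow, items):
    """Run the wrapped workflow once per item and gather the results in order."""
    return [workflow(item) for item in items]


# One split followed by one combine maps onto a single scatter.
results = scatter(inner_workflow, [1, 2, 3])

# Two splits before a combine would need one wrapped workflow per split level,
# i.e. a scatter nested inside another scatter.
nested = scatter(lambda a: scatter(lambda b: a * b, [1, 2]), [10, 20])
print(results, nested)
```

The nested case is what prompts the question: each extra split seems to need its own level of wrapping, and the sketch does not settle whether that works in general.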