Nipype 2.0 Progress

dafrose commented 4 years ago

Dear nipype developers,

I am using nipype and I was wondering how 2.0 is progressing. I am addressing this here, because I could not find a better way, so I hope that is fine.

According to the nipype 2.0 milestone, it has been overdue for about a year. Updates to related issues paint a similar picture. Has development on 2.0 stopped?

A bit more constructive: How or where could an interested person like me help the most? Do you have a way to collect money to buy development time? What is already being worked on and where would you need someone to actually grab something and do it?

Best, Daniel

satra commented 4 years ago

@dafrose - thank you for raising this issue. Nipype 2.0 work has moved over to its own organization: github.com/nipype

we should update these milestones and at least put a pointer to things.

here is where it stands:

the new workflow engine (github.com/nipype/pydra) has most of the basic functionality we need.
the latest pydra release will have support for starting to move over existing nipype interfaces.
these interfaces will now be created as separate packages and maintained separately. i expect this process to run through the end of the year, unless a lot of folks with python + neuroimaging knowledge pitch in.

we hope to have some demonstration neuroimaging workflows in place by end of september.

@djarecka is coordinating the pydra development, and the easiest way to start would be to look at the milestones and issues there. i believe communication can happen on the mattermost channel:

https://mattermost.brainhack.org/brainhack/channels/nipype

dafrose commented 4 years ago

Thank you @satra for your reply. This is indeed very good to know. I did not realize that nipype development had moved to a new organization. I will have a look!

One more question, though: Will pydra replace nipype (2.0) or will nipype 2.0 be a frontend package that runs pydra as backend?

satra commented 4 years ago

@dafrose - pydra itself would be the workflow engine in nipype 2.0. however, the way the packages are constructed right now, we are also inserting each interface package into the pydra.tasks namespace. so the following would work as import statements.

from pydra.tasks.fsl import BET
from pydra.engine import Workflow
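As a rough illustration of how separately maintained packages can share one import namespace, here is a minimal sketch using Python's implicit namespace packages (PEP 420). The names `demo_engine`, `dist_fsl` and `dist_afni` are hypothetical stand-ins, and the thread does not say whether pydra uses exactly this mechanism.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_distribution(root: Path, dist_name: str, subpackage: str, body: str) -> Path:
    """Create a fake 'installed' distribution that contributes
    demo_engine.tasks.<subpackage> (all names are hypothetical)."""
    site = root / dist_name
    pkg = site / "demo_engine" / "tasks" / subpackage
    pkg.mkdir(parents=True)
    # No __init__.py in demo_engine/ or demo_engine/tasks/, so both are
    # namespace portions that several distributions may share.
    (pkg / "__init__.py").write_text(body)
    return site


root = Path(tempfile.mkdtemp())
site_a = make_distribution(root, "dist_fsl", "fsl", "TOOL = 'bet'\n")
site_b = make_distribution(root, "dist_afni", "afni", "TOOL = '3dSkullStrip'\n")

# Pretend both distributions were installed into different locations.
sys.path[:0] = [str(site_a), str(site_b)]
importlib.invalidate_caches()

# Both subpackages resolve under the same namespace even though they
# live in separate directories and could be released independently.
from demo_engine.tasks import afni, fsl

print(fsl.TOOL, afni.TOOL)
```

The design point is that each interface package can ship on its own release schedule while users keep a single, predictable import path.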

conceptually we are calling nipype 2.0 an ecosystem comprising the following projects

pydra: workflow engine
pydra.tasks: interfaces from packages
testkraken: a vibration testing framework
neurodocker: containerization of neuroimaging tools
niflows: repository of functioning workflows (whether or not they use pydra).

and siblings:

nipreps
niworkflows

tclose commented 4 years ago

Hi @satra,

Nice to see the progress towards 2.0 👍

I remember you raised CWL in the discussion around potential directions for Nipype 2 at the BrainHack in Singapore. What are the main limitations of CWL w.r.t. neuroimaging workflows that made it an unattractive solution? Are its workflow branching/merging structures too inflexible for typical neuroimaging workflows?

Cheers,

Tom

satra commented 4 years ago

@tclose - great question.

CWL is a specification and has two components: a workflow spec and a tool spec.

the CWL tool spec is really hard to use for many neuroimaging tools because many of the tools are like a leatherman tool - they do many things. in nipype we addressed this using python code, while for CWL you will need to write javascript code. hence it didn't make a whole lot of sense to adopt this. btw, this same limitation exists for boutiques as well. both of these are really good at capturing a specific use of a tool, much less so as a generic wrapper for an arbitrary command line tool. by using python, both nipype and pydra can generate wrappers for more complex uses. for a specific example of such complexity see: https://github.com/nipy/nipype/blob/master/nipype/interfaces/fsl/preprocess.py#L893
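To make the "leatherman" point concrete, here is a minimal sketch of the kind of conditional logic a Python wrapper can carry. The tool `multitool`, its flags and its output naming are all invented for illustration; this is not nipype or pydra code.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, List, Optional


@dataclass
class MultiToolInputs:
    """Inputs for a hypothetical multi-purpose command line tool."""
    in_file: str
    mode: str = "strip"              # "strip" or "register"
    reference: Optional[str] = None  # only meaningful in "register" mode
    save_mask: bool = False          # only meaningful in "strip" mode


def build_command(inputs: MultiToolInputs) -> List[str]:
    """Assemble the command line; which flags are valid depends on the mode."""
    cmd = ["multitool", inputs.mode, inputs.in_file]
    if inputs.mode == "register":
        if inputs.reference is None:
            raise ValueError("register mode requires a reference image")
        cmd += ["--ref", inputs.reference]
    elif inputs.mode == "strip":
        if inputs.save_mask:
            cmd.append("--mask")
    else:
        raise ValueError(f"unknown mode: {inputs.mode}")
    return cmd


def list_outputs(inputs: MultiToolInputs) -> Dict[str, str]:
    """Predict output files; the set of outputs also depends on the inputs."""
    stem = Path(inputs.in_file).name.split(".")[0]
    outputs = {"out_file": f"{stem}_{inputs.mode}.nii.gz"}
    if inputs.mode == "strip" and inputs.save_mask:
        outputs["mask_file"] = f"{stem}_mask.nii.gz"
    if inputs.mode == "register":
        outputs["matrix_file"] = f"{stem}_to_ref.mat"
    return outputs


example = MultiToolInputs("sub01.nii.gz", save_mask=True)
print(build_command(example))
print(list_outputs(example))
```

One wrapper covers every mode here, whereas a declarative description would either need embedded expressions for the branching or one description per use.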

as a workflow spec CWL is easier to map to, but the CWL workflow spec requires the corresponding CWL tool spec. so if you were creating a tool spec just for the specific instantiation of the tool you are using then CWL will be able to cover both the workflow and the tool. with nipype we wanted a general tool representation that can be re-instantiated for different use cases, since much of the neuroimaging world has created such tools.

conclusion: CWL can definitely be used for many neuroimaging workflows, but creating a common tool library is going to be harder.

in pydra, we have evolved the status of a workflow as being synonymous with a task. so it can be cached just like an interface. this is true while still supporting nested workflows and supporting parameter sweeps over both workflows and tasks (interfaces). as far as i know this flexibility is harder to implement in CWL. leveraging python over any abstract DSL provides us much greater flexibility to construct the complex workflows that exist in neuroimaging.
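The idea of a workflow being "just a task" for caching purposes can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not pydra's implementation: the `Task` and `Workflow` classes, the in-memory `CACHE` and the hashing scheme are all hypothetical.

```python
import hashlib
import json
from typing import Any, Callable, Dict, List

CACHE: Dict[str, Any] = {}  # toy in-memory cache keyed by checksum
CALLS: List[str] = []       # records which tasks actually executed


class Task:
    """A named function whose result is cached by a hash of its inputs."""

    def __init__(self, name: str, func: Callable[..., Any]):
        self.name = name
        self.func = func

    def checksum(self, **inputs: Any) -> str:
        # A real engine would also hash the task definition; here only
        # the name and the inputs are used.
        payload = json.dumps({"name": self.name, "inputs": inputs}, sort_keys=True)
        return hashlib.sha256(payload.encode()).hexdigest()

    def __call__(self, **inputs: Any) -> Any:
        key = self.checksum(**inputs)
        if key not in CACHE:
            CALLS.append(self.name)
            CACHE[key] = self.func(**inputs)
        return CACHE[key]


class Workflow(Task):
    """A chain of tasks that is itself a Task, so it is cached as a unit."""

    def __init__(self, name: str, steps: List[Task]):
        def run_steps(x: Any) -> Any:
            for step in steps:
                x = step(x=x)
            return x

        super().__init__(name, run_steps)


double = Task("double", lambda x: 2 * x)
increment = Task("increment", lambda x: x + 1)
inner = Workflow("inner", [double, increment])  # nested workflow
outer = Workflow("outer", [inner, double])

# A parameter sweep over the outer workflow.
results = [outer(x=x) for x in (1, 2, 3)]
print(results)

# Re-running with a seen input hits the workflow-level cache directly,
# so none of the inner steps is looked at again.
calls_before = len(CALLS)
outer(x=1)
print(len(CALLS) == calls_before)
```

Because `Workflow` inherits from `Task`, nesting and sweeping need no special cases: anything that works on a task works on a workflow.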

hope that helps.

djarecka commented 4 years ago

Hi @dafrose, thank you for reaching out! We would love to have your feedback and help. Perhaps the best place to learn about pydra functionality is the pydra-tutorial (it doesn't cover everything, but we hope to update it soon).

You can also check the issues and milestones. Feel free to ask any question if you find something that you would like to work on!

dafrose commented 4 years ago

Thanks for the pointer @djarecka.

One more question to you and @satra: How production-ready is pydra at this point? I am in the process of implementing a lengthy many-subject pipeline using nipype and was wondering whether it would make sense to try out pydra instead of nipype 1.x while I am doing this. What would you suggest?

satra commented 4 years ago

@dafrose - we are using pydra for some projects in our group (https://github.com/nipype/pydra-ml) and for certain workflows in house. on our front, we are ramping up pydra use significantly in our own projects. however, as with any new software, there are bugs that we need to address when they come up.

the short answer is pydra is stable for certain applications and the api is quite stable at this point. there is one sticking api issue (https://github.com/nipype/pydra/issues/295) which we hope to take a decision on soon, but any comments would be welcome.

regarding nipype v. pydra workflows there are two issues:

the conversion of nipype interfaces. this is being done in two ways. there is a package (https://github.com/nipype/pydra-nipype1) that allows using nipype 1.x interfaces with pydra workflows. this may be a good starting point since the 1.x interfaces are mature. we are also in the process of converting other interfaces directly to pydra tasks. this will take time and its speed will be directly proportional to community contributions.
while pydra is more powerful as a workflow engine, nipype has support for broader HPC use-cases. we want this in pydra, but our present use cases have been related to our own work (local multiprocessing with concurrent futures, SLURM, and Dask), and with dask support, we think we are covering a broad range of scenarios. debugging is a lot easier in nipype for the average user, but pydra is improving steadily on this front.
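For readers unfamiliar with the "concurrent futures" execution style mentioned above, here is a minimal standard-library sketch of fanning independent per-subject jobs out to a local pool. It uses `ThreadPoolExecutor` as a stand-in so the example runs anywhere; `ProcessPoolExecutor` exposes the same `submit` interface. The `process_subject` function and subject IDs are hypothetical, and this does not show how pydra itself schedules work.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def process_subject(subject_id: str) -> str:
    """Placeholder for an independent per-subject pipeline (hypothetical)."""
    return f"{subject_id}: done"


subjects = [f"sub-{i:02d}" for i in range(1, 5)]

with ThreadPoolExecutor(max_workers=2) as pool:
    # Submit every subject at once; the pool limits how many run concurrently.
    futures = {pool.submit(process_subject, s): s for s in subjects}
    # Collect results as they finish, in whatever order that happens.
    results = {futures[f]: f.result() for f in as_completed(futures)}

for subject in subjects:
    print(results[subject])
```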

if you jump into pydra, we will be there to support and fix issues as needed. so you will get support while helping us transition the ecosystem. however, nipype is the stable tool on the block. so it's going to be a function of your needs and timelines. happy to help either way.

tclose commented 4 years ago

@satra, thanks for the detailed explanation.

For context, I have been considering whether it would make sense to adapt Arcana so it can spit out CWL descriptions, which could then be run in a CWL engine (along with bioinformatics pipelines) integrated into the analysis platform we are building.

> CWL is a specification and has two components: a workflow spec and a tool spec.
>
> the CWL tool spec is really hard to use for many neuroimaging tools because many of the tools are like a leatherman tool - they do many things. in nipype we addressed this using python code, while for CWL you will need to write javascript code. hence it didn't make a whole lot of sense to adopt this. btw, this same limitation exists for boutiques as well. both of these are really good at capturing a specific use of tool, much less as a generic wrapper for an arbitrary command line tool. by using python both nipype/pydra can generate wrappers for more complex uses. for a specific example of such complexity see: https://github.com/nipy/nipype/blob/master/nipype/interfaces/fsl/preprocess.py#L893

I see. If you were so inclined, could you write the CWL tool spec to wrap the Nipype interface instead of the tool itself, or do you think you would still run into the same limitations?

> as a workflow spec CWL is easier to map to, but the CWL workflow spec requires the corresponding CWL tool spec. so if you were creating a tool spec just for the specific instantiation of the tool you are using then CWL will be able to cover both the workflow and the tool. with nipype we wanted a general tool representation that can be re-instantiated for different use cases, since much of the neuroimaging world has created such tools.
>
> conclusion: CWL can definitely be used for many neuroimaging workflows, but creating a common tool library is going to be harder.

Ok, I think I see the issue now. You would need multiple CWL tool specs for each slightly different use of the tool, do I have that right?

> in pydra, we have evolved the status of a workflow as being synonymous with a task. so it can be cached just like an interface. this is true while still supporting nested workflows and supporting parameter sweeps over both workflows and tasks (interfaces). as far as i know this flexibility is harder to implement in CWL. leveraging python over any abstract DSL provides us much greater flexibility to construct the complex workflows that exist in neuroimaging.

I'm not quite sure I follow. What is the significance of caching a workflow like an interface?

Does this solve the problem of iterating over a variable number of sub-nodes, where the number is only known at runtime (i.e. determined by an upstream node of the pipeline)?

> hope that helps.

Much appreciated :)

tclose commented 4 years ago

@satra I have had the chance to look into Pydra a bit more, and I really like what you guys have done in streamlining the API.

Having the ability to quickly generate an interface for a new tool should save a lot of time but I didn't quite get where the wrappers for more complex uses are implemented in the new structure (i.e. what was often put in '_list_outputs').

(would it be better to ask such questions on a different forum?)

djarecka commented 4 years ago

Hi @tclose, I'm currently still working on the conversion tools from nipype to pydra. Right now I'm focusing on FSL - the idea is to automatically generate interfaces like these ones. This is still WIP and it's on my branch, but I hope to have something more generic to show soon. However, many things that are implemented in nipype's _list_outputs have to be rewritten in the specification.

We are also planning to allow CWL specification with pydra, but this is not implemented yet.

Regarding your questions about Workflow - we want Workflow to support everything that Task has, including caching. For caching, this means that I don't have to check every single task from the graph the next time I see the specific workflow (not sure if this answers your question, please let me know).

It's fine to continue the discussion here, pydra repository is a good place to ask questions as well :)

tclose commented 4 years ago

I started a new discussion on the Pydra issue tracker above for anyone interested

nipy / nipype

Nipype 2.0 Progress #3245