dafrose opened this issue 4 years ago
@dafrose - thank you for raising this issue. Nipype 2.0 work has moved over to its own organization: github.com/nipype
we should update these milestones and at least put a pointer to things.
here is where it stands:
we hope to have some demonstration neuroimaging workflows in place by the end of September.
@djarecka is coordinating pydra development; the easiest way to get involved would be to start looking at the milestones and issues there. i believe communication can happen on the mattermost channel.
Thank you @satra for your reply. This is indeed very good to know. I did not realize that nipype development had moved to a new organization. I will have a look!
One more question, though: Will pydra replace nipype (2.0), or will nipype 2.0 be a frontend package that uses pydra as a backend?
@dafrose - pydra itself would be the workflow engine in nipype 2.0. however, the way the packages are constructed right now, we are also inserting the packages into the pydra.tasks namespace, so the following would work as import statements:

```python
from pydra.tasks.fsl import BET
from pydra.engine import Workflow
```
conceptually we are calling nipype 2.0 an ecosystem comprising the following projects
and siblings:
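The shared `pydra.tasks` import namespace described above is presumably a python namespace package (PEP 420), which lets separately installed task packages contribute modules under one prefix. A self-contained sketch, using a made-up `pydra_tasks_demo` namespace instead of the real one:

```python
import os
import sys
import tempfile

# Two fake "distributions", each contributing one module to the same
# namespace package (no __init__.py anywhere, per PEP 420).
root = tempfile.mkdtemp()
for dist, mod, body in [
    ("dist_a", "fsl.py", "TOOL = 'BET'\n"),
    ("dist_b", "ants.py", "TOOL = 'N4'\n"),
]:
    pkg = os.path.join(root, dist, "pydra_tasks_demo")
    os.makedirs(pkg)
    with open(os.path.join(pkg, mod), "w") as f:
        f.write(body)
    sys.path.append(os.path.join(root, dist))

# Both imports resolve, even though fsl.py and ants.py live in
# different directories that were "installed" independently.
from pydra_tasks_demo import fsl, ants

print(fsl.TOOL, ants.TOOL)  # BET N4
```

This is why `pip install`-ing independent packages such as a hypothetical pydra-fsl can all surface under `pydra.tasks.*`.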
Hi @satra,
Nice to see the progress towards 2.0 👍
I remember you raised CWL in the discussion around potential directions for Nipype 2 at the BrainHack in Singapore. What are the main limitations of CWL w.r.t. neuroimaging workflows that made it an unattractive solution? Are its workflow branching/merging structures too inflexible for typical neuroimaging workflows?
Cheers,
Tom
@tclose - great question.
CWL is a specification and has two components: a workflow spec and a tool spec.
the CWL tool spec is really hard to use for many neuroimaging tools because many of them are like a Leatherman multi-tool - they do many things. in nipype we addressed this using python code, while for CWL you would need to write javascript code, so it didn't make a whole lot of sense to adopt it. btw, this same limitation exists for boutiques as well. both are really good at capturing a specific use of a tool, but much less so as generic wrappers for an arbitrary command line tool. by using python, both nipype and pydra can generate wrappers for more complex uses. for a specific example of such complexity see: https://github.com/nipy/nipype/blob/master/nipype/interfaces/fsl/preprocess.py#L893
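The kind of output logic that is easy in python but awkward in a static spec can be illustrated with a toy sketch (the function and its flags are hypothetical, loosely inspired by FSL-style segmentation outputs; this is not nipype's actual `_list_outputs`):

```python
# Hypothetical output-resolution logic: which files a tool produces
# depends on the flags that were set, so the outputs cannot be listed
# statically. (Loosely modeled on segmentation tools; not real nipype code.)
def list_outputs(in_file, segments, number_classes):
    base = in_file.rsplit(".", 1)[0]
    outputs = [f"{base}_seg.nii.gz"]          # always produced
    if segments:                               # extra per-class files appear
        outputs += [                           # only when segments is requested
            f"{base}_seg_{i}.nii.gz" for i in range(number_classes)
        ]
    return outputs

print(list_outputs("T1.nii", segments=True, number_classes=3))
```

Encoding the conditional branch in a declarative tool spec would require embedded javascript in CWL, whereas here it is ordinary python.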
as a workflow spec, CWL is easier to map to, but the CWL workflow spec requires the corresponding CWL tool spec. so if you were creating a tool spec just for the specific instantiation of the tool you are using, then CWL would cover both the workflow and the tool. with nipype we wanted a general tool representation that can be re-instantiated for different use cases, since much of the neuroimaging world has created such tools.
conclusion: CWL can definitely be used for many neuroimaging workflows, but creating a common tool library is going to be harder.
in pydra, we have evolved to treat a workflow as synonymous with a task, so it can be cached just like an interface. this holds while still supporting nested workflows and parameter sweeps over both workflows and tasks (interfaces). as far as i know, this flexibility is harder to implement in CWL. leveraging python over an abstract DSL gives us much greater flexibility to construct the complex workflows that exist in neuroimaging.
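The workflow-as-task idea can be sketched with a toy content-addressed cache - illustrative pure python, not pydra's actual API; `checksum`, `run_cached`, and the task names are invented for the example:

```python
import hashlib
import json

# Toy content-addressed cache: a workflow hashes its inputs exactly the
# way a task does, so a whole workflow run can be reused from cache.
cache = {}

def checksum(name, inputs):
    payload = json.dumps([name, inputs], sort_keys=True).encode()
    return hashlib.sha256(payload).hexdigest()

def run_cached(name, fn, **inputs):
    key = checksum(name, inputs)
    if key not in cache:          # only compute on a cache miss
        cache[key] = fn(**inputs)
    return cache[key]

def workflow(x):
    # a workflow is itself just a task composed of cached sub-tasks
    doubled = run_cached("double", lambda x: 2 * x, x=x)
    return run_cached("inc", lambda v: v + 1, v=doubled)

print(run_cached("wf", workflow, x=3))  # 7 (runs the graph)
print(run_cached("wf", workflow, x=3))  # 7 (workflow-level cache hit)
```

On the second call the workflow's own checksum matches, so none of its sub-tasks need to be re-examined.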
hope that helps.
Hi @dafrose, thank you for reaching out! We would love to have your feedback and help. Perhaps the best place to learn about pydra functionality is the pydra-tutorial (it doesn't cover everything, but we hope to update it soon).
You can also check the issues and milestones. Feel free to ask any question if you find something that you would like to work on!
Thanks for the pointer @djarecka.
One more question to you and @satra: How production-ready is pydra at this point? I am in the process of implementing a lengthy many-subject pipeline using nipype and was wondering whether it would make sense to try out pydra instead of nipype 1.x while I am doing this. What would you suggest?
@dafrose - we are using pydra for some projects in our group (https://github.com/nipype/pydra-ml) and for certain workflows in house. on our front, we are ramping up pydra use significantly in our own projects. however, as with any new software, there are bugs that we need to address when they come up.
the short answer is that pydra is stable for certain applications and the api is quite stable at this point. there is one sticking api issue (https://github.com/nipype/pydra/issues/295) which we hope to make a decision on soon, but any comments would be welcome.
regarding nipype vs. pydra workflows, there are two issues:
1. the conversion of nipype interfaces. this is being done in two ways. there is a package (https://github.com/nipype/pydra-nipype1) that allows using nipype 1.x interfaces within pydra workflows. this may be a good starting point, since the 1.x interfaces are mature. we are also in the process of converting other interfaces directly to pydra tasks. this will take time, and its speed will be directly proportional to community contributions.
2. while pydra is more powerful as a workflow engine, nipype has support for broader HPC use cases. we want this in pydra, but our present use cases have been related to our own work (local multiprocessing with concurrent futures, SLURM, and Dask), and with dask support we think we are covering a broad range of scenarios. debugging is a lot easier in nipype for the average user, but pydra is improving steadily on this front.
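The local concurrent-futures execution model mentioned above can be sketched as follows (a plain stdlib illustration with a dummy `bet` function; not pydra's submitter API):

```python
from concurrent.futures import ThreadPoolExecutor

# Dummy stand-in for a per-subject brain-extraction task; each subject
# is independent, so all submissions can run concurrently.
def bet(subject):
    return f"{subject}_brain"

subjects = ["sub-01", "sub-02", "sub-03"]
with ThreadPoolExecutor(max_workers=3) as pool:
    results = list(pool.map(bet, subjects))  # map preserves input order

print(results)
```

Swapping the executor for a cluster backend (SLURM, Dask) changes where the futures run, not the shape of the graph.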
if you jump into pydra, we will be there to support and fix issues as needed. so you will get support while helping us transition the ecosystem. however, nipype is the stable tool on the block. so it's going to be a function of your needs and timelines. happy to help either way.
@satra, thanks for the detailed explanation.
For context, I have been considering whether it would make sense to adapt Arcana so it can spit out CWL descriptions, which could then be run in a CWL engine (along with bioinformatics pipelines) integrated into the analysis platform we are building.
> CWL is a specification and has two components: a workflow spec and a tool spec.
>
> the CWL tool spec is really hard to use for many neuroimaging tools because many of them are like a Leatherman multi-tool - they do many things. in nipype we addressed this using python code, while for CWL you would need to write javascript code, so it didn't make a whole lot of sense to adopt it. btw, this same limitation exists for boutiques as well. both are really good at capturing a specific use of a tool, but much less so as generic wrappers for an arbitrary command line tool. by using python, both nipype and pydra can generate wrappers for more complex uses. for a specific example of such complexity see: https://github.com/nipy/nipype/blob/master/nipype/interfaces/fsl/preprocess.py#L893
I see. If you were so inclined, could you write the CWL tool spec to wrap the Nipype interface instead of the tool itself, or would you still run into the same limitations do you think?
> as a workflow spec, CWL is easier to map to, but the CWL workflow spec requires the corresponding CWL tool spec. so if you were creating a tool spec just for the specific instantiation of the tool you are using, then CWL would cover both the workflow and the tool. with nipype we wanted a general tool representation that can be re-instantiated for different use cases, since much of the neuroimaging world has created such tools.
>
> conclusion: CWL can definitely be used for many neuroimaging workflows, but creating a common tool library is going to be harder.
Ok, I think I see the issue now. You would need a separate CWL tool spec for each slightly different use of the tool, do I have that right?
> in pydra, we have evolved to treat a workflow as synonymous with a task, so it can be cached just like an interface. this holds while still supporting nested workflows and parameter sweeps over both workflows and tasks (interfaces). as far as i know, this flexibility is harder to implement in CWL. leveraging python over an abstract DSL gives us much greater flexibility to construct the complex workflows that exist in neuroimaging.
I'm not quite sure I follow. What is the significance of caching a workflow like an interface?
Does this solve the problem of iterating over a variable number of sub-nodes, the number of which is only known at runtime (i.e. determined by an upstream node of the pipeline)?
> hope that helps.
Much appreciated :)
@satra I have had the chance to look into Pydra a bit more, and I really like what you guys have done in streamlining the API.
Having the ability to quickly generate an interface for a new tool should save a lot of time, but I didn't quite get where the wrappers for more complex uses are implemented in the new structure (i.e. what was often put in _list_outputs).
(would it be better to ask such questions on a different forum?)
Hi @tclose, I'm currently still working on the conversion tools from nipype to pydra. Right now I'm focusing on FSL - the idea is to automatically generate interfaces like these ones.
This is still WIP and lives on my branch, but I hope to have something more generic to show soon. Many things that are implemented in nipype's _list_outputs will have to be rewritten in the specification.
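The shift from imperative _list_outputs code to a declarative spec could be sketched roughly like this (schematic dataclasses only; `OutputField`, `resolve`, and the template syntax are invented for illustration, not pydra's real spec classes):

```python
from dataclasses import dataclass

# Schematic only: instead of imperative _list_outputs code, each output
# is declared as a template that gets resolved from the task's inputs.
@dataclass
class OutputField:
    name: str
    template: str  # e.g. "{in_file}_brain.nii.gz"

def resolve(fields, inputs):
    return {f.name: f.template.format(**inputs) for f in fields}

spec = [OutputField("out_file", "{in_file}_brain.nii.gz")]
print(resolve(spec, {"in_file": "T1"}))  # {'out_file': 'T1_brain.nii.gz'}
```

The declarative form is what makes automatic conversion and introspection possible, at the cost of rewriting logic that used to be arbitrary python.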
We are also planning to allow CWL specification with pydra, but this is not implemented yet.
Regarding your questions about Workflow - we want Workflow to support everything that Task has, including caching. For caching this means that I don't have to check every single task from the graph the next time I see the specific workflow (not sure if this answers your question, please let me know).
It's fine to continue the discussion here; the pydra repository is a good place to ask questions as well :)
I started a new discussion on the Pydra issue tracker above for anyone interested
Dear nipype developers,
I am using nipype and I was wondering how 2.0 is progressing. I am addressing this here because I could not find a better way, so I hope that is fine.
According to the nipype 2.0 milestone, it is about a year overdue. Updates to related issues paint a similar picture. Has development on 2.0 stopped?
A bit more constructively: How or where could an interested person like me help the most? Do you have a way to collect money to buy development time? What is already being worked on, and where would you need someone to actually grab something and do it?
Best, Daniel