nf-core / tools

Python package with helper tools for the nf-core community.
https://nf-co.re
MIT License

Subworkflows can only use modules present in the same repo #1927

Open awgymer opened 1 year ago

awgymer commented 1 year ago

Description of feature

This is a current (possibly permanent) limitation of subworkflows: you cannot define a subworkflow in an nf-core-structured repo that uses modules from nf-core/modules directly.

Allowing this would make updating and installing components more complex.

This limitation should be clearly documented.

mberacochea commented 1 year ago

My team is currently adopting nf-core tools and we've noticed this limitation. I'm interested in working on adding support for 'hybrid' subworkflows. Any guidance on how to begin would be helpful.

awgymer commented 1 year ago

This is quite a thorny problem, and right now there is no proper solution, I'm afraid. You could mirror the GitHub modules repo and add your own subworkflows and modules to it, but that has its own wrinkles.

I hope we can find a better solution eventually, but as an open-source project, supporting a split between open-source and in-house work is probably not a priority.
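A minimal sketch of the mirroring approach with plain git, assuming you have an internal git remote to push to: clone nf-core/modules, keep it as a remote named `upstream`, push to the internal remote, and periodically merge upstream changes. The helper names `mirror_modules` and `sync_mirror` and the internal URL are hypothetical (nf-core tools does not provide them), and merge-conflict handling, one of the wrinkles mentioned above, is left out.

```bash
#!/usr/bin/env bash
# Hypothetical helpers for mirroring nf-core/modules into an internal remote.

# mirror_modules UPSTREAM_URL INTERNAL_URL DEST
# Clone UPSTREAM_URL (e.g. https://github.com/nf-core/modules.git) into DEST with
# the remote named "upstream", then push the current branch to INTERNAL_URL ("origin").
mirror_modules() {
    local upstream_url="$1" internal_url="$2" dest="$3"
    git clone -q --origin upstream "$upstream_url" "$dest" || return 1
    git -C "$dest" remote add origin "$internal_url" || return 1
    git -C "$dest" push -q origin HEAD
}

# sync_mirror DEST
# Merge new upstream commits into the current branch and push the result to
# the internal remote. Returns non-zero (leaving the merge to resolve) on conflicts.
sync_mirror() {
    local dest="$1" branch
    branch=$(git -C "$dest" symbolic-ref --short HEAD) || return 1
    git -C "$dest" pull -q --no-rebase upstream "$branch" || return 1
    git -C "$dest" push -q origin HEAD
}
```

Your own modules and subworkflows would then be committed in the mirror clone under your own organisation directory and pushed to `origin` as usual.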

mberacochea commented 12 months ago

I understand. I'll share our solution or workaround as soon as we find one we're happy with. Thank you!

GallVp commented 9 months ago

Hi @awgymer and @mberacochea

Thanks for the hints. Here is what I have settled on for now:

  1. Inside the organisation (XYZ) repo, create an nf-core-modules directory:
```bash
mkdir -p nf-core-modules
cd nf-core-modules
touch main.nf
touch nextflow.config

cat <<-EOF > .nf-core.yml
repository_type: pipeline
EOF
```
  2. The nf-core-modules directory will then behave as a pipeline, so nf-core modules can be installed into it, with version control, using nf-core tools.

  3. Inside the organisation (XYZ) repo, create an nf-core-hybridisation.sh script to keep track of hybrid modules. Example:

```bash
#!/usr/bin/env bash
set -euo pipefail

mkdir -p ./modules/nf-core
cp -r ./nf-core-modules/modules/nf-core/gunzip ./modules/nf-core/ # needed for hybrid testing

mkdir -p ./modules/XYZ/cat
cp -r ./nf-core-modules/modules/nf-core/cat/cat ./modules/XYZ/cat # needed for a hybrid sub-workflow
```

This way, the hybridisation can be version-controlled. I am not sure it will work in every situation. Looking forward to your thoughts.
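Because this workaround keeps two copies of each hybrid module, the copies under ./modules can silently fall behind the ./nf-core-modules mirror after it is updated with nf-core tools (e.g. `nf-core modules update`) until nf-core-hybridisation.sh is re-run. A minimal drift check, assuming the same source and destination paths as that script; `check_copy` and `check_all` are hypothetical helpers, and any intentional local edit to a copy would also be reported as drift.

```bash
#!/usr/bin/env bash
# check_copy SRC DEST
# Return 0 if DEST is an identical copy of SRC; otherwise print a message and return 1.
check_copy() {
    local src="$1" dest="$2"
    if [ ! -d "$dest" ]; then
        echo "MISSING: $dest (expected a copy of $src)"
        return 1
    fi
    # diff -r compares the trees recursively; -q only reports which files differ.
    if ! diff -rq "$src" "$dest" > /dev/null 2>&1; then
        echo "OUT OF SYNC: $dest differs from $src"
        return 1
    fi
    return 0
}

# Pairs copied by nf-core-hybridisation.sh (paths taken from that script).
check_all() {
    local status=0
    check_copy ./nf-core-modules/modules/nf-core/gunzip ./modules/nf-core/gunzip || status=1
    check_copy ./nf-core-modules/modules/nf-core/cat/cat ./modules/XYZ/cat/cat || status=1
    return "$status"
}
```

Running `check_all` from the repo root, for example in CI, would fail whenever the hybridisation script needs to be re-run.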

awgymer commented 9 months ago

If I understand this correctly you are basically using a "pipeline repo" to mirror modules into your remote and then syncing them with bash then?

This is a little like an idea that has been raised here, which would see subworkflows package their modules alongside themselves.

I've only thought about it a little bit, but the idea in my head would be to create a 3rd "repository_type" of "subworkflow". This would mostly behave like a "pipeline" but with a few differences (some assumptions about pipeline repos wouldn't be quite the same).

The tooling could then be refactored to basically do a recursive pass of "subworkflows" updating/installing modules within (or perhaps they should be frozen; I'm not sure).
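The discovery half of such a recursive pass could look roughly like this: starting from a subworkflow's main.nf, follow its `include ... from '...'` statements, descend into nested subworkflows, and print every module directory reached. This is a sketch under stated assumptions (one include per line, single-quoted relative paths ending in `/main`, the usual modules/ and subworkflows/ layout, no include cycles); `collect_modules` is hypothetical, it is not how nf-core tools resolves components, and a "subworkflow" repository_type does not exist today.

```bash
#!/usr/bin/env bash
# collect_modules MAIN_NF
# Print the absolute directory of every module MAIN_NF depends on, recursing
# into included subworkflows. Duplicates are possible; pipe through `sort -u`.
collect_modules() {
    local dir rel target
    dir=$(cd "$(dirname "$1")" && pwd) || return 1
    # Keep only include lines and extract the single-quoted path after "from".
    grep -E "^[[:space:]]*include" "$1" |
        sed -E "s/.*from[[:space:]]+'([^']+)'.*/\1/" |
        while read -r rel; do
            # Include paths end in "/main", so the component directory is its dirname.
            target=$(cd "$dir/$(dirname "$rel")" 2>/dev/null && pwd) || continue
            case "$target" in
                */subworkflows/*) collect_modules "$target/main.nf" ;;
                */modules/*) printf '%s\n' "$target" ;;
            esac
        done
}
```

An install or update step would then act on each directory in `collect_modules path/to/main.nf | sort -u`, or, if modules are frozen, only report them.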

GallVp commented 9 months ago

> If I understand this correctly you are basically using a "pipeline repo" to mirror modules into your remote and then syncing them with bash then?

Yes, that's true. Essentially I am creating two copies in the same repo. Not ideal. But it is explicit and allows me to use nf-core tools to stay up to date with nf-core/modules. For me, it is really a temporary solution as I intend to eventually contribute all the local org modules and sub-workflows to nf-core/modules.

> This is a little like an idea that has been raised here, which would see subworkflows package their modules alongside themselves.
>
> I've only thought about it a little bit, but the idea in my head would be to create a 3rd "repository_type" of "subworkflow". This would mostly behave like a "pipeline" but with a few differences (some assumptions about pipeline repos wouldn't be quite the same).
>
> The tooling could then be refactored to basically do a recursive pass of "subworkflows" updating/installing modules within (or perhaps they should be frozen; I'm not sure).

Yes, I like the idea of freezing modules inside sub-workflows. When a pipeline developer downloads a sub-workflow, nf-core tools could warn that the sub-workflow's modules are outdated. The developer could then either keep using the outdated modules or open a sub-workflow update pull request, which would go through the nf-test GitHub Actions along with community review. Would this also prevent sub-workflows from malfunctioning after breaking module updates, or is that already handled by some other mechanism?
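A rough sketch of the "outdated modules" warning, assuming a local clone of nf-core/modules and the commit SHA a frozen module was pinned at (nf-core tools records this as `git_sha` in modules.json). A module is reported as outdated if any commit after the pinned one touches its directory. The function name `module_outdated` and its return codes are hypothetical.

```bash
#!/usr/bin/env bash
# module_outdated CLONE MODULE_PATH PINNED_SHA
# Returns 0 (and warns on stderr) if MODULE_PATH in the git clone CLONE has
# commits newer than PINNED_SHA, 1 if it is up to date, 2 on git errors.
module_outdated() {
    local clone="$1" path="$2" pinned="$3" newer count
    # Commits reachable from HEAD but not from the pinned SHA that touch the module.
    newer=$(git -C "$clone" log --format=%H "${pinned}..HEAD" -- "$path") || return 2
    if [ -n "$newer" ]; then
        count=$(printf '%s\n' "$newer" | wc -l | tr -d ' ')
        echo "WARNING: $path is pinned at $(printf '%s' "$pinned" | cut -c1-7) but has $count newer commit(s)" >&2
        return 0
    fi
    return 1
}
```

Only reporting, rather than auto-updating, keeps the frozen modules unchanged until someone opens the update pull request described above.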

drpatelh commented 7 months ago

We could also allow multiple --git-remote options on the CLI, with some sort of fallback mechanism that determines where each component is sourced from. I don't know how the dependencies between modules and subworkflows are currently tracked in tools, but they would need to be mirrored in modules.json somehow.

For example, --git-remote <MYGITHUB_REPO> --git-remote <NF_CORE_MODULES_REPO>. The tricky part will be deciding which one takes precedence if the same modules exist in both repos, especially with more than two --git-remote options.

Blasting some ideas out there. What do you think @mashehu @mirpedrol ?
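Before choosing a precedence rule, it may help to see how often the clash actually occurs. A minimal sketch, assuming each remote is available as a local clone laid out like nf-core/modules (modules/<org>/<tool>[/<subtool>]/main.nf), that lists components present in more than one remote. `find_conflicts` is a hypothetical helper, not an nf-core tools command.

```bash
#!/usr/bin/env bash
# find_conflicts CLONE...
# Print "<component> <count>" for every module path (relative to modules/<org>/)
# present in more than one of the given clones.
find_conflicts() {
    local clone
    for clone in "$@"; do
        # List each module once per clone, without the modules/<org>/ prefix.
        (cd "$clone" && find modules -name main.nf -type f) |
            sed -E 's#^modules/[^/]+/##; s#/main\.nf$##' | sort -u
    done | sort | uniq -c | awk '$1 > 1 { print $2, $1 }'
}
```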

ghost commented 7 months ago

Thank you @drpatelh.

To give some perspective on my case: I developed a subworkflow that uses both internal modules (from our nf-core/modules-like repo) and external modules (from the public nf-core/modules). When I try to install this subworkflow with nf-core install --git-remote <internal nf-core modules URL> ..., nf-core tools can't find the modules.

What I would suggest is something like pip's --extra-index-url (https://pip.pypa.io/en/stable/cli/pip_install/#cmdoption-extra-index-url). I would add an --extra-git-url option (or similar) that is consulted only for components not found in the --git-remote. This way, --git-remote would take precedence over --extra-git-url.

This way, we could still use public modules and subworkflows and keep up to date with new releases, with occasional local patches, without having to internalize modules we don't intend to modify heavily.
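The proposed rule (--git-remote first, --extra-git-url only for components it lacks) amounts to an ordered, first-match lookup. A minimal sketch, again modelling remotes as local clones with the nf-core/modules layout; `resolve_component` is hypothetical, and --extra-git-url is the option proposed here, not an existing nf-core tools flag.

```bash
#!/usr/bin/env bash
# resolve_component COMPONENT PRIMARY_CLONE [EXTRA_CLONE...]
# Print the path of COMPONENT's main.nf from the first clone that provides it
# (under any modules/<org>/ directory), so the primary clone always wins.
# Returns 1 if no clone provides it.
resolve_component() {
    local component="$1" clone f
    shift
    for clone in "$@"; do
        for f in "$clone"/modules/*/"$component"/main.nf; do
            # An unmatched glob stays literal, so check that the file exists.
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
                return 0
            fi
        done
    done
    echo "ERROR: $component not found in any remote" >&2
    return 1
}
```

The strict ordering is a deliberate choice: pip itself does not prioritise --index-url over --extra-index-url, so a package name present in both indexes can be served from either, which is the kind of ambiguity this rule avoids.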