nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

nextflow "run" and "pull" should do a shallow clone by default #3879

Closed: ahstram closed this issue 1 year ago

ahstram commented 1 year ago

Hi there,

One Nextflow feature I really appreciate is the ability to run a pipeline from our local Personalis Bitbucket server, using a command such as:

nextflow run nxf/hades -hub PSNL -v 1.3.7 ...

This is much more user-friendly, and it removes the need to deploy every possible version of our pipeline to a filesystem.

That being said, we generally set the NXF_ASSETS variable per pipeline run (i.e., each sample we run gets its own clone of the nxf/hades repo). This results in thousands of copies of our pipelines being made every week, many of them clones of the same version. Having a run-specific clone does make it easier to run multiple pipeline versions at once, and also makes "manual hotfixing" easier if necessary (i.e., someone manually edits the clone to allow a failed job to succeed).

Overall, the best compromise we've found is to use JGit's CloneCommand.setDepth() method, which has been available since JGit 6.3: https://javadoc.io/doc/org.eclipse.jgit/org.eclipse.jgit/latest/org.eclipse.jgit/org/eclipse/jgit/api/CloneCommand.html

We use setDepth(1), which means each of our clones has a complete checkout of the appropriate version of our pipeline, but no git history, which greatly reduces disk usage & load on our Bitbucket server.
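For illustration, the effect of a depth-1 clone can be reproduced with plain git. This is only a sketch: it assumes that JGit's setDepth(1) behaves like git clone --depth 1, and the repository, file name, and tag below are made up for the demonstration (they are not our actual pipeline).

```shell
#!/bin/sh
set -eu

# Build a throwaway "origin" repository with three commits and a tag.
# All names here (origin, main.nf, 1.0.0) are hypothetical.
work=$(mktemp -d)
git init -q "$work/origin"
cd "$work/origin"
git config user.email "demo@example.com"
git config user.name "Demo"
for i in 1 2 3; do
    echo "revision $i" > main.nf
    git add main.nf
    git commit -q -m "commit $i"
done
git tag 1.0.0

# Full clone: the whole history is transferred.
git clone -q "file://$work/origin" "$work/full"

# Shallow clone of one revision: only the tagged commit is transferred.
# A file:// URL is used because --depth is ignored for plain local paths.
git -c advice.detachedHead=false clone -q --depth 1 --branch 1.0.0 \
    "file://$work/origin" "$work/shallow"

full_commits=$(git -C "$work/full" rev-list --count HEAD)
shallow_commits=$(git -C "$work/shallow" rev-list --count HEAD)
echo "full clone commits: $full_commits"
echo "shallow clone commits: $shallow_commits"
```

Both clones contain the same checked-out main.nf; only the amount of history stored under .git differs.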

I think it would be great to update Nextflow to JGit 6.3+ and use setDepth(1) by default, as I doubt that most users of nextflow run or nextflow pull need the full git history to be available.

I will submit a PR for your consideration.

Thank you,

Alexander Stram Associate Director, Bioinformatics Engineering Personalis, Inc

pditommaso commented 1 year ago

Using a depth of 1 as the default creates a conflict when pulling a specific revision. I've changed this behaviour by adding a -deep <n> command-line option to specify the depth of the clone explicitly. See https://github.com/nextflow-io/nextflow/commit/b44b64533dcbd5a65a9561f176e2e4ce56a2ed8a.