Open pditommaso opened 2 years ago
@pditommaso one question: do we need to check whether the repo is already present in the asset directory in the "old" format (non-bare) and, in that case, not use the bare feature? I.e. some kind of backward compatibility, or will we force users to remove and recreate local repos?
Good point. If it already exists, I think we should report a warning message, maybe?
so, report with a warning message (maybe with some instructions to remove the current repo) and stop the command, right?
No, I mean show a warning message, i.e. log.warn, and do not stop. Usually nextflow only stops on error.
ah ok, so if I understand correctly we try to identify which kind of repo we are working on at startup
I see your point. In principle the bare should have been created when the feature has been enabled with a config flag or env variable, right?
If so, I think when this option does not match the repo format a warning should be reported
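To make the "warn but do not stop" idea concrete, here is a minimal shell sketch of how the on-disk format could be detected and compared with the requested mode. The function names `repo_format` and `check_repo_format` and the wording of the messages are hypothetical; the only Git call used is `git rev-parse --is-bare-repository`. The real check would live in Nextflow itself and use log.warn.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a local asset directory as
# "missing", "bare" or "non-bare" (anything that is not a bare repo).
repo_format() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing"
  elif [ "$(git -C "$dir" rev-parse --is-bare-repository 2>/dev/null)" = "true" ]; then
    echo "bare"
  else
    echo "non-bare"
  fi
}

# Hypothetical check: print a warning when the format on disk does not
# match the requested mode, and always return 0 so the run continues.
check_repo_format() {
  local dir="$1" want_bare="$2"   # want_bare is "true" or "false"
  local fmt
  fmt="$(repo_format "$dir")"
  if [ "$want_bare" = "true" ] && [ "$fmt" = "non-bare" ]; then
    echo "WARN: $dir is a regular clone but the bare option is enabled" >&2
  elif [ "$want_bare" = "false" ] && [ "$fmt" = "bare" ]; then
    echo "WARN: $dir is a bare clone but the bare option is disabled" >&2
  fi
  return 0
}
```

A missing directory produces no warning, since in that case the repo can simply be created in the requested format.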
@jorgeaguileraseqera any ETA for this?
Hope to have it in the next few days
(it's a little tedious because the API rate limit sometimes breaks the full test run)
Can you please open at least a draft PR asap?
> because the API rate limit sometimes breaks the full test run
Do you mean GitHub rate limits? Are you using your GITHUB_TOKEN for tests?
yes, I've created one and configured the env to run the tests
Weird, but such tests should not depend on GitHub. They can create a small test repo and then use it for testing.
There's something similar for testing Git submodules
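As a rough illustration of the suggestion above, a throwaway repository can be built and cloned entirely on the local filesystem, so no GitHub API call or token is involved. The helper name `make_test_repo`, the file content and the `rocket` branch are placeholders chosen for this sketch and are not taken from the Nextflow test suite.

```shell
#!/usr/bin/env bash
# Hypothetical helper: build a tiny repository with one commit and an
# extra branch, using only local Git commands (no network access).
make_test_repo() {
  local dir="$1"
  git init -q "$dir"
  echo 'println "hello"' > "$dir/main.nf"
  git -C "$dir" add main.nf
  # Identity is passed inline so the sketch does not rely on global config.
  git -C "$dir" -c user.name=test -c user.email=test@example.com \
      commit -q -m "first commit"
  git -C "$dir" branch rocket
}

TEST_ROOT="$(mktemp -d)"
make_test_repo "$TEST_ROOT/origin"

# Clone from the local path, the way a test could stand in for a remote.
git clone -q "$TEST_ROOT/origin" "$TEST_ROOT/clone"
```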
Implementing the functionality in this issue would also solve issue #2655 .
Maybe also, clarify in the issue title that "concurrent run" is only for runs from different working directories (with different work/ and .nextflow subdirs).
> Maybe also, clarify in the issue title that "concurrent run" is only for runs from different working directories (with different work/ and .nextflow subdirs).
Note that we're talking about the NXF_HOME folder (~/.nextflow), not the hidden .nextflow folder in the launch directory here.
We lost the momentum with this feature :/
Hi! This was recently brought to my attention. Just flagging that this would likely impact our engineers who might be developing on different feature branches but on the same workflow repo, on our development environments (which currently only run on our on-prem infrastructure).
Impacting in a good or bad way?
Hi @pditommaso, it would impact us in a bad way, I'm afraid! Our current idea for developing workflows within our organisation is for engineers to have their own branch in a workflow repository. They would implement changes in their own branch, and potentially run said workflows on our on-prem infrastructure to test their implementations. I believe that due to this bug, the engineers would end up overwriting each other's workflow implementations if multiple implementations of the same workflow are tested at the same time?
Understand, but it's not a bug. Nextflow has always worked in this way. The goal of this issue is exactly to overcome this limitation
Don't know how the solution to this issue will be implemented, but don't forget (see https://github.com/nextflow-io/nextflow/issues/2655#issuecomment-1232941807) the use case where a developer has their own repository (outside of Nextflow's built-in integration of pull and run commands) and switches between branches during the execution of a pipeline. The solution to this issue should be that the execution shouldn't be affected by modifications of the original repository. Thanks :-)
Hi! This issue has just come up again at Genomics England as it is likely that our engineers would want to run different branches of the workflows simultaneously. Is there any chance that this is getting implemented soon?
Hey Luke, we are planning to implement this but no set timeline yet.
Indeed, it is something to prioritize. Tagging @marcodelapierre for visibility
Paolo I have found a git functionality for this.
Let's bash code:
# for ease of description
ROOT_DIR="/path/to/.nextflow/assets"
repo="nextflow-io/hello"
revision="rocket"
def_remote="origin"

# user
nextflow run "$repo" -r "$revision"

# behind the scenes
# only if revision is not there already
if [ ! -d "$ROOT_DIR/$repo/$revision" ] ; then
    if [ ! -d "$ROOT_DIR/$repo" ] ; then
        # first revision requested: regular clone, then rename the
        # directory after the remote's default branch
        mkdir -p "$ROOT_DIR/$repo/first"
        git clone -b "$revision" "https://github.com/$repo" "$ROOT_DIR/$repo/first"
        cd "$ROOT_DIR/$repo/first"
        def_branch=$( git remote show "$def_remote" | sed -n '/HEAD branch/s/.*: //p' )
        cd ..
        mv first "$def_branch"
        # stable name for the first clone, which holds the only .git directory
        ln -s "$def_branch" first_branch
    else
        # additional revision: add a worktree next to the first clone
        cd "$ROOT_DIR/$repo/first_branch"
        git worktree add --track -b "$revision" "../$revision" "$def_remote/$revision"
    fi
fi
The key functionality is this one:
git worktree add --track -b dsl2 ../dsl2 origin/dsl2
Docs: https://git-scm.com/docs/git-worktree
Found here: https://stackoverflow.com/questions/2048470/git-working-on-two-branches-simultaneously And also here: https://stackoverflow.com/questions/6270193/how-can-i-have-multiple-working-directories-with-git/30185564#30185564
What do you think?
If you like it, I can give it a shot myself, soon after I have worked on another couple of pending work items.
Forgot to mention the key advantage: only the repo file tree is duplicated, whereas all the Git related files such as in .git/ exist only once
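A small sketch that shows this on a throwaway repository: the main checkout owns a `.git` directory, while the linked worktree only gets a `.git` file pointing back to it. The paths and the local `dsl2` branch are made up for the example, and the branch is local here rather than tracking `origin/dsl2` as in the command above.

```shell
#!/usr/bin/env bash
# Throwaway main checkout with one empty commit and a second branch.
WT_ROOT="$(mktemp -d)"
git init -q "$WT_ROOT/main"
git -C "$WT_ROOT/main" -c user.name=test -c user.email=test@example.com \
    commit -q --allow-empty -m "init"
git -C "$WT_ROOT/main" branch dsl2

# Add a second working tree for the dsl2 branch next to the main checkout.
git -C "$WT_ROOT/main" worktree add "$WT_ROOT/dsl2" dsl2 >/dev/null 2>&1

# The main checkout has a .git directory; the worktree has a small .git
# file whose content starts with "gitdir:" and refers to the main repo.
ls -ld "$WT_ROOT/main/.git" "$WT_ROOT/dsl2/.git"
cat "$WT_ROOT/dsl2/.git"
```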
Never too old to learn a new git subcommand 😆
> Never too old to learn a new git subcommand 😆
indeed! this is a clear mark of our young ages...!! 😂
@pditommaso keen on your take on my proposed solution before I work on the implementation
This is indeed an excellent idea. This could simplify the solution compared to the use of the bare repository approach.
Using the worktree solution, the main/master checkout should remain in the current location. Instead, when -r <revision> is requested, a new work-tree should be created under the path $NXF_ASSETS/revisions/<unique-id>, where unique-id is computed as sipHash24 of Project URI + revision.
The --detach flag is also likely to be useful.
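A minimal sketch of that layout, under stated assumptions: plain shell has no sipHash24, so `git hash-object --stdin` is used here purely as a stand-in hash to derive the id from project URI + revision. The function names `revision_id` and `add_revision_worktree`, and the `@` separator between URI and revision, are hypothetical.

```shell
#!/usr/bin/env bash
# Stand-in for the sipHash24 id (NOT sipHash24): Git's blob hash of the
# string "<uri>@<revision>".
revision_id() {
  printf '%s@%s' "$1" "$2" | git hash-object --stdin
}

# Hypothetical helper: create <assets>/revisions/<unique-id> as a detached
# worktree of the main checkout, unless it already exists. Prints the path.
add_revision_worktree() {
  local main_checkout="$1" assets="$2" uri="$3" revision="$4"
  local target
  target="$assets/revisions/$(revision_id "$uri" "$revision")"
  if [ ! -d "$target" ]; then
    mkdir -p "$assets/revisions"
    # --detach avoids creating or locking a local branch for the revision.
    git -C "$main_checkout" worktree add --detach "$target" "$revision" \
        >/dev/null 2>&1 || return 1
  fi
  echo "$target"
}
```

With `--detach`, the same revision can be materialised without creating a branch, and the main checkout stays on whatever it already has checked out.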
maybe this path for non-master revisions: $NXF_ASSETS/revisions/$repo/$revision
Apologies @pditommaso, had to prioritise other activities with larger customer impact.
I am keen to get this one done, on top of my list for when I am back in January.
Ideally, the worktree should be checked out with all submodules recursively cloned, or there should be an option to do so. But if this complicates things, can be left for a later release.
Thanks a lot for working on this!
Working on it.
Turns out that the eclipse.jgit project we currently rely on does not support git worktree; there is a PR (https://bugs.eclipse.org/bugs/show_bug.cgi?id=477475) that has been open for years and only adds support for managing existing worktrees, not even for creating new ones.
Proposed steps for way forward:
.git to save disk space;
jgit (if any) that have wider git support (the main advantage of worktree is indeed avoiding the .git duplicates, so I don't think this step needs exploring).
At this stage, I believe 1. can already be good enough. In its basic implementation it would duplicate the .git files; however, is a local collection of revisions of a pipeline very much different from one of multiple pipelines?
So, going to proceed with 1. to begin with.
Just double checking if this feature has been implemented. I cannot find a link to any doc clearly indicating this feature is now working. Thank you.
Summary
Nextflow relies on built-in integration with Git to pull and run a workflow.
When the user specifies the Git repository URL on the run command line, Nextflow carries out a Git clone command, stores the pipeline code in the $HOME/.nextflow/assets directory and launches the execution from there.
When the user specifies the -r (revision) CLI option, the repository is checked out at the specified revision, i.e. branch, tag or even commit id.
This however poses a problem if two or more users run different versions at the same time, because the last one performing the operation would overwrite the previous repository code, which could be a disruptive operation.
This is not such an unlikely event considering a pipeline execution can last for hours or even days.
To mitigate this problem, nextflow refuses to perform a run if the project is currently checked out at a non-default version and the run command does not explicitly specify the revision to be executed. However, this is the cause of other unexpected side effects. See here.
Goal
The goal of this enhancement is to allow the concurrent use of multiple pipeline revisions on the same computer and deprecate the need for the sticky revision check.
This could be achieved by downloading the Git repository with a bare clone instead of a normal clone, and checking out the work tree into a separate subdirectory named after the commit id associated with the specified revision.
For example, if the user runs
nextflow should clone the repo above with the bare option and store it in the path $HOME/.nextflow/assets/nextflow-io/hello.git
Then the default branch is implicitly checked out, so the associated commit should be retrieved, e.g. 4eab81bd42eed592f4371cd91b755ec78df25fe9, and the following path should be created, containing the work tree accessible for the execution
When the user specifies a different revision, e.g.
A new subdirectory with the corresponding commit id should be created.
The commit id should be resolved against the local git clone, unless the -latest option is specified.
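A possible shape for this, sketched in shell under stated assumptions: the revision is resolved to a commit id against the local bare clone, and a detached work tree is created in a directory named after that commit. The helper name `checkout_revision` and the `worktrees_dir` argument are hypothetical, as is the use of `git worktree add` for the checkout; the exact target path is not spelled out above, so it is left as a parameter.

```shell
#!/usr/bin/env bash
# Hypothetical helper: resolve <revision> in a local bare clone and make
# sure <worktrees_dir>/<commit-id> exists as a work tree. Prints the path.
checkout_revision() {
  local bare_repo="$1" worktrees_dir="$2" revision="$3"
  local commit target
  # Resolve against the local clone only (no fetch), failing quietly
  # when the revision is unknown.
  commit="$(git -C "$bare_repo" rev-parse --verify -q "${revision}^{commit}")" || return 1
  target="$worktrees_dir/$commit"
  if [ ! -d "$target" ]; then
    mkdir -p "$worktrees_dir"
    git -C "$bare_repo" worktree add --detach "$target" "$commit" \
        >/dev/null 2>&1 || return 1
  fi
  echo "$target"
}

# Example usage (paths are illustrative only):
#   git clone --bare https://github.com/nextflow-io/hello "$HOME/.nextflow/assets/nextflow-io/hello.git"
#   checkout_revision "$HOME/.nextflow/assets/nextflow-io/hello.git" /some/worktrees/dir HEAD
```

Because the directory name is the commit id, two revisions that point to the same commit share one work tree, and a fetch (the -latest case) would be a separate step before resolving.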