Closed pdy1084 closed 3 weeks ago
Hi @pdy1084, there is a lot to unpack here, and I think a little bit of conceptual confusion on your side, but I hope I can be helpful:
When you say you are unable to resume with the cache, do you mean you ran once with an older version of mag and are now trying to resume with a more recent version (i.e., running the same command but updating to `-r 3.0.3`), or do you mean you are unable to resume between different failed runs of the same `-r 3.0.3` version?
I think you're conceptually mixing up different caches, which may be why it's not resuming correctly:

- The `-resume` 'cache' (i.e. the results of previous steps of the pipeline) is stored in the `work/` directory. If you `-resume` in the same location where you ran the first command, it should pick this up.
- If you have changed the Singularity cache, the container images will be stored somewhere else, and the pipeline might think everything in the pipeline has changed and start from scratch.
- You cannot/should not change the `-w` working directory (by default set to wherever you execute your command), for example, as that will cause `-resume` to break.
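As a sketch of the first point (all paths and parameters here are hypothetical placeholders, not taken from your run), resuming means re-running the same command from the same launch directory:

```shell
# Re-run from the directory that contains the work/ folder of the first run
cd /path/to/original/launchdir

# Same command and same revision as before, plus -resume; Nextflow then
# reuses cached task results from ./work instead of starting from scratch
nextflow run nf-core/mag -r 3.0.3 -profile singularity \
    --input samplesheet.csv --outdir results \
    -resume
```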
Also, I see you have a config file: have you set `cleanup = true` in it? That would also delete all the files within the working directory and cause `-resume` to break.
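For reference, the setting in question would be a single line in a custom config file (a sketch; `cleanup` is a top-level Nextflow config option):

```groovy
// custom.config -- if present and set to true, Nextflow deletes the
// work directory on successful completion, which breaks -resume
cleanup = true
```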
Hi @jfy133,
Thank you for your feedback about the issue. It is true that I had some confusion about the types of cache and how to manage them to resume the pipeline correctly.
Regarding your questions:
My case would be the first one, the one you described as "run once with an older version of mag, and then you try to resume but with a more recent version (i.e., running the same command but updating to `-r 3.0.3`)".
Now I understand that `NXF_SINGULARITY_CACHEDIR` stores the software images, but it does not point to the place where previous results will be picked up by the mag pipeline when using `-resume`.
So I suppose that for the next run I will need to remove the line `export NXF_SINGULARITY_CACHEDIR="..."`, and somehow I would need to tell nf-core/mag the right folder from which to collect the cache, right?
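For instance, something like this (the directory name is a hypothetical placeholder), pointing the variable at a plain directory reserved for container images rather than at a task folder under `work/`:

```shell
# NXF_SINGULARITY_CACHEDIR should name a directory where Nextflow keeps
# downloaded Singularity images, shared across runs (hypothetical path)
export NXF_SINGULARITY_CACHEDIR="$HOME/singularity-images"
mkdir -p "$NXF_SINGULARITY_CACHEDIR"
echo "$NXF_SINGULARITY_CACHEDIR"
```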
It seems that the third bullet point in your second comment could be the reason for the `-resume` failure.
Finally, I can confirm that I did not use the `-w` flag to change the working directory, nor did I set `cleanup = true` in the config file to delete the files within the working directory.
I hope it is now easier to track and address the issue. Please let me know if you need more information.
Thanks for the clarifications!
To my knowledge, changing the entire version, e.g. by changing `-r`, will cause Nextflow to consider that the entire workflow has changed; it cannot trust that particular parts of the pipeline are the same, so it negates the cache and will indeed start from scratch.
You'll just have to run from scratch in this case to compare, sorry about that.
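In other words (a sketch with options abbreviated and a placeholder revision): keep `-r` fixed for any run you want to resume, and only bump it when a fresh start is acceptable:

```shell
# Same revision as the failed run: -resume can reuse the task cache
nextflow run nf-core/mag -r <old-revision> -profile singularity -resume ...

# Different revision: Nextflow treats the whole workflow as changed and
# starts from scratch, regardless of -resume
nextflow run nf-core/mag -r 3.0.3 -profile singularity -resume ...
```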
Please feel free to reopen if that's not the behaviour you observe.
Description of the bug
Hello,
I have been running the nf-core/mag pipeline for some time without any problem, but recently I wanted to test one of the most recent stable releases (3.0.3). The problem, however, is that I cannot get nf-core/mag to recognize the previous cache. Some of the steps were completed successfully in the previous runs, for example the assembly with MEGAHIT, but none of the former outputs are detected and the pipeline starts from scratch. (I know that this specific run failed due to the memory allocation of MEGAHIT, but the main issue is that it should not be starting to run MEGAHIT at all.) In the last runs I started, the pipeline begins from scratch and wants to run the assembly again, as you can see in the execution trace (also in the image attached); I filtered the execution traces with `grep NFCORE_MAG:MAG:MEGAHIT execution*txt`:
In the nf-core/mag command I added the `-resume` flag, and I tried to set the cache dir to the run folder where MEGAHIT was cached (execution_trace_2024-08-28_17-27-28.txt:6) using `export NXF_SINGULARITY_CACHEDIR=/path/to/sample/W1/work/9e/82eb6bc3b971b545bb014c51400678`. However, this does not seem to work either, as the new execution trace shows that MEGAHIT still failed and that no other output has been cached.
So how could I assign the right cache dir, given that the environment variable `NXF_SINGULARITY_CACHEDIR` does not seem to work for me? In addition, I have around 20 more samples for which I should do the same. Is there a way to automate assigning the proper directory for multiple cached runs, choosing the cache corresponding to the previous run that was most complete?
Overall, how can I point each nf-core/mag run at the most complete cache folder for each specific sample?
Thank you very much.
Command used and terminal output
-----> Run command:
I am running the command from this directory: /path/to/sample/W1/output
-----> Error (Megahit is not cached):
Relevant files
nextflow.log
System information
nextflow version 24.04.4.5917 HPC slurm Singularity nf-core/mag 3.0.3
OS information: PRETTY_NAME="Ubuntu 22.04.4 LTS" NAME="Ubuntu" VERSION_ID="22.04" VERSION="22.04.4 LTS (Jammy Jellyfish)" VERSION_CODENAME=jammy ID=ubuntu ID_LIKE=debian