Open cflerin opened 3 years ago
After some digging, this is caused by the getOutputFileName function:
https://github.com/vib-singlecell-nf/vsn-pipelines/blob/b5167f5b31129f51dece9a309dc20dc686612d10/src/utils/processes/utils.nf#L374-L406
since replacing the output of this function with a fixed string in the publish functions results in proper and consistent caching of these processes on resume. I think the problem could be due to the way getOutputFileName is run inside of the publish process, creating inputs that are dynamic and forcing the process to be re-executed every time.
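To make the hypothesis above concrete, here is a toy model of input-based caching. It is not Nextflow's actual hashing algorithm, only a sketch assuming that a task's cache key is a digest of its process name and resolved input values. `task_hash`, `stable_name`, and `unstable_name` are hypothetical names invented for this illustration; nothing in this thread establishes that getOutputFileName behaves like `unstable_name`.

```python
import hashlib
import itertools


def task_hash(process_name, inputs):
    """Toy cache key: digest of the process name plus its resolved inputs."""
    h = hashlib.sha256()
    h.update(process_name.encode())
    for key in sorted(inputs):
        h.update(f"{key}={inputs[key]}".encode())
    return h.hexdigest()


def stable_name(tag, suffix):
    # Deterministic helper: same arguments always give the same name.
    return f"{tag}.{suffix}"


_calls = itertools.count()


def unstable_name(tag, suffix):
    # Hypothetical non-deterministic helper: hidden state leaks into the name.
    return f"{tag}.{next(_calls)}.{suffix}"


def run_twice(name_fn):
    """Compute the cache key for two consecutive 'runs' and report a cache hit."""
    keys = [
        task_hash("SC__PUBLISH", {"f": "input.loom",
                                  "outputFileName": name_fn("sample", "loom")})
        for _ in range(2)
    ]
    return keys[0] == keys[1]


stable_hit = run_twice(stable_name)      # identical inputs, so the key repeats
unstable_hit = run_twice(unstable_name)  # the input changes, so the key changes
print("stable helper cached on second run:", stable_hit)
print("unstable helper cached on second run:", unstable_hit)
```

In this model a process is only re-executed when one of its resolved inputs differs between runs, which is why substituting a fixed string would restore caching if a dynamic value were the cause.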
As a side note, this is a major issue for disk space in larger (e.g. mapping) projects. I ran into an issue where PBS jobs were failing to be submitted late in the atac_preprocess workflow. Each time I re-ran with -resume, all of the upstream published files (fastq, bam files, etc.) were copied again within work/, leaving 100s of GBs of extra data.
I understand, this is quite annoying!
Calling getOutputFileName seems deterministic to me.
Also, if getOutputFileName is the root of the issue, I would expect scenic:PUBLISH_SCENIC:COMPRESS_HDF5 not to resume either (since it also calls this function), but it does.
This bug is very intriguing.
I just noticed that the NXF processes that do not resume are the ones using the outputFileName variable in their publishDir directive.
Nice! That's an interesting point there. I hadn't tested COMPRESS_HDF5.
COMPRESS_HDF5 still appears to cache properly when changing the publishDir output to use outputFileName.
Describe the bug
When running the pipeline for a second time with the -resume option, the publish processes always run and are not cached.
To Reproduce
Steps to reproduce the behavior:
nextflow run vib-singlecell-nf/vsn-pipelines -profile scenic,test__scenic,singularity -entry scenic -r v0.21.0
nextflow run vib-singlecell-nf/vsn-pipelines -profile scenic,test__scenic,singularity -entry scenic -r v0.21.0 -resume
$ nextflow run vib-singlecell-nf/vsn-pipelines -profile scenic,test__scenic,singularity -entry scenic -r v0.21.0 -resume
N E X T F L O W ~ version 20.04.1
Launching vib-singlecell-nf/vsn-pipelines [reverent_faggin] - revision: 3cc43ce065 [v0.21.0]
NOTE: Your local project version looks outdated - a different revision is available in the remote repository [b6577d79a5]
WARN: DSL 2 IS AN EXPERIMENTAL FEATURE UNDER DEVELOPMENT -- SYNTAX MAY CHANGE IN FUTURE RELEASE
executor > local (2)
[ff/3b0342] process > scenic:SCENIC:ARBORETO_WITH_MULTIPROCESSING (1) [100%] 1 of 1, cached: 1 ✔
[3a/5b0e9f] process > scenic:SCENIC:CISTARGETMOTIF (1) [100%] 1 of 1, cached: 1 ✔
[58/e8f22b] process > scenic:SCENIC:AUCELLMOTIF (1) [100%] 1 of 1, cached: 1 ✔
[74/45ef7f] process > scenic:SCENIC:VISUALIZE (1) [100%] 1 of 1, cached: 1 ✔
[64/f374d2] process > scenic:SCENIC:PUBLISH_LOOM (1) [100%] 1 of 1, cached: 1 ✔
[81/6c7b8d] process > scenic:PUBLISH_SCENIC:COMPRESS_HDF5 (1) [100%] 1 of 1, cached: 1 ✔
[8c/19d3a4] process > scenic:PUBLISH_SCENIC:SC__PUBLISH (1) [100%] 1 of 1 ✔
[26/c75c86] process > scenic:PUBLISH_SCENIC:SCPUBLISH_PROXY (1) [100%] 1 of 1 ✔