ArtemSokolov opened 4 years ago
Temporary workaround: use the -resume feature.
The outputs are actually produced properly by the quantification module, but nextflow looks for them before SLURM finishes writing them to the working directory. As a result, the pipeline terminates, yet the files eventually appear in the workdir. Re-running with -resume will therefore detect the presence of those output files and treat them as the process cache.
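Concretely, re-launching the exact same command with -resume appended picks the cached outputs back up (a minimal sketch; the pipeline name and --in parameter are illustrative):

```bash
# Relaunch the failed run; tasks whose outputs have since appeared in the
# workdir are restored from the cache rather than re-executed.
nextflow run labsyspharm/mcmicro --in /path/to/project -resume
```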
This was observed to happen with ASHLAR as well.
Issue reported to nextflow devs: https://github.com/nextflow-io/nextflow/issues/1644
Unfortunately, since it's somewhat intermittent and difficult to reproduce consistently, it may be a while before this is fully resolved.
We have seen this happen when the shared file system has an aggressive caching strategy, so the remote node (running nextflow) is not able to detect the files created by the compute node, in particular the .exitcode file.
A possible solution is to increase the exitReadTimeout setting to a higher value. See https://www.nextflow.io/docs/latest/config.html#scope-executor for details.
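For example, in nextflow.config (a sketch; the '1 h' value is illustrative and should be tuned to the file system's latency):

```groovy
// nextflow.config
executor {
    // Give the head node longer to see the .exitcode file written by the
    // compute node before declaring the task failed.
    exitReadTimeout = '1 h'
}
```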
Thank you, Paolo. We'll play around with it.
There appears to be a recurring problem with quantification jobs finishing but being detected as failures, with the following error:
The python process finishes and produces the expected output in the corresponding work directory. However, the file never gets published to quantification/, because nextflow detects (or fails to detect) something and terminates the entire pipeline run.

Possible explanation:
.exitcode gets written to scratch3 before the output files, causing nextflow to look for output files that are not there yet (as described in https://github.com/nextflow-io/nextflow/issues/931).

Starting point for a possible minimal reproducible example: core66.tif from TMA11.
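A quick way to check for this race on a failed run is to compare the timestamps of .exitcode against the declared outputs inside the task's work directory (a sketch; the task hash and the *.csv output pattern are assumptions, not taken from this run):

```bash
# If .exitcode predates the expected outputs, the head node most likely read
# the exit status before the shared file system exposed the output files.
cd work/3f/2a9c10        # hypothetical task directory hash
stat -c '%y  %n' .exitcode *.csv
```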