Open eHirchaud opened 3 years ago
Have exact same issue. NFS and SLURM. Snakemake failing with latency error. Is there a fix for this issue?
I have the same exact issue...
Same problem here. Also NFS and SLURM. For me it's not even the output of a rule, but the very first input files in the pipeline.
Ok, it looks like my issue was actually #1527 which was fixed a week ago.
@eHirchaud You are setting `actimeo` to 1800, so 30 minutes, and you did not change `lookupcache` from its default `all`, so no NFS lookups for the missing files will happen until the attributes expire.
You should set `--latency-wait` to the same value as `actimeo`, or a bit higher to be safe. Set `actimeo` to as long as you are willing to wait for a failed job. I'd suggest no more than 30 seconds.
@johanneskoester You can close this. The OP's issue was the NFS settings. The only option on the Snakemake side is to make the exception more verbose, short of trying to query the kernel for the timeout settings and setting the latency-wait default accordingly.
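For anyone who wants to try the kernel-query idea by hand, here is a minimal sketch in Python. It is not something Snakemake does itself. It assumes the defaults documented in nfs(5) (`acdirmax=60`, `lookupcache=all`) and that with `lookupcache=all` a file that was looked up while missing can stay cached as missing for up to `acdirmax` seconds. The function name and the 5-second margin are made up for illustration.

```python
# Hypothetical helper, not part of Snakemake.
ACDIRMAX_DEFAULT = 60  # seconds, default per nfs(5)


def suggested_latency_wait(mount_options: str, margin: int = 5) -> int:
    """Suggest a --latency-wait value (seconds) from an NFS mount option string."""
    acdirmax = ACDIRMAX_DEFAULT
    lookupcache = "all"  # default per nfs(5)
    for option in mount_options.split(","):
        key, _, value = option.strip().partition("=")
        if key in ("actimeo", "acdirmax"):
            # actimeo sets all four ac* timeouts, acdirmax included.
            acdirmax = int(value)
        elif key == "lookupcache":
            lookupcache = value
    if lookupcache != "all":
        # Assumption: with pos/positive/none, negative lookups are
        # revalidated, so only the safety margin is needed.
        return margin
    return acdirmax + margin


print(suggested_latency_wait("rw,hard,actimeo=1800"))  # the OP's setting: 1805
```

The option string is the fourth field of the mount's line in `/proc/mounts`.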
Snakemake version
Describe the bug
The job finishes with SUCCESS under SLURM, but Snakemake waits a very long time for the output file and then fails, even with the latency-wait option raised to 420. We reproduced this bug on 2 different clusters. We don't understand why it takes so long.
Logs
Error (but the files exist):
Without --cluster-status ./status.py, Snakemake finishes successfully, but the elapsed time is about 1h30! Run locally, the workflow takes 1m30...
Minimal example
snakemake -s Snakefile_test --use-conda --cluster "sbatch -A plucas -p bioinfo --cpus-per-task=1 --parsable" --stats stats_saturn.json --latency-wait 420 --cluster-status ./status.py -p -j 2
The Snakefile contains one rule that runs Trimmomatic on a fastq.gz file.
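The status.py used here is not shown. For context, a script passed to --cluster-status receives the job id as its first argument and must print one of success, failed, or running. Below is a minimal sketch of such a script, assuming `sacct` is available on the cluster. The `classify` helper and the set of states treated as running are illustrative choices, not the script from this report.

```python
import subprocess
import sys

# Assumption: SLURM states treated as "still in progress".
RUNNING_STATES = {"PENDING", "CONFIGURING", "RUNNING", "COMPLETING", "SUSPENDED"}


def classify(sacct_output: str) -> str:
    """Map raw `sacct` State output to the word Snakemake expects."""
    tokens = sacct_output.split()
    if not tokens:
        # The job may not be in the accounting database yet.
        return "running"
    state = tokens[0]  # "CANCELLED by 1000" -> "CANCELLED"
    if state == "COMPLETED":
        return "success"
    if state in RUNNING_STATES:
        return "running"
    return "failed"


# Only query SLURM when a job id was actually passed.
if __name__ == "__main__" and len(sys.argv) > 1:
    # `sbatch --parsable` may print "jobid;cluster"; keep the id only.
    job_id = sys.argv[1].split(";")[0]
    result = subprocess.run(
        ["sacct", "-j", job_id, "--format", "State", "--noheader", "-X"],
        capture_output=True,
        text=True,
    )
    print(classify(result.stdout))
```

Treating an empty `sacct` answer as running avoids reporting a failure for a job that was submitted only a moment ago.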
Additional context