simonlabcode / bam2bakR

MissingOutputException in rule maketdf (some output files are missing) #10

Closed ryuma-matsubara closed 6 months ago

ryuma-matsubara commented 6 months ago

Hi Isaac,

Thank you very much for developing a great tool. I've been using it and it's very useful. However, when I run the pipeline with the 'TC,GA' option to count both mutation types, I get an error. (If I run the same dataset with the 'TC' counting option, it works fine.)
I'm running the pipeline on a cluster that uses Sun Grid Engine, with the following commands:

#$ -l s_vmem=100G
#$ -l mem_req=100G
#$ -pe def_slot 12

export OMP_NUM_THREADS=12
conda activate deploy_snakemake_2024
snakemake --unlock
snakemake --cores 12 --resources mem_mb=60000 --use-conda

The .snakemake/log/*log file ends with the following message:

MissingOutputException in rule maketdf in file https://raw.githubusercontent.com/simonlabcode/bam2bakR/main/workflow/rules/bam2bakr.smk, line 227:
Job 16  completed successfully, but some output files are missing. Missing files after 5 seconds. This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait:
...(a lot of files are listed)
Removing output files of failed job maketdf since they might be corrupted:
results/tracks/cont_1_success.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-03-25T184424.988718.snakemake.log
WorkflowError:
At least one job did not complete successfully.



I checked the previous issue posts and tried adding --cores 12 and --resources mem_mb=60000, but no luck.
Since it only happens with the 'TC,GA' option, I assume there's some sort of incompatibility?

In results/tracks/ there are several tmp files, cont_1_GA_0__STARtmp ... cont_1_GA_5__STARtmp, so I assume the tdf step with STAR failed somehow. Even though the job was terminated, a cB.csv.gz file was still generated. Is this cB.csv.gz still usable?


I've attached the log file generated in .snakemake/log/, as well as my config.yaml.
I'd appreciate your help very much. Thank you in advance.

-Ryuma

config.yaml.gz

2024-03-25T184424.988718.snakemake.log

isaacvock commented 6 months ago

Hi Ryuma,

Thank you for your kind words; I'm sorry you ran into trouble, though.

I'll have a more detailed response in the morning, but I wanted to note two things before I call it for the night:

1) The cB should be usable. TDF creation is completely independent of cB creation and can fail for reasons that do not impact cB creation.
2) Can you provide the contents of the logs/make tdf folder? The .log files there will be crucial for diagnosing the problem.

Best, Isaac
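
One way to sanity-check that a gzipped CSV such as cB.csv.gz was written completely is to read it through to the end, since a truncated gzip stream fails during decompression. The sketch below is a minimal illustration and not part of bam2bakR; the function name check_gzip_csv and whatever path is passed to it are assumptions, and a passing check only shows the file is readable, not that its contents are correct.

```python
import csv
import gzip
import zlib


def check_gzip_csv(path, max_preview_rows=3):
    """Read a gzipped CSV to the end.

    Returns (ok, header, n_rows, preview), where ok is False if the file
    could not be read completely (missing, truncated, or corrupted).
    """
    header, preview, n_rows = None, [], 0
    try:
        with gzip.open(path, "rt", newline="") as handle:
            reader = csv.reader(handle)
            header = next(reader, None)  # first line, or None if empty
            for row in reader:
                n_rows += 1
                if len(preview) < max_preview_rows:
                    preview.append(row)
    except (EOFError, OSError, zlib.error):
        # EOFError: stream ended before the gzip end-of-stream marker.
        # OSError: missing file or not a valid gzip file.
        # zlib.error: corrupted compressed data.
        return False, header, n_rows, preview
    return True, header, n_rows, preview
```

Calling check_gzip_csv("path/to/cB.csv.gz") and inspecting the returned flag, header, and row count gives a quick indication of whether the file was closed cleanly before the workflow stopped.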

isaacvock commented 6 months ago

Hi Ryuma,

Actually, it turned out to be an easy fix that I had implemented on a separate branch but had never pushed to the main branch. You should be able to create TC and GA tracks now; let me know if it still doesn't work for you.

Best, Isaac

ryuma-matsubara commented 6 months ago

It works now. Fantastic!

Finished job 0.
16 of 16 steps (100%) done

By the way, this is neither a bug nor urgent, but it would be great if a future update could convert the .sam files to .bam, or compress the output/tmp files. At the moment I need ~400 GB of free storage for the temp files. But again, this is not a bug report.

Thanks for your quick response, I appreciate it very much!

Cheers, Ryuma
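
For tracking down which intermediate files account for disk usage like that mentioned above, a generic sketch such as the following lists the largest files under a directory. It is not part of bam2bakR; the function names largest_files and human_readable are hypothetical, and the directory to scan is up to the user.

```python
import os


def largest_files(root, top_n=10):
    """Return up to top_n (size_bytes, path) pairs under root, largest first."""
    sizes = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            try:
                sizes.append((os.path.getsize(full), full))
            except OSError:
                # File vanished or is unreadable (e.g. a broken symlink).
                continue
    sizes.sort(reverse=True)
    return sizes[:top_n]


def human_readable(n_bytes):
    """Format a byte count using binary units (KiB, MiB, ...)."""
    size = float(n_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        if size < 1024 or unit == "TiB":
            return f"{size:.1f} {unit}"
        size /= 1024
```

Looping over largest_files("results") and printing human_readable(size) next to each path shows where the space goes while a run is in progress; "results" here is an assumed directory name taken from the paths in the log above.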