nioo-knaw / epiGBS2

This is the epiGBS2 snakemake pipeline as published in a preprint version.
MIT License

Problem methylation_calling latency #14

Open alberto-rodriguezizquierdo opened 2 years ago

alberto-rodriguezizquierdo commented 2 years ago

Hi,

My name is Alberto Rodriguez. I'm currently using your package to analyze data from an epiGBS2 experiment, following your protocol published on bioRxiv.

I have the following message from the log:

MissingOutputException in line 136 of /home/arodriguez/epiGBS2/src/rules/denovo.rules:
Job Missing files after 1000 seconds:
/home/arodriguez/epiGBS2/output/methylation_calling/CHH_OT_RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.txt
/home/arodriguez/epiGBS2/output/methylation_calling/CHG_OT_RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.txt
/home/arodriguez/epiGBS2/output/methylation_calling/CpG_OT_RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.txt
This might be due to filesystem latency. If that is the case, consider to increase the wait time with --latency-wait.
Job id: 847 completed successfully, but some output files are missing. 847
  File "/home/arodriguez/miniconda3/envs/snake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 583, in handle_job_success
  File "/home/arodriguez/miniconda3/envs/snake/lib/python3.10/site-packages/snakemake/executors/__init__.py", line 252, in handle_job_success
Removing output files of failed job methylation_calling_denovo_bismark since they might be corrupted:
/home/arodriguez/epiGBS2/output/methylation_calling/RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.CX_report.txt, /home/arodriguez/epiGBS2/output/methylation_calling/RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.bismark.cov.gz, /home/arodriguez/epiGBS2/output/methylation_calling/CHH_OB_RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.txt, /home/arodriguez/epiGBS2/output/methylation_calling/CHG_OB_RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.txt, /home/arodriguez/epiGBS2/output/methylation_calling/CpG_OB_RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.txt
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /home/arodriguez/epiGBS2/.snakemake/log/2022-02-04T110740.662534.snakemake.log

I've tried increasing --latency-wait from 30 to 1000 s, but I get the same message. Could you suggest how to solve this problem?

Thanks a lot!

Alberto.

MaartenPostuma commented 2 years ago

Hi Alberto, this error indicates that snakemake cannot find certain files that should have been generated during a step in the pipeline. In this case, the methylation_calling step fails to produce the CHH_OT/CpG_OT/CHG_OT files.
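To see which per-context files are actually absent after a run, a small helper like the one below could be used. It is only a sketch: the function name `missing_context_files` is hypothetical, and the file naming pattern (context, strand, then sample prefix, ending in `.txt`) is inferred from the paths in the log above rather than taken from the pipeline code.

```shell
# Hypothetical helper: list the Bismark context files that are absent
# for one sample.
# Usage: missing_context_files <output_dir> <sample_prefix>
# Assumption: files are named <context>_<strand>_<sample_prefix>.txt,
# as seen in the Snakemake log (e.g. CHH_OT_..., CpG_OB_...).
missing_context_files() {
    for context in CpG CHG CHH; do
        for strand in OT OB; do
            f="$1/${context}_${strand}_$2.txt"
            # Print the path only when the file does not exist
            [ -e "$f" ] || printf '%s\n' "$f"
        done
    done
}
```

For the sample in this issue, the call would be `missing_context_files /home/arodriguez/epiGBS2/output/methylation_calling RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe`. Empty output means all six files are present.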

These files are quite large, so my first suggestion would be to check whether your system has enough disk space. If there is enough disk space, I would recommend running the command outside the pipeline to see if you can get a more informative error message (see code below).

conda env create -f src/env/bismark.yaml -n bismark
conda activate bismark
bismark_methylation_extractor -p --CX --no_overlap --report --bedGraph \
    --scaffolds --cytosine_report \
    --genome_folder /home/arodriguez/epiGBS2/output/output_denovo/NNNNref/ \
    /home/arodriguez/epiGBS2/output/alignment/RAYADA_MELONERA_3_trimmed_filt_merged.1_bismark_bt2_pe.bam \
    -o /home/arodriguez/epiGBS2/output/methylation_calling/
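
The disk-space check suggested above can be sketched as follows. The helper names `free_kib` and `has_free_space` are hypothetical, and the 50 GiB threshold in the usage comment is an arbitrary placeholder, not a measured requirement of the extractor.

```shell
# Hypothetical helpers for a quick free-space check before re-running
# the methylation extractor.

# Print the available space (in KiB) on the filesystem holding a directory.
free_kib() {
    # -P: POSIX output, one line per filesystem; -k: 1024-byte blocks.
    # Column 4 of the second line is the "Available" figure.
    df -Pk "$1" | awk 'NR == 2 { print $4 }'
}

# Succeed when at least <required_KiB> is available under <dir>.
# Usage: has_free_space <dir> <required_KiB>
has_free_space() {
    [ "$(free_kib "$1")" -ge "$2" ]
}

# Example (threshold is an assumption, adjust to your data size):
#   has_free_space /home/arodriguez/epiGBS2/output $((50 * 1024 * 1024)) \
#       || echo "less than 50 GiB free"
```

Running `df -h` on the output directory by hand gives the same information in human-readable form.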

alberto-rodriguezizquierdo commented 2 years ago

Hi Maarten,

Thank you for your reply! I've tried it and it works!

Kind regards,

Alberto.

MWSchmid commented 2 years ago

Hi Maarten

By the way, the Bismark methylation extractor is very slow with large files and generates a lot of temporary files while sorting. That would be something to improve, perhaps with MethylDackel:

MethylDackel extract --CHG --CHH -@ 2 refGenome.fasta sampleX.sorted.bam

Best,

Marc