moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

grenepipe run error #43

ospfsg closed this issue 6 months ago

ospfsg commented 7 months ago

I ran a test dataset with 10 WGR samples and everything went fine.

Afterwards, I ran a dataset with 4 pool-seq samples, each with 4 files.

The run went smoothly, but then I got this error message. Any suggestion as to what could be causing this problem?

[Wed Mar 20 21:13:26 2024]
Error in rule mark_duplicates:
    jobid: 56
    output: dedup/PN1.bam, qc/dedup/PN1.metrics.txt, dedup/PN1.done
    log: logs/picard/dedup/PN1.log (check log file(s) for error message)
    conda-env: /home/dau1/software/conda-envs/287e3d61d4ee335d97bc039a6f3b8820

RuleException:
CalledProcessError in line 45 of /home/dau1/software/grenepipe-0.12.2/rules/duplicates-picard.smk:
Command 'source /home/dau1/miniconda3/envs/grenepipe/bin/activate '/home/dau1/software/conda-envs/287e3d61d4ee335d97bc039a6f3b8820'; set -euo pipefail;  python /mnt/data1/Project_QRO_Poolseq/Operational/4_data_analysis/5_grenepipe/run1/.snakemake/scripts/tmpuih01ava.wrapper.py' returned non-zero exit status 1.
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2293, in run_wrapper
  File "/home/dau1/software/grenepipe-0.12.2/rules/duplicates-picard.smk", line 45, in __rule_mark_duplicates
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 568, in _callback
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2359, in run_wrapper

Traceback (most recent call last):
  File "/mnt/data1/Project_QRO_Poolseq/Operational/4_data_analysis/5_grenepipe/run1/.snakemake/scripts/tmpi5vnpg2i.wrapper.py", line 16, in <module>
    "picard MarkDuplicates {snakemake.params} INPUT={snakemake.input} "
  File "/home/dau1/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/shell.py", line 231, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail; picard MarkDuplicates  REMOVE_DUPLICATES=true INPUT=mapped/PN2.merged.bam OUTPUT=dedup/PN2.bam METRICS_FILE=qc/dedup/PN2.metrics.txt  > logs/picard/dedup/PN2.log 2>&1' returned non-zero exit status 1. 

2024-03-19T122620.330658.snakemake.log
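A minimal Python sketch for pulling the rule name, job ID, and log path(s) out of a Snakemake error report like the one above, so you know which log file to open first. The function name `parse_snakemake_error` is hypothetical, and the regular expressions assume the report layout shown here (`Error in rule ...:`, `jobid:`, `log: ... (check log file(s) ...)`), which may vary between Snakemake versions.

```python
import re
from typing import Dict, Optional


def parse_snakemake_error(report: str) -> Optional[Dict[str, object]]:
    """Extract rule name, job id, and log paths from a Snakemake error block."""
    rule = re.search(r"Error in rule (\S+):", report)
    if rule is None:
        return None  # not a rule error report
    jobid = re.search(r"^\s*jobid:\s*(\d+)", report, re.MULTILINE)
    logs = re.search(r"^\s*log:\s*(.+?)\s*\(check log", report, re.MULTILINE)
    return {
        "rule": rule.group(1),
        "jobid": int(jobid.group(1)) if jobid else None,
        # Several log files would be listed separated by commas
        "logs": [p.strip() for p in logs.group(1).split(",")] if logs else [],
    }


report = """Error in rule mark_duplicates:
    jobid: 56
    output: dedup/PN1.bam, qc/dedup/PN1.metrics.txt, dedup/PN1.done
    log: logs/picard/dedup/PN1.log (check log file(s) for error message)
"""
print(parse_snakemake_error(report))
```

For the report in this issue, this points straight at `logs/picard/dedup/PN1.log`.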

lczech commented 7 months ago

Hi Octávio @ospfsg,

thanks for posting this here again! :-)

This seems to be an error in one of the tools (picard MarkDuplicates), which will take a bit of detective work to track down. The first clue:

    log: logs/picard/dedup/PN1.log (check log file(s) for error message)

So could you please check that file and upload it here? It might already contain the error description we are looking for.

Also, please follow the steps of the troubleshooting part of the documentation. If you are not running this on a cluster, not all steps might be as described there (the job ID for instance), but it will give you a general idea which files to investigate to figure out where the error is coming from! Any log files that you find through this, you can also post here.
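Since a failing tool usually writes the message that explains the failure near the end of its log, a small sketch like the following can walk a `logs/` directory and report, for each `*.log` file, the last line that looks like an error. The helper names and the keyword list (`Exception`, `Error`, `ERROR`) are assumptions for illustration, not part of grenepipe, and will miss errors phrased differently.

```python
from pathlib import Path
from typing import Dict, Iterable, Optional

# Assumed keywords; adjust them for the tools you are debugging.
ERROR_MARKERS = ("Exception", "Error", "ERROR")


def last_error_line(lines: Iterable[str], markers=ERROR_MARKERS) -> Optional[str]:
    """Return the last line containing any marker, or None if there is none."""
    last = None
    for line in lines:
        if any(marker in line for marker in markers):
            last = line.rstrip("\n")
    return last


def scan_logs(log_dir: str = "logs") -> Dict[str, str]:
    """Map each *.log file below log_dir to its last error-looking line."""
    found = {}
    for path in sorted(Path(log_dir).rglob("*.log")):
        with path.open(errors="replace") as handle:
            line = last_error_line(handle)
        if line is not None:
            found[str(path)] = line
    return found


if __name__ == "__main__":
    # Assumes it is run from the analysis directory that contains logs/.
    for path, line in scan_logs().items():
        print(f"{path}: {line}")
```

Purely informational lines (such as Picard's startup banner) do not match the markers and are skipped.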

Cheers and so long, Lucas

ospfsg commented 7 months ago

Hi Lucas,

When I open the log file, it shows:

MarkDuplicates -REMOVE_DUPLICATES true -INPUT mapped/PN1.merged.bam -OUTPUT dedup/PN1.bam -METRICS_FILE qc/dedup/PN1.metrics.txt

No output file is present, and the qc/dedup folder does not exist!

PN1.log

In the log file:

[Wed Mar 20 20:38:55 WET 2024] Executing as dau1@frey on Linux 6.5.0-26-generic amd64; OpenJDK 64-Bit Server VM 21.0.2-internal-adhoc.conda.src; Deflater: Intel; Inflater: Intel; Provider GCS is not available; Picard version: 2.27.4-SNAPSHOT

Is this the problem: "Provider GCS is not available"?

Cheers, osp

lczech commented 7 months ago

Hi Octávio @ospfsg,

thanks for providing the log file. The error is usually the last thing to be logged, which is also the case here. At the end of the file, there is this log entry:

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space

which is the issue here: Java is running out of memory. Java is a bit peculiar in that it caps its own memory use (the maximum heap size) unless told otherwise. To increase the amount of memory that Java may use, grenepipe provides an option to set Java-specific settings for each Java-based tool.
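To illustrate how a per-tool Java option could end up on the command line, here is a hypothetical Python helper that assembles the MarkDuplicates call from the error report with optional JVM flags. It assumes the bioconda `picard` launcher script, which passes arguments such as `-Xmx...` through to `java`; grenepipe's own rule and wrapper may place the options differently, and `markduplicates_cmd` is not a grenepipe function.

```python
import shlex
from typing import List


def markduplicates_cmd(sample: str, java_opts: str = "",
                       extra: str = "REMOVE_DUPLICATES=true") -> List[str]:
    """Hypothetical helper: build a picard MarkDuplicates call for one sample.

    Paths mirror the command in the error report; JVM options go right after
    the `picard` launcher (assumes the bioconda wrapper forwards them to java).
    """
    return (
        ["picard"]
        + shlex.split(java_opts)  # e.g. "-Xmx10g" -> ["-Xmx10g"]
        + ["MarkDuplicates"]
        + shlex.split(extra)
        + [
            f"INPUT=mapped/{sample}.merged.bam",
            f"OUTPUT=dedup/{sample}.bam",
            f"METRICS_FILE=qc/dedup/{sample}.metrics.txt",
        ]
    )


print(" ".join(markduplicates_cmd("PN1", "-Xmx10g")))
```

With an empty `java_opts`, the command matches the one that failed, and the JVM falls back to its default heap limit.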

In your case, the relevant setting is this line in the config file. By setting it to

MarkDuplicates-java-opts: "-Xmx10g"

you should give Java enough memory to work with.
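`-Xmx10g` requests a maximum Java heap of 10 GiB. A minimal sketch for sanity-checking such a value against the machine's physical memory before launching a run: `xmx_bytes` and `physical_memory_bytes` are hypothetical helpers, the size suffixes follow the JVM's documented `k`/`m`/`g` convention, and the RAM query uses POSIX `sysconf`, which assumes Linux.

```python
import os
import re

# JVM size suffixes: k/K = KiB, m/M = MiB, g/G = GiB; no suffix = bytes.
_UNITS = {"": 1, "k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}


def xmx_bytes(java_opts: str) -> int:
    """Return the maximum heap requested via -Xmx, in bytes.

    If -Xmx appears more than once, the last occurrence is used, since later
    JVM options typically override earlier ones.
    """
    matches = re.findall(r"-Xmx(\d+)([kKmMgG]?)(?=\s|$)", java_opts)
    if not matches:
        raise ValueError("no -Xmx option found in: " + java_opts)
    value, unit = matches[-1]
    return int(value) * _UNITS[unit.lower()]


def physical_memory_bytes() -> int:
    """Total physical RAM via POSIX sysconf (assumes Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    requested = xmx_bytes("-Xmx10g")
    print(f"requested heap: {requested / 1024 ** 3:.1f} GiB")
    if requested > physical_memory_bytes():
        print("warning: requested heap exceeds physical RAM")
```

Keep in mind that when Snakemake runs several MarkDuplicates jobs in parallel, each JVM can grow up to its own `-Xmx` limit, so the total demand scales with the number of concurrent jobs.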

Let me know if that works, and so long, Lucas

ospfsg commented 7 months ago

Thank you

osp

lczech commented 6 months ago

Closed as per https://github.com/moiexpositoalonsolab/grenepipe/issues/44#issuecomment-2041463749