moiexpositoalonsolab / grenepipe

A flexible, scalable, and reproducible pipeline to automate variant calling from raw sequence reads, with lots of bells and whistles.
http://grene-net.org
GNU General Public License v3.0

Error in mapping on cluster #27

Closed · AxelVaillant closed this issue 1 year ago

AxelVaillant commented 2 years ago

Hello, I'm trying to run the pipeline on a cluster, but I keep getting an error on the mapping job. It seems that the error is related to the wrapper "0.80.0/bio/bwa/mem" in mapping-bwa-mem.smk. I tried to increase the memory in cluster_config.yaml, but it didn't work.

I am launching the pipeline with these options: snakemake --conda-frontend mamba --conda-prefix ~/scratch/conda-envs --profile profiles/slurm/ --directory ../OutputGrenepipe

The error message is the following :

 Traceback (most recent call last):
  File "/lustre/vaillanta/OutputGrenepipe/.snakemake/scripts/tmpm04ks8of.wrapper.py", line 79, in <module>
    " | " + pipe_cmd + ") {log}"
  File "/home/vaillanta/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/shell.py", line 231, in __new__
    raise sp.CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command 'set -euo pipefail;  (bwa mem -t 12 -R '@RG\tID:ARP-28-c20_S343\tSM:ARP-28-c20_S343\tPL:-'  /lustre/vaillanta/grenepipe/TAIR10_chr_all.fa trimmed/ARP-28-c20_S343-1.1.fastq.gz trimmed/ARP-28-c20_S343-1.2.fastq.gz | samtools sort -T /tmp/tmpsdzuf3w8 -m 4G -o mapped/ARP-28-c20_S343-1.sorted.bam -)  2> logs/bwa-mem/ARP-28-c20_S343-1.log' returned non-zero exit status 1.
[Mon Nov  7 19:10:21 2022]
Error in rule map_reads:
    jobid: 0
    output: mapped/ARP-28-c20_S343-1.sorted.bam, mapped/ARP-28-c20_S343-1.sorted.done
    log: logs/bwa-mem/ARP-28-c20_S343-1.log (check log file(s) for error message)
    conda-env: /home/vaillanta/scratch/conda-envs/7141f65285b636cb7f62b59835a41269

RuleException:
CalledProcessError in line 58 of /lustre/vaillanta/grenepipe/rules/mapping-bwa-mem.smk:
Command 'source /home/vaillanta/miniconda3/envs/grenepipe/bin/activate '/home/vaillanta/scratch/conda-envs/7141f65285b636cb7f62b59835a41269'; set -euo pipefail;  python /lustre/vaillanta/OutputGrenepipe/.snakemake/scripts/tmpm04ks8of.wrapper.py' returned non-zero exit status 1.
  File "/home/vaillanta/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2293, in run_wrapper
  File "/lustre/vaillanta/grenepipe/rules/mapping-bwa-mem.smk", line 58, in __rule_map_reads
  File "/home/vaillanta/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 568, in _callback
 File "/home/vaillanta/miniconda3/envs/grenepipe/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/home/vaillanta/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 554, in cached_or_run
  File "/home/vaillanta/miniconda3/envs/grenepipe/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2359, in run_wrapper
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
slurmstepd: error: Detected 1 oom-kill event(s) in StepId=11817887.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

Thank you!

lczech commented 2 years ago

Hi @AxelVaillant,

The error you are getting is still an out-of-memory problem, so there are only two solutions: either increase the memory even further, or try one of the other mapping tools that grenepipe offers.

To what value have you increased your memory? I have worked with datasets where 25GB (or more, I can't quite remember) was needed. If you think that the new memory setting is somehow not being used by the pipeline, you can also share your cluster_config.yaml here, so we can check whether everything is all right with that file.
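For illustration, a minimal sketch of what such a per-rule override could look like in cluster_config.yaml (the rule name has to match the Snakemake rule that fails, here map_reads as shown in the error above; the values are just placeholders to adjust for your cluster):

__default__:
  mem: 10G            # default memory request for every job
  cpus-per-task: 1

map_reads:
  mem: 50G            # give the mapping rule more memory than the default
  cpus-per-task: 4    # CPUs requested from Slurm for this rule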

Cheers and so long, Lucas

AxelVaillant commented 2 years ago

OK, this time I increased the memory limit to 25G, and I still get the exact same error, but without the last "slurmstepd: error: ... oom-kill event(s) ..." line.

Here is the content of my cluster_config.yaml:

__default__:
  time: 600 # Default time (minutes). A time limit of zero requests that no time limit be imposed
  mem: 25G # Default memory. A memory size specification of zero grants the job access to all of the memory on each node.
  cpus-per-task: 1
  nodes: 1
  ntasks: 1
  account: arabreed
  partition: tests

trim_reads_se:
  mem: 25G
  cpus-per-task: 4

trim_reads_pe:
  mem: 25G
  cpus-per-task: 4

map_reads:
  meme: 25G
  cpus-per-task: 4

call_variants:
  time: 1-0
  cpus-per-task: 4

lczech commented 2 years ago

Okay, that file looks all right. If you don't get the out-of-memory error any more, it might be something else (or it is still out of memory, but that last line somehow does not get printed). Have you checked the log file produced by the mapping itself, logs/bwa-mem/ARP-28-c20_S343-1.log?
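If you want to double-check whether memory was really the limit, a quick sketch (assuming Slurm job accounting is enabled on your cluster; the job id is the one from the slurmstepd message above):

# Look at the log that the mapping rule itself wrote
less logs/bwa-mem/ARP-28-c20_S343-1.log

# Compare requested vs. actually used memory for the failed job
sacct -j 11817887 --format=JobID,State,ReqMem,MaxRSS,Elapsed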

Edit: See also the troubleshooting page for other things you can check. If tracking down the log files does not reveal the error, you can also try to run bwa mem directly with the files that are causing trouble and see if that works; that would at least tell you whether the problem is with bwa mem and/or your files, or with grenepipe.
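For reference, a sketch of what running that command by hand could look like; everything below (conda environment path, read group, reference, and read files) is copied from the command line in the traceback above, so adjust the paths to whatever run you are testing:

# Activate the environment that grenepipe created for this rule
source /home/vaillanta/miniconda3/envs/grenepipe/bin/activate \
    /home/vaillanta/scratch/conda-envs/7141f65285b636cb7f62b59835a41269

# Re-run the failing mapping command directly and watch for errors
bwa mem -t 12 \
    -R '@RG\tID:ARP-28-c20_S343\tSM:ARP-28-c20_S343\tPL:-' \
    /lustre/vaillanta/grenepipe/TAIR10_chr_all.fa \
    trimmed/ARP-28-c20_S343-1.1.fastq.gz \
    trimmed/ARP-28-c20_S343-1.2.fastq.gz \
    | samtools sort -m 4G -o test.sorted.bam -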

This can all be quite tricky, but as said on the troubleshooting page, it's a necessary evil that comes from stringing together many different tools, each with their own little quirks, which in combination can cause a lot of different issues... :-(

lczech commented 1 year ago

Hi @AxelVaillant, any update on this?

AxelVaillant commented 1 year ago

Hi, unfortunately I didn't manage to solve my problems, so I gave up on running the pipeline on a cluster. Anyway, thank you for your help!

lczech commented 1 year ago

Hi @AxelVaillant,

I am sorry to hear that! If you have a moment, I'd be interested in a bit of feedback in order to improve grenepipe: Was this still due to the errors above? Did you try running the tool causing the error on its own (outside of grenepipe) to check whether that works? From what I can see above, it was just an out-of-memory issue, and so hopefully fixable (unless your cluster does not offer enough memory, but that seems unlikely).

If you have any suggestions on what needs to be fixed in grenepipe to get this to work for you (if this is due to grenepipe), I'd be grateful!

Cheers, thank you, and so long, Lucas