vinisalazar / metaphor

Metaphor: a general-purpose workflow for assembly and binning of metagenomes
https://metaphor-workflow.readthedocs.io/
Other
35 stars 3 forks source link

Error in rule fastqc_raw #62

Closed s-junguy closed 4 months ago

s-junguy commented 9 months ago

Hi, I testet metaphor on 2 libraries. I use Illumina paired end libraries (10 GB per library) and I submitted with: metaphor execute -y This is the error message: [Tue Sep 26 16:10:01 2023] rule fastqc_raw: input: /home/usr/testfastq/P19752_101_S33_L002_R2_001.fastq.gz output: output/qc/fastqc/P19752_101_S33_L002-single_unit-R2-raw_fastqc.zip, output/qc/fastqc/P19752_101_S33_L002-single_unit-R2-raw.html log: output/logs/qc/fastqc_raw/P19752_101_S33_L002-single_unit-R2.log jobid: 3 benchmark: output/benchmarks/qc/fastqc_raw/P19752_101_S33_L002-single_unit-R2.txt reason: Missing output files: output/qc/fastqc/P19752_101_S33_L002-single_unit-R2-raw_fastqc.zip wildcards: sample=P19752_101_S33_L002, unit=single_unit, read=R2 threads: 2 resources: tmpdir=/tmp

python -c "from future import print_function; import sys, json; print(json.dumps([sys.version_info.major, sys.version_info.minor]))" python -c "from future import print_function; import sys, json; print(json.dumps([sys.version_info.major, sys.versioninfo.minor]))" Activating conda environment: miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f Activating conda environment: miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f python /home/usr/.snakemake/scripts/tmphx2wig13.wrapper.py Activating conda environment: miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f python /home/usr/.snakemake/scripts/tmpgt504wex.wrapper.py Activating conda environment: miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f_ fastqc --threads 2 --memory 102 --outdir /tmp/tmpby7ct3oh /home/usr/testfastq/P19752_102_S34_L002_R1_001.fastq.gz > output/logs/qc/fastqc_raw/P19752_102_S34_L002-single_unit-R1.log 2>&1 fastqc --threads 2 --memory 102 --outdir /tmp/tmpigd4l4nc /home/usr/testfastq/P19752_101_S33_L002_R2_001.fastq.gz > output/logs/qc/fastqc_raw/P19752_101_S33_L002-single_unit-R2.log 2>&1 Traceback (most recent call last): File "/home/usr/.snakemake/scripts/tmpgt504wex.wrapper.py", line 49, in shell( File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/snakemake/shell.py", line 294, in new raise sp.CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command 'set -euo pipefail; fastqc --threads 2 --memory 102 --outdir /tmp/tmpigd4l4nc /home/usr/testfastq/P19752_101_S33_L002_R2_001.fastq.gz > output/logs/qc/fastqc_raw/P19752_101_S33_L002-single_unit-R2.log 2>&1' returned non-zero exit status 3. [Tue Sep 26 16:21:53 2023] Error in rule fastqc_raw: jobid: 3 input: /home/usr/testfastq/P19752_101_S33_L002_R2_001.fastq.gz output: output/qc/fastqc/P19752_101_S33_L002-single_unit-R2-raw_fastqc.zip, output/qc/fastqc/P19752_101_S33_L002-single_unit-R2-raw.html log: output/logs/qc/fastqc_raw/P19752_101_S33_L002-singleunit-R2.log (check log file(s) for error details) conda-env: /home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f

RuleException: CalledProcessError in file /home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/rules/qc.smk, line 121: Command 'source /home/usr/miniconda3/bin/activate '/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f_'; set -euo pipefail; python /home/usr/.snakemake/scripts/tmpgt504wex.wrapper.py' returned non-zero exit status 1. File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/rules/qc.smk", line 121, in rule_fastqc_raw File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/concurrent/futures/thread.py", line 58, in run Traceback (most recent call last): File "/home/usr/.snakemake/scripts/tmphx2wig13.wrapper.py", line 49, in shell( File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/snakemake/shell.py", line 294, in new__ raise sp.CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command 'set -euo pipefail; fastqc --threads 2 --memory 102 --outdir /tmp/tmpby7ct3oh /home/usr/testfastq/P19752_102_S34_L002_R1_001.fastq.gz > output/logs/qc/fastqc_raw/P19752_102_S34_L002-single_unit-R1.log 2>&1' returned non-zero exit status 3. [Tue Sep 26 16:24:07 2023] Error in rule fastqc_raw: jobid: 4 input: /home/usr/testfastq/P19752_102_S34_L002_R1_001.fastq.gz output: output/qc/fastqc/P19752_102_S34_L002-single_unit-R1-raw_fastqc.zip, output/qc/fastqc/P19752_102_S34_L002-single_unit-R1-raw.html log: output/logs/qc/fastqc_raw/P19752_102_S34_L002-singleunit-R1.log (check log file(s) for error details) conda-env: /home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f

RuleException: CalledProcessError in file /home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/rules/qc.smk, line 121: Command 'source /home/usr/miniconda3/bin/activate '/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f_'; set -euo pipefail; python /home/usr/.snakemake/scripts/tmphx2wig13.wrapper.py' returned non-zero exit status 1. File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/rules/qc.smk", line 121, in __rule_fastqc_raw File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/concurrent/futures/thread.py", line 58, in run Shutting down, this might take some time. Exiting because a job execution failed. Look above for error message Complete log: .snakemake/log/2023-09-26T160740.492022.snakemake.log Traceback (most recent call last): File "/home/usr/miniconda3/envs/metaphor/bin/metaphor", line 10, in sys.exit(main()) ^^^^^^ File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/cli/cli.py", line 278, in main args.func(args) File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/cli/execute.py", line 109, in main retcode = run_cmd(cmd) ^^^^^^^^^^^^ File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/utils.py", line 92, in run_cmd retcode = check_call(cmd.split()) ^^^^^^^^^^^^^^^^^^^^^^^ File "/home/usr/miniconda3/envs/metaphor/lib/python3.11/subprocess.py", line 413, in check_call raise CalledProcessError(retcode, cmd) subprocess.CalledProcessError: Command '['snakemake', '--snakefile', '/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/Snakefile', '--configfile', 'metaphor_settings.yaml', '--cores', '4', '-p', '-r', '--use-conda', '--wrapper-prefix', 'https://github.com/snakemake/snakemake-wrappers/raw/', '--conda-prefix', '/home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda', '--config', 'samples=samples.csv', '--config', 'max_mb=180000']' returned non-zero exit status 1.

vinisalazar commented 9 months ago

Hello @s-junguy,

Were you able to run the metaphor test command? What kind of system are you using for Metaphor, is it personal or shared? Do any rules complete their execution before fastqc_raw?

Thank you, Vini

s-junguy commented 9 months ago

Yes, the metaphor test command worked. I am using Ubuntu 22.04 LTS, release 22.04, shared system. There are no rules completing before fastqc_raw.

vinisalazar commented 9 months ago

Hi @s-junguy,

Thank you for clarifying. If the test command works, that indicates to me that this is not a Metaphor problem. If no rules run before fastqc_raw, that indicates that the problem is not in the rule itself, but rather in the Metaphor execution as a whole. However, it is difficult to pinpoint the exact cause of the problem.

Do the fastq_raw log files, e.g. output/logs/qc/fastqc_raw/P19752_101_S33_L002-single_unit-R2.log show anything? Are you able to activate the environment and run fastqc by itself? You can try this:

conda activate /home/usr/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/config/conda/9fc2cb0bc9eeb5e46ac96b6b3549363f_

fastqc /home/usr/testfastq/P19752_102_S34_L002_R1_001.fastq.gz

Does that work?

Best, Vini

KatharinaHoff commented 9 months ago

The frustrating part is: it works when we call it manually.


fastqc --threads 2 --memory 102  --outdir /tmp/tmpby7ct3oh /home/s-user/testfastq/P19752_102_S34_L002_R1_001.fastq.gz  > output/logs/qc/fastqc_raw/P19752_102_S34_L002-single_unit-R1.log

Started analysis of P19752_102_S34_L002_R1_001.fastq.gz
Approx 5% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 10% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 15% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 20% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 25% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 30% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 35% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 40% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 45% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 50% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 55% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 60% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 65% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 70% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 75% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 80% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 85% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 90% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 95% complete for P19752_102_S34_L002_R1_001.fastq.gz

There was no error message. The return code was 3. We have no idea why fastqc returns a 3 :-(

printf '%d\n' $?
3

Weirdly, the output directory is empty. Shouldn't fastqc write something? Oh no, we now see it's a java heapspace problem!

less output/logs/qc/fastqc_raw/P19752_102_S34_L002-single_unit-R1.log
application/gzip
Terminating due to java.lang.OutOfMemoryError: Java heap space

Fastqc is version FastQC v0.12.1.

KatharinaHoff commented 9 months ago

The job completes with increased memory. (I am pretty sure it's not about the threads, see issue https://github.com/s-andrews/FastQC/issues/86 )

fastqc --threads 24 --memory 402  --outdir /tmp/tmpby7ct3oh /home/s-user/testfastq/P19752_102_S34_L002_R1_001.fastq.gz  > output/logs/qc/fastqc_raw/P19752_102_S34_L002-single_unit-R1.log
Started analysis of P19752_102_S34_L002_R1_001.fastq.gz
Approx 5% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 10% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 15% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 20% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 25% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 30% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 35% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 40% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 45% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 50% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 55% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 60% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 65% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 70% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 75% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 80% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 85% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 90% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 95% complete for P19752_102_S34_L002_R1_001.fastq.gz
Approx 100% complete for P19752_102_S34_L002_R1_001.fastq.gz
less output/logs/qc/fastqc_raw/P19752_102_S34_L002-single_unit-R1.log
application/gzip
Analysis complete for P19752_102_S34_L002_R1_001.fastq.gz

How do we tell metaphor systematically to increase memory for this step?

vinisalazar commented 9 months ago

Hi @KatharinaHoff,

Thank you for the details. Metaphor currently doesn't support specifying an arbitrary amount of memory to FastQC. It would take me a couple of days to release a patch with that feature. In the meantime, it's possible to disable FastQC by editing the metaphor_settings.yaml file and setting fastqc and multiqc to false.

However, if you'd like to try it yourself, to increase memory of the FastQC steps, you can edit this file:

ls $(metaphor config show --metaphor)/workflow/rules/qc.smk

Add a resources directive with mem_mb to the FastQC rules, as shown here: https://snakemake-wrappers.readthedocs.io/en/stable/wrappers/fastqc.html

Best, Vini

s-junguy commented 8 months ago

I added a resources directive with mem_mb to the FastQC rules as you suggested but I get a syntax error.

rule fastqc_raw:  # qc on raw, unmerged reads
    input:
        get_fastqc_input_raw,
    output:
        zip="output/qc/fastqc/{sample}-{unit}-{read}-raw_fastqc.zip",
        html="output/qc/fastqc/{sample}-{unit}-{read}-raw.html",
    params:
        "--quiet",
    wildcard_constraints:
        sample="|".join(sample_IDs),
        unit="|".join(unit_names),
        read="single|R1|R2",
    log:
        "output/logs/qc/fastqc_raw/{sample}-{unit}-{read}.log",
    benchmark:
        "output/benchmarks/qc/fastqc_raw/{sample}-{unit}-{read}.txt"
    threads: get_threads_per_task_size("medium")
    resources:
        mem_mb=250000
    wrapper:
        get_wrapper("fastqc")
SyntaxError in file /home/user/miniconda3/envs/metaphor/lib/python3.11/site-packages/metaphor/workflow/rules/qc.smk, line 121:
Unexpected keyword mem_mb in rule definition (qc.smk, line 121)
vinisalazar commented 8 months ago

I believe this may be a formatting issue. Snakemake seems to be parsing "mem_mb" as a rule keyword, when in fact it is supposed to be under the resources keyword. However, the formatting does seem correct to me in the snippet you pasted. Could you please double check the file you edited? The snakefmt tool may also be helpful to ensure correct formatting.

vinisalazar commented 4 months ago

I'll be closing this issue for now, but don't hesitate to reopen it if needed.