umccr / umccrise

:snake: DRAGEN Tumor/Normal workflow post-processing
https://umccr.github.io/umccrise/
MIT License

BPI (break-point-inspector) OutOfMemoryError #88

Closed pdiakumis closed 2 years ago

pdiakumis commented 2 years ago

The ICA umccrise workflow runs out of memory in the sv_bpi_maybe step. The memory for that step is set to 16G, based on https://github.com/umccr/umccrise/blob/be0c0282d9044a6d802941b4c7e6e85c32c36114/umccrise/structural.smk#L246

bcbio sets the max memory for that step to 65G (-Xmx65520m), which would explain why we haven't encountered this in previous years. We should probably bump it to 30G for starters.

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
    at java.base/java.util.TreeMap.put(TreeMap.java:575)
    at java.base/java.util.TreeSet.add(TreeSet.java:255)
    at com.google.common.collect.AbstractMapBasedMultimap.put(AbstractMapBasedMultimap.java:202)
    at com.google.common.collect.AbstractSetMultimap.put(AbstractSetMultimap.java:130)
    at com.google.common.collect.TreeMultimap.put(TreeMultimap.java:76)
    at com.hartwig.hmftools.breakpointinspector.clipping.Clipping.add(Clipping.java:61)
    at com.hartwig.hmftools.breakpointinspector.Analysis.lambda$null$16(Analysis.java:538)
    at com.hartwig.hmftools.breakpointinspector.Analysis$$Lambda$78/0x0000000800210440.accept(Unknown Source)
    at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
    at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
    at com.hartwig.hmftools.breakpointinspector.Analysis.lambda$processStructuralVariant$17(Analysis.java:538)
    at com.hartwig.hmftools.breakpointinspector.Analysis$$Lambda$77/0x0000000800210040.accept(Unknown Source)
    at java.base/java.lang.Iterable.forEach(Iterable.java:75)
    at com.hartwig.hmftools.breakpointinspector.Analysis.processStructuralVariant(Analysis.java:538)
    at com.hartwig.hmftools.breakpointinspector.BreakPointInspectorApplication.main(BreakPointInspectorApplication.java:261)
[Sun May  8 20:02:03 2022]
Error in rule sv_bpi_maybe:
    jobid: 0
    output: work/SBJ02100__PRJ221050/structural/maybe_bpi/SBJ02100__PRJ221050-manta.vcf
    log: log/structural/SBJ02100__PRJ221050/SBJ02100__PRJ221050-bpi_stats.txt (check log file(s) for error message)

RuleException:
CalledProcessError in line 246 of /umccrise/umccrise/structural.smk:
Command 'break-point-inspector -Xms1000m -Xmx16000m -Djava.io.tmpdir=SBJ02100__PRJ221050/structural/maybe_bpi/tmp_dir -vcf work/SBJ02100__PRJ221050/structural/sv_subsample_if_too_many/SBJ02100__PRJ221050-manta.vcf -ref /scratch/inputs/PRJ221049/PRJ221049.bam -tumor /scratch/inputs/L2200523_L2200522_dragen/PRJ221050_tumor.bam -output_vcf work/SBJ02100__PRJ221050/structural/maybe_bpi/SBJ02100__PRJ221050-manta.vcf > log/structural/SBJ02100__PRJ221050/SBJ02100__PRJ221050-bpi_stats.txt' returned non-zero exit status 1.
  File "/miniconda/envs/umccrise/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2189, in run_wrapper
  File "/umccrise/umccrise/structural.smk", line 246, in __rule_sv_bpi_maybe
  File "/miniconda/envs/umccrise/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 529, in _callback
  File "/miniconda/envs/umccrise/lib/python3.7/concurrent/futures/thread.py", line 57, in run
  File "/miniconda/envs/umccrise/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 515, in cached_or_run
  File "/miniconda/envs/umccrise/lib/python3.7/site-packages/snakemake/executors/__init__.py", line 2201, in run_wrapper
Exiting because a job execution failed. Look above for error message
Trying to restart job 77.

pdiakumis commented 2 years ago

Getting the same error for SBJ02738 (see Slack); will bump this to 60G. Perhaps bump it only upon failure.

pdiakumis commented 2 years ago

Solved via #115. Well, that sample ain't getting through the BPI stage even with 100G mem, but at least the dynamic mem bump based on attempts is working.
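
For readers who haven't seen the pattern, here is a minimal sketch of an attempt-based memory bump. It is not the code from #115: the function names, the linear scaling, and the 100000 MB cap are assumptions for illustration. Only the 30000 MB starting value and the `-Xms1000m -Xmx...m` flag shape come from the logs in this thread.

```python
def bpi_mem_mb(attempt, base_mb=30000, cap_mb=100000):
    """Memory (MB) to request for a given 1-based attempt number.

    Hypothetical helper: scales linearly with the attempt and stops at cap_mb.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    return min(base_mb * attempt, cap_mb)


def jvm_heap_flags(mem_mb, xms_mb=1000):
    """Build the JVM heap flags in the same shape as the failing command."""
    return f"-Xms{xms_mb}m -Xmx{mem_mb}m"


# In a Snakemake rule this would typically be wired up as a callable resource,
# which Snakemake re-evaluates on each retry:
#   resources: mem_mb=lambda wildcards, attempt: bpi_mem_mb(attempt)
for attempt in (1, 2, 3, 4):
    print(attempt, jvm_heap_flags(bpi_mem_mb(attempt)))
```

The bump only takes effect if job retries are enabled for the run, otherwise `attempt` stays at 1.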

pdiakumis commented 2 years ago

This is handled better now. Upon a BPI (OOM) error, the workflow proceeds anyway and just copies the input to the output. The missing BPI columns are handled in the cancer report by falling back to the POS, ALT, and INFO/END fields. Worth noting that the AF_PURPLE column will also remain empty, since I think PURPLE uses the BPI AF to adjust its own, and if it's not there it just doesn't estimate it. The SV filtration is adjusted to skip the 10% BPI_AF step. There will likely be more false positives since that was one of the main benefits of BPI, but oh well, there shouldn't be too many OOM cases anyway.
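
As a rough illustration of the "proceed anyway and copy input to output" behaviour, here is a minimal sketch. The function name and its signature are hypothetical and this is not the umccrise implementation. It also assumes that any non-zero exit status triggers the fallback, whereas the real workflow may check specifically for the OOM case.

```python
import shutil
import subprocess


def run_or_passthrough(cmd, input_vcf, output_vcf):
    """Run cmd; if it exits non-zero, copy input_vcf to output_vcf instead.

    Returns True when cmd succeeded, False when the passthrough copy was used.
    Assumption: cmd itself writes output_vcf on success.
    """
    result = subprocess.run(cmd)
    if result.returncode == 0:
        return True
    # Fallback: downstream steps receive the unmodified input VCF,
    # i.e. without the columns BPI would have added.
    shutil.copyfile(input_vcf, output_vcf)
    return False
```

Returning a flag lets the caller record that the passthrough was used, so downstream filtering can skip steps that rely on BPI output.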

[Tue Sep  6 15:30:39 2022]
rule sv_bpi_maybe:
    input: work/SBJ02738__PRJ222005/structural/sv_subsample_if_too_many/SBJ02738__PRJ222005-manta.vcf, /g/data/gx8/projects/diakumis/umccrise/data_test/SBJ02738_gds/somatic/PRJ222005_tumor.bam, /g/data/gx8/projects/diakumis/umccrise/data_test/SBJ02738_gds/somatic/PRJ222005_tumor.bam.bai, /g/data/gx8/projects/diakumis/umccrise/data_test/SBJ02738_gds/germline/PRJ222013.bam, /g/data/gx8/projects/diakumis/umccrise/data_test/SBJ02738_gds/germline/PRJ222013.bam.bai
    output: work/SBJ02738__PRJ222005/structural/maybe_bpi/SBJ02738__PRJ222005-manta.vcf
    log: log/structural/SBJ02738__PRJ222005/SBJ02738__PRJ222005-bpi_stats.txt
    jobid: 80
    wildcards: batch=SBJ02738__PRJ222005
    resources: mem_mb=30000

Exception in thread "main" java.lang.OutOfMemoryError: Java heap space
        at java.base/java.util.HashMap.newNode(HashMap.java:1797)
        at java.base/java.util.HashMap.putVal(HashMap.java:626)
        at java.base/java.util.HashMap.put(HashMap.java:607)
        at java.base/java.util.HashSet.add(HashSet.java:220)
        at com.hartwig.hmftools.breakpointinspector.clipping.Clipping.add(Clipping.java:58)
        at com.hartwig.hmftools.breakpointinspector.Analysis.lambda$null$16(Analysis.java:538)
        at com.hartwig.hmftools.breakpointinspector.Analysis$$Lambda$77/0x0000000800210040.accept(Unknown Source)
        at java.base/java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948)
        at java.base/java.util.stream.ReferencePipeline$Head.forEach(ReferencePipeline.java:658)
        at com.hartwig.hmftools.breakpointinspector.Analysis.lambda$processStructuralVariant$17(Analysis.java:538)
        at com.hartwig.hmftools.breakpointinspector.Analysis$$Lambda$76/0x00000008001c9c40.accept(Unknown Source)
        at java.base/java.lang.Iterable.forEach(Iterable.java:75)
        at com.hartwig.hmftools.breakpointinspector.Analysis.processStructuralVariant(Analysis.java:538)
        at com.hartwig.hmftools.breakpointinspector.BreakPointInspectorApplication.main(BreakPointInspectorApplication.java:261)
[Tue Sep  6 21:40:04 2022]
Finished job 80.

ohofmann commented 2 years ago

Good workaround - I keep forgetting BPI is optional, and now wondering if the Circos plotting step needs the same treatment.

pdiakumis commented 2 years ago

Same OOM error with SBJ02884. Need to push this 2.1.2 version into prod asap I suppose. Or I can run in staging.

| Sample   | Date       | Comment           |
|----------|------------|-------------------|
| SBJ02100 | 2022-05-09 | -                 |
| SBJ02738 | 2022-08-22 | Failed            |
| SBJ02859 | 2022-10-12 | QAP               |
| SBJ02884 | 2022-10-25 | Re-run Successful |
| SBJ02898 | 2022-10-31 | Re-run Successful |