uclahs-cds / pipeline-call-sSNV

A Nextflow pipeline to identify the somatic single nucleotide variants (sSNVs) by comparing a pair of tumor/normal samples.
https://uclahs-cds.github.io/pipeline-call-sSNV/
GNU General Public License v2.0

Add example doc with CPU/Mem resources for very large samples #288

Open Faizal-Eeman opened 6 months ago

Faizal-Eeman commented 6 months ago

When running very large sample BAMs through call-sSNV, it is likely that the pipeline will fail because of the default resource configurations.

Although the base_resource_update function in template.config is a great utility for updating resources on a case-by-case basis, it is often unclear how much each resource needs to be increased for a successful run. It would be nice to provide examples of resource configurations that worked for large BAMs, perhaps in a doc/ dir of the repo.

Here are the resources I set in my pipeline run's config:

base_resource_update {
    cpus = [
        ['call_sIndel_Manta', 0.1]
    ]
    memory = [
        [['run_validate_PipeVal', 'call_sSNV_Strelka2', 'call_sSNV_Mutect2', 'call_sIndel_Manta', 'concat_VCFs_BCFtools', 'plot_VennDiagram_R', 'run_LearnReadOrientationModel_GATK', 'convert_BAM2Pileup_SAMtools'], 10],
        [['call_sSNV_MuSE', 'run_sump_MuSE'], 2]
    ]
}
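
To make the effect of such a block easier to reason about, here is a minimal Python sketch. It assumes that each number in base_resource_update is a multiplier applied to a process's default allocation, which is how I read the 0.1 CPU entry above. The default values below are hypothetical placeholders, not the pipeline's real defaults, and any capping at the node's maximum is not modeled.

```python
# Hypothetical default memory allocations in GB (placeholders, NOT the pipeline's real values).
default_memory_gb = {"call_sSNV_MuSE": 48.0, "call_sSNV_Mutect2": 4.0}

# Same shape as the memory list above: [process name or list of names, factor].
memory_updates = [
    [["call_sSNV_Mutect2"], 10],
    [["call_sSNV_MuSE", "run_sump_MuSE"], 2],
]


def apply_updates(defaults, updates):
    """Return a copy of defaults with each listed process scaled by its factor.

    Assumes the factor is a multiplier; processes without a known default are skipped.
    """
    result = dict(defaults)
    for processes, factor in updates:
        if isinstance(processes, str):
            processes = [processes]
        for name in processes:
            if name in result:
                result[name] = result[name] * factor
    return result


updated = apply_updates(default_memory_gb, memory_updates)
for name, gigabytes in sorted(updated.items()):
    print(f"{name}: {gigabytes:g} GB")
```

Swapping in the actual defaults from the config used for a run would show the allocation each process ends up requesting.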

Nextflow trace files

Case 1:

Normal - 2.5TB, Tumor - 1.1TB: /hot/data/unregistered/Zook-Mootor-BNCH-GIAB/analysis/GIAB/AshkenazimParents/somatic-variants/call-sSNV-7.0.0/HG002-T/log-call-sSNV-7.0.0-20231201T000959Z/nextflow-log/trace.txt

Case 2:

Normal - 369GB

sorelfitzgibbon commented 6 months ago

@Faizal-Eeman, congrats on getting 2.5/1.1 TB samples through the pipeline! To help future users with large input files, I summarized the maximum resources actually used for each process:

Maximum values:

| Tool | realtime | %cpu* | peak_rss |
|---|---|---|---|
| run_validate_PipeVal | 16h 24m | 64% | 20 MB |
| call_sSNV_SomaticSniper | 5d 5h 4m | 85% | 18 GB |
| convert_BAM2Pileup_SAMtools | 9d 1h 46m | 90% | 25 GB |
| call_sIndel_Manta | 4d 4h 24m | 239% | 5 GB |
| call_sSNV_Strelka2 | 23h 33m | 2530% | 20 GB |
| call_sSNV_Mutect2 | 1d 7h 32m | 125% | 23 GB |
| run_LearnReadOrientationModel_GATK | 22m | 103% | 28 GB |
| call_sSNV_MuSE | 1d 9h 27m | 1589% | 119 GB |
| run_sump_MuSE | 3m | 172% | 3 GB |

*don't trust the %cpu numbers
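
For anyone who wants to build a similar summary from their own run, here is a rough Python sketch that takes the largest peak_rss per process from a Nextflow trace file. It assumes a tab-separated trace with `name` and `peak_rss` columns in human-readable form (for example `23.4 GB`) and binary units (1 GB = 1024^3 bytes). The helper names and the sample rows are made up for illustration.

```python
import csv
import io
import re

# Assumed binary units, as a trace file written without raw values would imply.
UNITS = {"B": 1, "KB": 1024, "MB": 1024**2, "GB": 1024**3, "TB": 1024**4}


def parse_size(text):
    """Convert a string such as '23.4 GB' to bytes; return 0 for '-' or blanks."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d*)?)\s*([KMGT]?B)\s*", text)
    if not match:
        return 0
    return float(match.group(1)) * UNITS[match.group(2)]


def process_name(task_name):
    """Strip the '(tag)' suffix and any 'workflow:' prefix from a task name."""
    base = task_name.split(" (")[0]
    return base.split(":")[-1]


def max_peak_rss(trace_file):
    """Return {process name: largest peak_rss in bytes} for an open trace file."""
    peaks = {}
    for row in csv.DictReader(trace_file, delimiter="\t"):
        name = process_name(row["name"])
        peaks[name] = max(peaks.get(name, 0), parse_size(row["peak_rss"]))
    return peaks


# Illustrative rows only; a real run would pass open("trace.txt") instead.
sample = io.StringIO(
    "name\tpeak_rss\n"
    "call_sSNV_MuSE (1)\t62.5 GB\n"
    "call_sSNV_MuSE (2)\t67.5 GB\n"
    "run_sump_MuSE (1)\t512 MB\n"
)
peaks = max_peak_rss(sample)
for name, size in sorted(peaks.items()):
    print(f"{name}\t{size / 1024**3:.1f} GB")
```

The same loop could track `realtime` as well, but that column needs its own duration parser, which is left out here.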

sorelfitzgibbon commented 6 months ago

Maximum values for Case 2, tumors 1 and 2:

| Tumor | Tool | realtime | %cpu* | peak_rss |
|---|---|---|---|---|
| tumor1 | call_sSNV_SomaticSniper | 13h 30m | 93% | 3.2 GB |
| tumor2 | call_sSNV_SomaticSniper | 12h 47m | 96% | 2.6 GB |
| tumor1 | convert_BAM2Pileup_SAMtools | 12h 4m | 95% | 13.7 GB |
| tumor2 | convert_BAM2Pileup_SAMtools | 11h 23m | 96% | 13.5 GB |
| tumor1 | call_sIndel_Manta | 1d 12h 52m | 79% | < 1 GB |
| tumor2 | call_sIndel_Manta | 1d 11h 34m | 79% | < 1 GB |
| tumor1 | call_sSNV_Strelka2 | 7h 3m | 703% | 3.3 GB |
| tumor2 | call_sSNV_Strelka2 | 6h 27m | 750% | 2.8 GB |
| tumor1 | call_sSNV_Mutect2 | 6h 40m | 100% | 2.8 GB |
| tumor2 | call_sSNV_Mutect2 | 6h 13m | 100% | 2.5 GB |
| tumor1 | run_LearnReadOrientationModel_GATK | 6m | 98% | 2.7 GB |
| tumor2 | run_LearnReadOrientationModel_GATK | 8m | 98% | 3.2 GB |
| tumor1 | call_sSNV_MuSE | 5h 22m | 1196% | 62.5 GB |
| tumor2 | call_sSNV_MuSE | 6h 10m | 1198% | 67.5 GB |
| tumor1 | run_sump_MuSE | < 1m | 1427% | 9.1 GB |
| tumor2 | run_sump_MuSE | 6m | 1126% | 32.4 GB |

*don't trust the %cpu numbers

Faizal-Eeman commented 6 months ago

Here are the failed logs for the 2TB sample that led me to the CPU/memory update in the description. These failed runs used the default memory allocation. Whenever a process failed with exit code 137, I increased that process's allocation accordingly. I also increased the allocation for processes where I anticipated the same memory error.

/hot/data/unregistered/Zook-Mootor-BNCH-GIAB/analysis/GIAB/AshkenazimParents/somatic-variants/test/failed/call-sSNV-7.0.0/HG002-T/log-call-sSNV-7.0.0-20231102T215524Z
/hot/data/unregistered/Zook-Mootor-BNCH-GIAB/analysis/GIAB/AshkenazimParents/somatic-variants/test/failed/call-sSNV-7.0.0/HG002-T/log-call-sSNV-7.0.0-20231112T041924Z
/hot/data/unregistered/Zook-Mootor-BNCH-GIAB/analysis/GIAB/AshkenazimParents/somatic-variants/test/failed/call-sSNV-7.0.0/HG002-T/log-call-sSNV-7.0.0-20231114T003236Z
/hot/data/unregistered/Zook-Mootor-BNCH-GIAB/analysis/GIAB/AshkenazimParents/somatic-variants/test/failed/call-sSNV-7.0.0/HG002-T/log-call-sSNV-7.0.0-20231115T032828Z
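
As background on the exit code 137 mentioned above: shells report a process killed by signal N as exit code 128 + N, so 137 corresponds to signal 9 (SIGKILL). That is what is commonly seen when a job exceeds its memory limit and gets killed, although a SIGKILL can have other causes too. A minimal Python sketch of the decoding, with a hypothetical helper name and assuming a Unix-like system:

```python
import signal


def killing_signal(exit_code):
    """Return the signal name implied by a shell-style exit code, or None.

    Exit codes above 128 conventionally mean 'killed by signal (code - 128)'.
    """
    if exit_code <= 128:
        return None
    try:
        return signal.Signals(exit_code - 128).name
    except ValueError:
        # The remainder is not a signal number known on this platform.
        return None


for code in (0, 1, 137):
    print(code, killing_signal(code))
```

Filtering the `exit` column of a trace file for 137 is a quick way to list the processes that are candidates for a memory increase.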

tyamaguchi-ucla commented 5 months ago

@sorelfitzgibbon @Faizal-Eeman do you guys think it's worth updating the M64 config based on these results? https://github.com/uclahs-cds/pipeline-call-sSNV/blob/main/config/M64.config

sorelfitzgibbon commented 5 months ago

> @sorelfitzgibbon @Faizal-Eeman do you guys think it's worth updating the M64 config based on these results? https://github.com/uclahs-cds/pipeline-call-sSNV/blob/main/config/M64.config

Yes, it looks like several values can be substantially lowered. I'll work on this.

sorelfitzgibbon commented 5 months ago

@Faizal-Eeman it looks like these files have moved, are they still easily accessible? I'd like to check a couple little things, but not urgent.

Faizal-Eeman commented 5 months ago

> @Faizal-Eeman it looks like these files have moved, are they still easily accessible? I'd like to check a couple little things, but not urgent.

Yes. I've updated the file paths here now.

tyamaguchi-ucla commented 5 months ago

It looks like our configs are missing a few processes in SomaticSniper (e.g. generate_ReadCount_bam_readcount) @sorelfitzgibbon