icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction
Other
67 stars 24 forks source link

nextNEOpi fails #18

Closed aforsythe closed 1 year ago

aforsythe commented 2 years ago

I'm trying to track down the source of an error in the nextNEOpi pipeline. It's not immediately obvious what the issue is (memory, a singularity issue, something else?)

Any help would be greatly appreciated.

I'm encountering an error at :

Error executing process > 'make_uBAM (P016 : tumor_DNA)'

Caused by:
  Process `make_uBAM (P016 : tumor_DNA)` terminated with an error exit status (247)

Command executed:

  mkdir -p /tmp/16525322
  gatk --java-options "-Xmx256G -XX:ParallelGCThreads=1" FastqToSam \
      --TMP_DIR /tmp/16525322 \
      --MAX_RECORDS_IN_RAM 4194304 \
      -F1 nR217-L4-G2-P098-TATTGAAT-TGCGTCGG-READ1-Sequences.txt.gz -F2 nR217-L4-G2-P098-TATTGAAT-TGCGTCGG-READ2-Sequences.txt.gz \
      --READ_GROUP_NAME P016_tumor \
      --SAMPLE_NAME P016_tumor \
      --LIBRARY_NAME P016_tumor \
      --PLATFORM ILLUMINA \
      -O P016_tumor_DNA_unaligned.bam

Command exit status:
  247

Command output:
  (empty)

Command error:
  INFO  2022-10-22 00:47:35 FastqToSam  Processed     8,000,000 records.  Elapsed time: 00:01:50s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:48:08 FastqToSam  Processed     9,000,000 records.  Elapsed time: 00:02:23s.  Time for last 1,000,000:   32s.  Last read position: */*
  INFO  2022-10-22 00:48:21 FastqToSam  Processed    10,000,000 records.  Elapsed time: 00:02:36s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:48:32 FastqToSam  Processed    11,000,000 records.  Elapsed time: 00:02:47s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:48:44 FastqToSam  Processed    12,000,000 records.  Elapsed time: 00:02:59s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:49:08 FastqToSam  Processed    13,000,000 records.  Elapsed time: 00:03:22s.  Time for last 1,000,000:   23s.  Last read position: */*
  INFO  2022-10-22 00:49:18 FastqToSam  Processed    14,000,000 records.  Elapsed time: 00:03:32s.  Time for last 1,000,000:   10s.  Last read position: */*
  INFO  2022-10-22 00:49:31 FastqToSam  Processed    15,000,000 records.  Elapsed time: 00:03:46s.  Time for last 1,000,000:   13s.  Last read position: */*
  INFO  2022-10-22 00:49:43 FastqToSam  Processed    16,000,000 records.  Elapsed time: 00:03:58s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:50:05 FastqToSam  Processed    17,000,000 records.  Elapsed time: 00:04:20s.  Time for last 1,000,000:   22s.  Last read position: */*
  INFO  2022-10-22 00:50:18 FastqToSam  Processed    18,000,000 records.  Elapsed time: 00:04:33s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:50:31 FastqToSam  Processed    19,000,000 records.  Elapsed time: 00:04:46s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:51:04 FastqToSam  Processed    20,000,000 records.  Elapsed time: 00:05:19s.  Time for last 1,000,000:   33s.  Last read position: */*
  INFO  2022-10-22 00:51:27 FastqToSam  Processed    21,000,000 records.  Elapsed time: 00:05:42s.  Time for last 1,000,000:   22s.  Last read position: */*
  INFO  2022-10-22 00:51:39 FastqToSam  Processed    22,000,000 records.  Elapsed time: 00:05:53s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:51:54 FastqToSam  Processed    23,000,000 records.  Elapsed time: 00:06:09s.  Time for last 1,000,000:   15s.  Last read position: */*
  INFO  2022-10-22 00:52:06 FastqToSam  Processed    24,000,000 records.  Elapsed time: 00:06:21s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:52:18 FastqToSam  Processed    25,000,000 records.  Elapsed time: 00:06:33s.  Time for last 1,000,000:   11s.  Last readion: */*
  INFO  2022-10-22 00:50:18 FastqToSam  Processed    18,000,000 records.  Elapsed time: 00:04:33s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:50:31 FastqToSam  Processed    19,000,000 records.  Elapsed time: 00:04:46s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:51:04 FastqToSam  Processed    20,000,000 records.  Elapsed time: 00:05:19s.  Time for last 1,000,000:   33s.  Last read position: */*
  INFO  2022-10-22 00:51:27 FastqToSam  Processed    21,000,000 records.  Elapsed time: 00:05:42s.  Time for last 1,000,000:   22s.  Last read position: */*
  INFO  2022-10-22 00:51:39 FastqToSam  Processed    22,000,000 records.  Elapsed time: 00:05:53s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:51:54 FastqToSam  Processed    23,000,000 records.  Elapsed time: 00:06:09s.  Time for last 1,000,000:   15s.  Last read position: */*
  INFO  2022-10-22 00:52:06 FastqToSam  Processed    24,000,000 records.  Elapsed time: 00:06:21s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:52:18 FastqToSam  Processed    25,000,000 records.  Elapsed time: 00:06:33s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:52:43 FastqToSam  Processed    26,000,000 records.  Elapsed time: 00:06:58s.  Time for last 1,000,000:   24s.  Last read position: */*
  INFO  2022-10-22 00:52:55 FastqToSam  Processed    27,000,000 records.  Elapsed time: 00:07:10s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:53:11 FastqToSam  Processed    28,000,000 records.  Elapsed time: 00:07:26s.  Time for last 1,000,000:   15s.  Last read position: */*
  INFO  2022-10-22 00:53:24 FastqToSam  Processed    29,000,000 records.  Elapsed time: 00:07:38s.  Time for last 1,000,000:   12s.  Last read position: */*
  INFO  2022-10-22 00:53:57 FastqToSam  Processed    30,000,000 records.  Elapsed time: 00:08:12s.  Time for last 1,000,000:   33s.  Last read position: */*
  INFO  2022-10-22 00:54:16 FastqToSam  Processed    31,000,000 records.  Elapsed time: 00:08:31s.  Time for last 1,000,000:   18s.  Last read position: */*
  INFO  2022-10-22 00:54:31 FastqToSam  Processed    32,000,000 records.  Elapsed time: 00:08:46s.  Time for last 1,000,000:   15s.  Last read position: */*
  INFO  2022-10-22 00:54:47 FastqToSam  Processed    33,000,000 records.  Elapsed time: 00:09:02s.  Time for last 1,000,000:   16s.  Last read position: */*
  INFO  2022-10-22 00:55:15 FastqToSam  Processed    34,000,000 records.  Elapsed time: 00:09:29s.  Time for last 1,000,000:   27s.  Last read position: */*
  INFO  2022-10-22 00:55:26 FastqToSam  Processed    35,000,000 records.  Elapsed time: 00:09:40s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:55:37 FastqToSam  Processed    36,000,000 records.  Elapsed time: 00:09:51s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:55:48 FastqToSam  Processed    37,000,000 records.  Elapsed time: 00:10:03s.  Time for last 1,000,000:   11s.  Last read position: */*
  INFO  2022-10-22 00:56:13 FastqToSam  Processed    38,000,000 records.  Elapsed time: 00:10:28s.  Time for last 1,000,000:   25s.  Last read position: */*
  INFO  2022-10-22 00:56:24 FastqToSam  Processed    39,000,000 records.  Elapsed time: 00:10:39s.  Time for last 1,000,000:   10s.  Last read position: */*
  INFO  2022-10-22 00:56:35 FastqToSam  Processed    40,000,000 records.  Elapsed time: 00:10:50s.  Time for last 1,000,000:   10s.  Last read position: */*
  INFO  2022-10-22 00:56:46 FastqToSam  Processed    41,000,000 records.  Elapsed time: 00:11:00s.  Time for last 1,000,000:   10s.  Last read position: */*
  INFO  2022-10-22 00:57:06 FastqToSam  Processed    42,000,000 records.  Elapsed time: 00:11:21s.  Time for last 1,000,000:   20s.  Last read position: */*
  INFO  2022-10-22 00:57:20 FastqToSam  Processed    43,000,000 records.  Elapsed time: 00:11:35s.  Time for last 1,000,000:   13s.  Last read position: */*
  INFO  2022-10-22 00:57:25 FastqToSam  Processed    44,000,000 records.  Elapsed time: 00:11:39s.  Time for last 1,000,000:    4s.  Last read position: */*
  INFO  2022-10-22 00:57:29 FastqToSam  Processed    45,000,000 records.  Elapsed time: 00:11:43s.  Time for last 1,000,000:    4s.  Last read position: */*
  INFO  2022-10-22 00:57:33 FastqToSam  Processed    46,000,000 records.  Elapsed time: 00:11:48s.  Time for last 1,000,000:    4s.  Last read position: */*
  Using GATK jar /opt/conda/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar
  Running:
      java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -Xmx256G -XX:ParallelGCThreads=1 -jar /opt/conda/share/gatk4-4.2.6.1-1/gatk-package-4.2.6.1-local.jar FastqToSam --TMP_DIR /tmp/16525322 --MAX_RECORDS_IN_RAM 4194304 -F1 nR217-L4-G2-P098-TATTGAAT-TGCGTCGG-READ1-Sequences.txt.gz -F2 nR217-L4-G2-P098-TATTGAAT-TGCGTCGG-READ2-Sequences.txt.gz --READ_GROUP_NAME P016_tumor --SAMPLE_NAME P016_tumor --LIBRARY_NAME P016_tumor --PLATFORM ILLUMINA -O P016_tumor_DNA_unaligned.bam

Work dir:
  /dfs5/bio/shachey/data/wes_2022/analysis/work/a9/828d4392552c7e4700f7e9ce5163b4

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

My slurm error log says :

slurmstepd: error: *** JOB 16525322 ON hpc3-20-22 CANCELLED AT 2022-10-21T18:55:47 ***
slurmstepd: error: Detected 2 oom-kill event(s) in StepId=16525322.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

Looking into .command.err in the work directory I see:

FATAL:   could not open image /tmp/16525322/apps-01.i-med.ac.at-images-singularity-nextNEOpi_1.3.2_18734d43.sif: failed to retrieve path for /tmp/16525322/apps-01.i-med.ac.at-images-singularity-nextNEOpi_1.3.2_18734d43.sif: lstat /tmp/16525322: no such file or directory

My configurations and inputs are

-------------------------------------------------------------------------
C O N F I G U R A T I O N

Command Line:        nextflow run /data/homezvol0/shachey/bin/nextNEOpi/nextNEOpi.nf --batchFile /dfs5/bio/shachey/data/wes_2022/analysis/lungCancer_batchFile_FastQ.csv -profile singularity,cluster -config /data/homezvol0/shachey/bin/nextNEOpi/conf/params.config --tmpDir /tmp/16525322 --outputDir /dfs5/bio/shachey/data/wes_2022/analysis/NextNEOpi_Results --accept-license
Working Directory:   /data/homezvol0/shachey/dfs5/data/wes_2022/analysis
Output Directory:    /dfs5/bio/shachey/data/wes_2022/analysis/NextNEOpi_Results

I N P U T

batch file:          /dfs5/bio/shachey/data/wes_2022/analysis/lungCancer_batchFile_FastQ.csv

Please check --help for further instruction
-------------------------------------------------------------------------
IEDB_dir: /dfs6/pub/share/nextNEOpi/resources/databases/iedb
MHCFLURRY_dir: /dfs6/pub/share/nextNEOpi/resources/databases/mhcflurry_data
tmpDir: /tmp/16525322
Pipeline Name                 : icbi/nextNEOpi
Pipeline Version              : 1.0
Batch file                    : /dfs5/bio/shachey/data/wes_2022/analysis/lungCancer_batchFile_FastQ.csv
Read length                   : 150
Exome capture kit             : sureSelectV6
Fasta Ref                     : /dfs6/pub/share/nextNEOpi/resources/references/hg38/gdc/GRCh38.d1.vd1/fasta/GRCh38.d1.vd1.fa
MillsGold                     : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/Mills_and_1000G_gold_standard.indels.hg38.vcf
hcSNPS1000G                   : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/1000G_phase1.snps.high_confidence.hg38.vcf.gz
HapMap                        : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/hapmap_3.3.hg38.vcf.gz
Cosmic                        : /dfs6/pub/share/nextNEOpi/resources/databases/cosmic/hg38/v84/CosmicCodingMuts.hg38.v84.vcf
DBSNP                         : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/Homo_sapiens_assembly38.dbsnp138.vcf
GnomAD                        : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/Mutect2/GetPileupSummaries/small_exac_common_3.hg38.vcf
GnomADfull                    : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/Mutect2/gnomAD/af-only-gnomad.hg38.vcf.gz
KnownIndels                   : /dfs6/pub/share/nextNEOpi/resources/databases/GATKresourceBundle/Homo_sapiens_assembly38.known_indels.vcf
BlastDB                       : /dfs6/pub/share/nextNEOpi/resources/references/blast/
priority variant Caller       : M2
Mutect 1 and 2 minAD          : 5
VarScan min_cov               : 10
VarScan min_cov_tumor         : 10
VarScan min_cov_normal        : 10
VarScan min_freq_for_hom      : 0.75
VarScan somatic_pvalue        : 0.99
VarScan somatic_somaticpvalue : 0.05
VarScan strand_filter         : 1
VarScan processSomatic_pvalue : 0.05
VarScan max_normal_freq       : 0.05
VarScan min_tumor_freq        : 0.1
VarScan min_map_q             : 10
VarScan min_base_q            : 20
VEP assembly                  : GRCh38
VEP species                   : homo_sapiens
VEP options                   : --everything
Number of scatters            : 40
Output dir                    : /dfs5/bio/shachey/data/wes_2022/analysis/NextNEOpi_Results
Working dir                   : /dfs5/bio/shachey/data/wes_2022/analysis/work
TMP dir                       : /tmp/16525322
Current home                  : /data/homezvol0/shachey
Current user                  : shachey
Current path                  : /data/homezvol0/shachey/dfs5/data/wes_2022/analysis
JAVA_Xmx                      : -Xmx256G
Picard maxRecordsInRam        : 4194304
Script dir                    : /data/homezvol0/shachey/bin/nextNEOpi
Config Profile                : singularity,cluster
-------------------------------------------------------------------------
WARNING: P016 sex not specified will infer from data
WARNING: P016 sex not specified will infer from data
Found /data/homezvol0/shachey/bin/hlahd.1.5.0/bin/hlahd.sh at: /data/homezvol0/shachey/bin/hlahd.1.5.0/bin/hlahd.sh
Found /data/homezvol0/shachey/bin/mixcr/mi.license at: /data/homezvol0/shachey/bin/mixcr/mi.license
riederd commented 2 years ago

Hi,

thanks for your interest in nextNEOpi.

I'm sorry that you ran into troubles, but according to your error log it seems that your machine is running out of memory:

slurmstepd: error: *** JOB 16525322 ON hpc3-20-22 CANCELLED AT 2022-10-21T18:55:47 *** slurmstepd: error: Detected 2 oom-kill event(s) in StepId=16525322.batch. Some of your processes may have been killed by the cgroup out-of-memory handler.

You might need or tell SLURM to reserve more memory. Or you can also try to request more memory by setting a global 'memory' option in nextNEOpi conf/process.config (right after the cpus line).

https://github.com/icbi-lab/nextNEOpi/blob/aac260dbd5da701d22a09846045f3f42981cd4a1/conf/process.config#L9

  memory = '64 GB'

or you can do this as per process option in conf/process.config:

E.g. https://github.com/icbi-lab/nextNEOpi/blob/aac260dbd5da701d22a09846045f3f42981cd4a1/conf/process.config#L28

withName:make_uBAM {
        beforeScript = "ulimit -n 4096"
    memory = "64 GB"
    }