icbi-lab / nextNEOpi

nextNEOpi: a comprehensive pipeline for computational neoantigen prediction
Other
65 stars 23 forks source link

Error executing process #13

Closed KunFang93 closed 1 year ago

KunFang93 commented 1 year ago

Hi,

I am a fresh user of nextNEOpi and exciting to explore this well-built pipeline! But I met some errors when run my test dataset.

The Error:

Error executing process > 'fastp (Patient353_T1star : tumor_RNA)'

Caused by:
  No signature of method: java.lang.Boolean.getVal() is applicable for argument types: () values: []
Possible solutions: getAt(java.lang.String), getClass(), equals(java.lang.Object), equals(java.lang.Object), tap(groovy.lang.Closure), putAt(java.lang.String, java.lang.Object)

Source block:
  def reads_R1         = "--in1 " + reads[0]
  def trimmed_reads_R1 = "--out1 " + meta.sampleName + "_" + meta.sampleType + "_trimmed_R1.fastq.gz"
  def reads_R2         = ""
  def trimmed_reads_R2 = ""
  if(meta.libType == "PE") {
              reads_R2          = "--in2 " + reads[1]
              trimmed_reads_R2  = "--out2 " + meta.sampleName + "_" + meta.sampleType + "_trimmed_R2.fastq.gz"
          }
  def fastpAdapter = ''
  def adapterSeqFile
  def aseq = false
  def aseqR2 = false
  def afile = false
  if (meta.sampleType.indexOf("DNA") > 0) {
              afile = params.adapterSeqFile
              aseq = params.adapterSeq
              aseqR2 = params.adapterSeqR2
          } else {
               afile = params.adapterSeqFileRNAseq
               aseq = params.adapterSeqRNAseq
               aseqR2 = params.adapterSeqR2RNAseq
          }
  if(afile != false) {
              adapterSeqFile = Channel.fromPath(afile)
              fastpAdapter = "--adapter_fasta " + adapterSeqFile
          } else {
              if(aseq != false) {
                  adapterSeq   = Channel.value(aseq)
                  fastpAdapter = "--adapter_sequence " + aseq.getVal()

                  if(aseqR2 != false && meta.libType == "PE") {
                      adapterSeqR2   = Channel.value(aseqR2)
                      fastpAdapter += " --adapter_sequence_r2 " + adapterSeqR2.getVal()
                  }
              }
          }
  """
          fastp --thread ${task.cpus} \\
              ${reads_R1} \\
              ${reads_R2} \\
              ${trimmed_reads_R1} \\
              ${trimmed_reads_R2} \\
              --json ${meta.sampleName}_${meta.sampleType}_fastp.json \\
              ${fastpAdapter} \\
              ${params.fastpOpts}
          """

Tip: you can replicate the issue by changing to the process work dir and entering the command `bash .command.run`

The command I used:

nextflow run /scratch/u/kfang/Software/nextNEOpi/nextNEOpi.nf --batchFile test2.csv -config conf/params.config --trim_adapters true --trim_adapters_RNAseq true -profile conda -resume --accept_license --MIXCR_lic /scratch/u/kfang/Software/nextNEOpi/resources/mi.license

The files in current folder:

(nextflow-22.04.0)[kfang@ln01 test]$ ls
conf
OSU13053-0000353-N-0192630-HYB729-IDTExome-20180816-NCHHiSeq_S1_L001002_R1_P001.fastq.gz
OSU13053-0000353-N-0192630-HYB729-IDTExome-20180816-NCHHiSeq_S1_L001002_R2_P001.fastq.gz
OSU13053-0000353-T-0194071-HYB729-IDTExome-20180816-NCHHiSeq_S6_L001002_R1_P001.fastq.gz
OSU13053-0000353-T-0194071-HYB729-IDTExome-20180816-NCHHiSeq_S6_L001002_R2_P001.fastq.gz
OSU13053-0000353-T-0204980-HYB00853-Txome-20190612-Illumina_S1_L004_R1_001.fastq.gz
OSU13053-0000353-T-0204980-HYB00853-Txome-20190612-Illumina_S1_L004_R2_001.fastq.gz
test2.csv

(nextflow-22.04.0)[kfang@ln01 test]$ ls conf/
params.config  process.config  profiles.config  resources.config

The context in conf/params.config:

params {

  help  = false
  email = false
  name  = false
  plaintext_email = false

  //
  // resourcesBaseDir: Default is "resources" in the pipelines directory
  //
  // Please change to a directory that has space for at least 60 GB of data
  // Download the resources file from https://apps-01.i-med.ac.at/resources/nextneopi/nextNEOpi_resources.tar.gz
  // and uncompress the archive into "resourcesBaseDir"
  // default: <nextNEOpi_dir>/resources
  resourcesBaseDir = projectDir.toRealPath() + "/resources"

  // RNA sequencing library type
  RNA_tag_seq   = false     // is RNA data from tag sequencing i.e. 3-prime seq
                            // if true then pVACseq tRNA_vaf filter is set to 0.0

  WES             = true  // if false assume WGS, attention long runtimes
  exomeCaptureKit = "sureSelectV6" // default exomeCaptureKit sure select V6, see resources.config to add more
  readLength      = 150

  trim_adapters        = true
  adapterSeq           = true // "AGATCGGAAGAG" Illumina Universal Adapter
  adapterSeqR2         = true
  adapterSeqFile       = false // fasta file with adapter seqs

  trim_adapters_RNAseq = true
  adapterSeqRNAseq     = true // "AGATCGGAAGAG"  Illumina Universal Adapter
  adapterSeqR2RNAseq   = true
  adapterSeqFileRNAseq = false // fasta file with adapter seqs

  // extra options for fastp
  fastpOpts    = ""

  // HLA typing options
  disable_OptiType  = false // Disable OptiType for HLA typing. If set to true, HLA-HD or a user
                            // supplied custom HLA file must be available.
                            // (see --HLAHD_DIR and/or --customHLA)
  run_HLAHD_RNA = false     // run HLA-HD on RNA data.
                            // It is highly accurate but tends to be very slow on larger fastq files

  HLA_force_RNA = false  // use only RNAseq for HLA typing
  HLA_force_DNA = false  // use only WES/WGS for HLA typing

  // run controlFREEC
  controlFREEC = false
// Panel of normals (see: https://gatk.broadinstitute.org/hc/en-us/articles/360040510131-CreateSomaticPanelOfNormals-BETA-)
  mutect2ponFile = 'NO_FILE'

  primaryCaller = "M2" // set the variant caller used as base for the hc variants.
                        // Only variants that are confirmed by any of the two confirming
                        // callers (e..g. mutect1, varscan) will be retained
                        // any of: M2 = mutect2, M1 = mutect1, VS = varscan, ST = strelka

  // CCF estimation
  use_sequenza_cnvs = false // use CNVs and purity from Sequenza for CCF estimation
                            // default: ASCAT with fall back to Sequenza
  CCFthreshold = 0.95       // threshold clonality
  pClonal = 0.9             // min probability for clonality

  // Directories (need to be in quotes)
  tmpDir          = "/scratch/u/kfang/ChenHZ_lab/Neoantigen/test/tmp"  // Please make sure that there is enough free space (~ 50G)
  workDir         = "$PWD"
  outputDir       = "${workDir}/RESULTS"

  // Result publishing method
  publishDirMode  = "auto" // Choose between:
                           // "auto" - if possible use link otherwise copy
                           // "symlink" - absolute path
                           // "rellink" -relative path
                           // "link " - hard link
                           // "copy"
                           // "copyNoFollow" - copying without following symlink

  fullOutput      = false  // enable full output in outputDir, adds some intermediate results, for debug mainly: default false

  tracedir         = "${params.outputDir}/pipeline_info"
  manifest.version = '1.0'

  // Software locations

  // ONLY if not using conda or singularity, please specify the path to the VarScan2 jar
  VARSCAN         = "" // /path to VarScan jar  // Version: 2.4.3

  // optional: specify path to mutect1 jar and the JAVA7 executable.
  // https://software.broadinstitute.org/cancer/cga/mutect_download
  // https://download.java.net/openjdk/jdk7u75/ri/jdk_ri-7u75-b13-linux-x64-18_dec_2014.tar.gz
  MUTECT1         = "" //  path to mutect-1.1.7.jar  // Version: 1.1.7
  JAVA7           = "" // path to jdk7 bin java // Version 1.7

  // optional but highly recommended GATK3
  // ONLY if not using conda or singularity, please specify the path to the GATK3 jar
  // https://console.cloud.google.com/storage/browser/_details/gatk-software/package-archive/gatk/GenomeAnalysisTK-3.8-1-0-gf15c1c3ef.tar.bz2
  GATK3           = "" // path to GATK3 GenomeAnalysisTK.jar // Version 3.8-0

  // ONLY if not using conda or singularity, please specify the path to the JAVA8 executable
  // https://download.java.net/openjdk/jdk8u41/ri/openjdk-8u41-b04-linux-x64-14_jan_2020.tar.gz
  JAVA8           = "" // path to jdk8 bin java // Version 1.8

  // REQUIRED: Path to the installation directory of HLA-HD
  // Please install HLA-HD locally, you can get your own copy of HLA-HD at:
  // https://www.genome.med.kyoto-u.ac.jp/HLA-HD/
  HLAHD_DIR             = "/scratch/u/kfang/Software/hlahd.1.5.0/bin" //  path to HLA_HD hlahd.1.5.0
  HLA_HD_genome_version = "hg38"

  // URL to the installation package of MIXCRC, will be installed automatically.
  MIXCR_url       = "https://github.com/milaboratory/mixcr/releases/download/v4.0.0/mixcr-4.0.0.zip"
  MIXCR_lic       = "" // path to MiXCR license file
  MIXCR           = "" // Optional: specify path to mixcr directory if already installed, will be installed automatically otherwise
  // analyze TCRs using mixcr
  TCR = true

  // MixMHC2pred
  MiXMHC2PRED_url = "https://github.com/GfellerLab/MixMHC2pred/releases/download/v1.2/MixMHC2pred-1.2.zip"
  MiXMHC2PRED     = "" // Optional: specify path to MixMHC2pred_unix directory if already installed, will be installed automatically otherwise

  // Immunogenicity score: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6774822/
  // https://github.com/vincentlaboratories/neoag
  IGS_script_url = "https://github.com/vincentlaboratories/neoag/raw/master/NeoAg_immunogenicity_predicition_GBM.R"
  IGS_model_url = "https://github.com/vincentlaboratories/neoag/raw/master/Final_gbm_model.rds"
  IGS = "" // optional path to IGS

  // IEDB tools urls for MHCI and MHCII. These will be used for IEDB installation into resources.databases.IEDB_dir
  IEDB_MHCI_url  = "https://downloads.iedb.org/tools/mhci/3.1.2/IEDB_MHC_I-3.1.2.tar.gz"
  IEDB_MHCII_url = "https://downloads.iedb.org/tools/mhcii/3.1.6/IEDB_MHC_II-3.1.6.tar.gz"

  // Java settings: please adjust to your memory available
  JAVA_Xmx = "-Xmx64G"

  // samtools memory: please adjust to your memory available
  STperThreadMem = "8G"

  // sambamba settings: please adjust to your memory available
  SB_hash_table_size = "1048576"
  SB_overflow_list_size = "1000000"
  SB_io_buffer_size = "1024"
  SB_sort_mem = "64G"

  // Filter variants (FilterMutect2Tumor): set minimum allele depth
  minAD = 5

  // Picard common options
  maxRecordsInRam = "4194304"
  maxRecordsInRamMerge = "10485760"

  // CNNScoreVariants batch size
  // in case CNNScoreVariants runs out of memory try to set both to 64
  transferBatchSize = 256
  inferenceBatchSize = 128

  // VarScan
  // Process VarscanSomatic
  min_cov                 = "10"
  min_cov_tumor           = "10"
  min_cov_normal          = "10"
  min_freq_for_hom        = "0.75"
  somatic_pvalue          = "0.99"
  somatic_somaticpvalue   = "0.05"
  strand_filter           = "1"
  // Process: ProcessSomatic
  processSomatic_pvalue   = "0.05"
  max_normal_freq         = "0.05"
  min_tumor_freq          = "0.1"

  // BAMREADCOUNT
  min_map_q               = "10"
  min_base_q              = "20"

  // VEP
  vep_version             = "106.1"
  vep_assembly            = "GRCh38"
  vep_cache_version       = "106"
  vep_species             = "homo_sapiens"
  vep_options             = "--everything" // "--af --af_1kg --af_gnomad --appris --biotype --check_existing --distance 5000 --failed 1 --merged --numbers --polyphen b --protein --pubmed --regulatory --sift b --symbol --xref_refseq --tsl --gene_phenotype"

  // Scatter Count (for parallel processing over geneomi regions)
  scatter_count           = "40"

  // NeoFuse settings
  pepMin_length  = "8"
  pepMax_length  = "11"
  out_ID         = ""
  IC50_Threshold = "500"
  rank           = "2"
  conf_lvl       = "L"
  netMHCpan      = "false"

  // Process: NeoFuse_build
  build           = "false"
  buildRef        = "false"
  version         = "GRCh38"

  // pVACseq settings
  mhci_epitope_len         = "8,9,10,11"
  mhcii_epitope_len        = "15,16,17,18,19,20,21,22,23,24,25" // minimum length has to be at least 15 (see pVACtools /opt/iedb/mhc_ii/mhc_II_binding.py line 246)
  epitope_prediction_tools = "NetMHCpan MHCflurry NetMHCIIpan"
  use_NetChop              = false
  use_NetMHCstab           = true

  pVACseq_filter_set = "standard"
  pVACseq_custom_filters = ""

  pVACseq_filter_sets {
    standard = "--binding-threshold 500 --top-score-metric median --minimum-fold-change 0.0 --normal-cov 5 --tdna-cov 10 --trna-cov 10 --normal-vaf 0.02 --tdna-vaf 0.25 --trna-vaf 0.25 --expn-val 1 --maximum-transcript-support-level 1"
    relaxed = "--binding-threshold 500 --percentile-threshold 2 --top-score-metric lowest --expn-val 2 --maximum-transcript-support-level 5 --normal-vaf 0.01 --trna-vaf 0.02 --tdna-vaf 0.02"
    custom = "${params.pVACseq_custom_filters}"
  }

  // CSiN
  csin_rank      = "0.375 0.5 0.625 0.75 1.25 1.75 2"
  csin_ic50      = "500"
  csin_gene_exp  = "1"

}

// include config
includeConfig './process.config'
includeConfig './resources.config'
includeConfig './profiles.config'

timeline {
  enabled = true
  file = "${params.tracedir}/icbi/nextNEOpi_timeline.html"
}
report {
  enabled = true
  file = "${params.tracedir}/icbi/nextNEOpi_report.html"
}
trace {
  enabled = true
  file = "${params.tracedir}/icbi/nextNEOpi_trace.txt"
}
dag {
  enabled = true
  file = "${params.tracedir}/icbi/nextNEOpi_dag.svg"
}

manifest {
  name = 'icbi/nextNEOpi'
  author = 'Dietmar Rieder, Georgios Fotakis, Francesca Finotello'
  homePage = 'https://github.com/icbi-lab/nextNEOpi'
  description = 'Nextflow pipeline for neoantigen prediction'
  mainScript = 'nextNEOpi.nf'
  nextflowVersion = '>=20.10.0'
  version = '1.0'
}

//
// import plain java classes Paths
// and get realpaths for bind mounts
//
import java.nio.file.Paths;

new File(params.tmpDir).mkdirs()
params.singularityTmpMount = params.tmpDir.startsWith("/tmp/") ? "/tmp" : Paths.get(params.tmpDir).toRealPath()
params.singularityHLAHDmount = (params.HLAHD_DIR != "") ? " -B " + Paths.get(params.HLAHD_DIR).toRealPath() : ""
params.singularityAssetsMount = projectDir.toRealPath() + "/assets"

singularity {
    enabled = true
    autoMounts = true
    runOptions =  "--no-home" + " -H " + params.singularityTmpMount + " -B " +  params.singularityAssetsMount + " -B " + params.singularityTmpMount + " -B " + params.resourcesBaseDir + params.singularityHLAHDmount + " -B " + params.databases.IEDB_dir + ":/opt/iedb" + " -B " + params.databases.MHCFLURRY_dir + ":/opt/mhcflurry_data"

The directory where nextNEOpi installed:

(nextflow-22.04.0)[kfang@ln01 test]$ ls /scratch/u/kfang/Software/nextNEOpi/
assets  bin  conf  example_batchFile_BAM.csv  example_batchFile_FASTQ.csv  img  LICENSE  nextNEOpi.nf  README.html  README.md  resources

I assumed that the error arises from wrong directory but not sure how to correct it. Please let me know if further information is needed.

Thanks in advance! Kun

riederd commented 1 year ago

Hi, thanks for your interest in nextNEOpi. I'm sorry you ran into troubles, but it should be easily fixable.

The problem is that you set some incorrect values in the your params.config, i.e.:

  trim_adapters        = true
  adapterSeq           = true // "AGATCGGAAGAG" Illumina Universal Adapter
  adapterSeqR2         = true
  adapterSeqFile       = false // fasta file with adapter seqs

  trim_adapters_RNAseq = true
  adapterSeqRNAseq     = true // "AGATCGGAAGAG"  Illumina Universal Adapter
  adapterSeqR2RNAseq   = true
  adapterSeqFileRNAseq = false // fasta file with adapter seqs

  // extra options for fastp

If you want nextNEOpi to trim adapters, it is enough to set trim_adapters = true and trim_adapters_RNAsq = true, the adapter sequences will be determined automatically. However if you want to explicitly provide adapter sequences you may set

adapterSeq = "YOURADAPTERSEQHERE"
adapterSeqR2 = "YOURADAPTERSEQHERE"
adapterSeqRNAseq = "YOURADAPTERSEQHERE"
adapterSeqR2RNAseq = "YOURADAPTERSEQHERE"

Setting those to true is causing your error, so either set them to your adapter sequences or keep them as false.

KunFang93 commented 1 year ago

Thanks for your prompt reply! I follow your suggestion but the pipeline broke with another erroršŸ˜„

Execution cancelled -- Finishing pending tasks before exit
[icbi/nextNEOpi] Pipeline Complete! You can find your results in /scratch/u/kfang/ChenHZ_lab/Neoantigen/test/RESULTS
WARN: WARNING: VEP cache not installed, starting installation. This may take a while.
WARN: WARNING: VEP plugins not installed, starting installation. This may take a while.
WARN: To render the execution DAG in the required format it is required to install Graphviz -- See http://www.graphviz.org for more info.
Error executing process > 'fastp (Patient353_T1star : tumor_RNA)'

Caused by:
  Process `fastp (Patient353_T1star : tumor_RNA)` terminated with an error exit status (127)

Command executed:

  fastp --thread 4 \
      --in1 OSU13053-0000353-T-0204980-HYB00853-Txome-20190612-Illumina_S1_L004_R1_001.fastq.gz \
      --in2 OSU13053-0000353-T-0204980-HYB00853-Txome-20190612-Illumina_S1_L004_R2_001.fastq.gz \
      --out1 Patient353_T1star_tumor_RNA_trimmed_R1.fastq.gz \
      --out2 Patient353_T1star_tumor_RNA_trimmed_R2.fastq.gz \
      --json Patient353_T1star_tumor_RNA_fastp.json \
       \

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: fastp: command not found

Work dir:
  /scratch/u/kfang/ChenHZ_lab/Neoantigen/test/work/9a/f94e2c2e9b0bd5ffd50688525c1484

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

If I manually install fastp in the env and run command again, another error will pops up

Error executing process > 'RegionsBedToIntervalList (RegionsBedToIntervalList)'

Caused by:
  Process `RegionsBedToIntervalList (RegionsBedToIntervalList)` terminated with an error exit status (127)

Command executed:

  gatk --java-options -Xmx64G BedToIntervalList \
      -I S07604514_Regions.bed \
      -O S07604514_Regions.interval_list \
      -SD GRCh38.d1.vd1.dict

Command exit status:
  127

Command output:
  (empty)

Command error:
  .command.sh: line 2: gatk: command not found

Work dir:
  /data/kun/ChenHZ_lab/Neoantigen/test/work/07/44e34c80c976589a63d643042e9305

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

It looks like the pipeline doesn't find those installed softwares? I wondered how to solve this? Thanks for your time and help!

Best, Kun

riederd commented 1 year ago

Are you still using -profile conda, if yes, then something went wrong with creating the conda env. I'd need the .nextflow.log file to check. However, we strongly recommend to use singularity by specifying -profile singularity , conda sometimes results into issues.

KunFang93 commented 1 year ago

Yes, I still used -profile conda. I am trying -profile singularity now. It looks good so far!