ohsu-cedar-comp-hub / WGS-nextflow-workflow


Issue splitting by chromosome for mutect2: "Process `mutect2` declares 5 input channels but 1 were specified" #50

Closed: elisabethgoldman closed this issue 4 months ago

elisabethgoldman commented 5 months ago

Problem:

There is a problem in the way I'm using input and/or output channels to pass multiple chromosomes to Mutect2 for per-chromosome processing; it leads to the error `Process mutect2 declares 5 input channels but 1 were specified`. I suspect both the channel and workflow setup have errors; there are several ways to set up multiple dynamically named output files, but none I have tried so far has succeeded. The plan is to subsequently merge the 23 per-chromosome files with bcftools and GATK4 commands (those are largely figured out; a rough sketch is included after the Desired Output section below). See the Nextflow docs for output file options: https://www.nextflow.io/docs/latest/process.html#multiple-output-files

Desired Behavior:

Split Mutect2 processing by chromosome for speed, and thus output the three file types per chromosome.

Desired Output:

Output 23 files (one per chromosome: 1-22 and X) for each of the three file types (unfiltered.vcf, f1r2.tar.gz, stats).
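
For context on the downstream merge mentioned in the problem description, a rough sketch of what that step might look like is included here. This is illustrative only (not the repo's actual module); it assumes the 23 per-chromosome VCF and stats outputs are gathered into list channels (e.g. with `.collect()`), and the process and output names are placeholders.

```nf
// Illustrative merge step (placeholder names, not the repo's module).
// Assumes `vcfs` and `stats` each arrive as a single list of 23 per-chromosome files.
process merge_mutect2_outputs {
    publishDir "${params.outdir}/svc", mode: 'copy'

    input:
    path vcfs
    path stats

    output:
    path("${params.id}_unfiltered_merged.vcf")
    path("${params.id}_unfiltered_merged.vcf.stats")

    script:
    """
    # Concatenate per-chromosome VCFs (assumes they are supplied in chromosome order)
    bcftools concat ${vcfs} -o ${params.id}_unfiltered_merged.vcf

    # Combine the per-chromosome Mutect2 stats files
    gatk MergeMutectStats \\
        ${stats.collect { "-stats ${it}" }.join(' ')} \\
        -O ${params.id}_unfiltered_merged.vcf.stats
    """
}
```

The f1r2.tar.gz files would typically be passed together to `gatk LearnReadOrientationModel` in the same fashion.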

Nextflow process and subworkflow (pasted together for issue but modularized in repo)

```nf
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// Define the list of chromosomes
chromosomes = (1..22).collect { it.toString() } + ['X']

// Create a channel emitting each chromosome
chrom_channel = Channel.from(chromosomes)

// Define the Mutect2 process
process mutect2 {
    memory '80 G'
    publishDir "${params.outdir}/svc", mode: 'copy'

    input:
    val chrom
    path mutect_idx
    path tumor_bam_sorted
    path normal_bam_sorted
    path pon

    output:
    path("${params.id}_chr${chrom}_unfiltered.vcf")
    path("${params.id}_chr${chrom}_f1r2.tar.gz")
    path("${params.id}_chr${chrom}_unfiltered.vcf.stats")

    script:
    """
    gatk Mutect2 \\
        -R ${mutect_idx} \\
        -I ${tumor_bam_sorted} \\
        -I ${normal_bam_sorted} \\
        --panel-of-normals ${pon} \\
        -normal ${normal_bam_sorted.baseName} \\
        -L ${chrom} \\
        -O ${params.id}_chr${chrom}_unfiltered.vcf \\
        --f1r2-tar-gz ${params.id}_chr${chrom}_f1r2.tar.gz \\
        -stats ${params.id}_chr${chrom}_unfiltered.vcf.stats
    """
}

// Run the process for each chromosome
workflow {
    chrom_channel
        .map { chrom ->
            tuple(chrom, params.mutect_idx, params.tumor_bam_sorted, params.normal_bam_sorted, params.pon)
        }
        | mutect2
}
```

params-file

```json
{
    "outdir" : "/home/groups/CEDAR/goldmael/projects/wgs_test_files/data_files",
    "id" : "H_52908",
    "mutect_idx" : "/home/groups/CEDAR/goldmael/projects/wgs_test_files/references/GRCh38.d1.vd1.fa",
    "tumor_bam_sorted" : "DNX230201PS_T_H_52908_B1R1_S8_aligned_sorted_markdup_rg_fixed_rg_fixed_sorted.bam",
    "normal_bam_sorted" : "DNX230201PS_G_H_52908_B1R1_S27_aligned_sorted_markdup_rg_fixed_rg_fixed_sorted.bam",
    "mutect_idx_fai" : "/home/groups/CEDAR/goldmael/projects/wgs_test_files/references/GRCh38.d1.vd1.fa.fai",
    "pon" : "/home/groups/CEDAR/goldmael/projects/wgs_test_files/6c4c4a48-3589-4fc0-b1fd-ce56e88c06e4/gatk4_mutect2_4136_pon.vcf",
    "mutect_dict" : "/home/groups/CEDAR/goldmael/projects/wgs_test_files/references/GRCh38.d1.vd1.fa.dict"
}
```

Error log:

N E X T F L O W  ~  version 23.10.1
Launching `new_test4.nf` [crazy_morse] DSL2 - revision: a2b07001d4
Process `mutect2` declares 5 input channels but 1 were specified

 -- Check script 'new_test4.nf' at line: 46 or see '.nextflow.log' file for more details
(nextflow) [goldmael@exanode-09-19 data_files]$ cat .nextflow.log
May-13 11:46:58.821 [main] DEBUG nextflow.cli.Launcher - $> nextflow run new_test4.nf -params-file params.json
May-13 11:46:59.430 [main] INFO  nextflow.cli.CmdRun - N E X T F L O W  ~  version 23.10.1
May-13 11:46:59.485 [main] DEBUG nextflow.plugin.PluginsFacade - Setting up plugin manager > mode=prod; embedded=false; plugins-dir=/home/users/goldmael/.nextflow/plugins; core-plugins: nf-amazon@2.1.4,nf-azure@1.3.3,nf-cloudcache@0.3.0,nf-codecommit@0.1.5,nf-console@1.0.6,nf-ga4gh@1.1.0,nf-google@1.8.3,nf-tower@1.6.3,nf-wave@1.0.1
May-13 11:46:59.550 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Enabled plugins: []
May-13 11:46:59.554 [main] INFO  o.pf4j.DefaultPluginStatusProvider - Disabled plugins: []
May-13 11:46:59.557 [main] INFO  org.pf4j.DefaultPluginManager - PF4J version 3.4.1 in 'deployment' mode
May-13 11:46:59.604 [main] INFO  org.pf4j.AbstractPluginManager - No plugins
May-13 11:46:59.894 [main] DEBUG nextflow.cli.CmdRun - Applied DSL=2 from script declararion
May-13 11:46:59.939 [main] INFO  nextflow.cli.CmdRun - Launching `new_test4.nf` [crazy_morse] DSL2 - revision: a2b07001d4
May-13 11:46:59.941 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins default=[]
May-13 11:46:59.941 [main] DEBUG nextflow.plugin.PluginsFacade - Plugins resolved requirement=[]
May-13 11:46:59.965 [main] DEBUG n.secret.LocalSecretsProvider - Secrets store: /home/users/goldmael/.nextflow/secrets/store.json
May-13 11:46:59.984 [main] DEBUG nextflow.secret.SecretsLoader - Discovered secrets providers: [nextflow.secret.LocalSecretsProvider@43effd89] - activable => nextflow.secret.LocalSecretsProvider@43effd89
May-13 11:47:00.213 [main] DEBUG nextflow.Session - Session UUID: 5cf97231-f067-4b23-a8bd-ad103c6f8479
May-13 11:47:00.214 [main] DEBUG nextflow.Session - Run name: crazy_morse
May-13 11:47:00.217 [main] DEBUG nextflow.Session - Executor pool size: 2
May-13 11:47:00.246 [main] DEBUG nextflow.file.FilePorter - File porter settings maxRetries=3; maxTransfers=50; pollTimeout=null
May-13 11:47:00.253 [main] DEBUG nextflow.util.ThreadPoolBuilder - Creating thread pool 'FileTransfer' minSize=10; maxSize=10; workQueue=LinkedBlockingQueue[10000]; allowCoreThreadTimeout=false
May-13 11:47:00.310 [main] DEBUG nextflow.cli.CmdRun - 
  Version: 23.10.1 build 5891
  Created: 12-01-2024 22:01 UTC (14:01 PDT)
  System: Linux 3.10.0-1160.66.1.el7.x86_64
  Runtime: Groovy 3.0.19 on OpenJDK 64-Bit Server VM 17.0.10-internal+0-adhoc..src
  Encoding: UTF-8 (UTF-8)
  Process: 8911@exanode-09-19 [172.20.14.119]
  CPUs: 1 - Mem: 2 GB (763.9 MB) - Swap: 0 (0)
May-13 11:47:00.641 [main] DEBUG nextflow.Session - Work-dir: /home/groups/CEDAR/goldmael/projects/wgs_test_files/data_files/work [nfs]
May-13 11:47:00.649 [main] DEBUG nextflow.Session - Script base path does not exist or is not a directory: /home/groups/CEDAR/goldmael/projects/wgs_test_files/data_files/bin
May-13 11:47:00.661 [main] DEBUG nextflow.executor.ExecutorFactory - Extension executors providers=[]
May-13 11:47:00.686 [main] DEBUG nextflow.Session - Observer factory: DefaultObserverFactory
May-13 11:47:00.782 [main] DEBUG nextflow.cache.CacheFactory - Using Nextflow cache factory: nextflow.cache.DefaultCacheFactory
May-13 11:47:00.804 [main] DEBUG nextflow.util.CustomThreadPool - Creating default thread pool > poolSize: 2; maxThreads: 1000
May-13 11:47:01.725 [main] DEBUG nextflow.Session - Session start
May-13 11:47:03.705 [main] DEBUG nextflow.script.ScriptRunner - > Launching execution
May-13 11:47:04.039 [main] DEBUG nextflow.script.ScriptRunner - Parsed script files:
  Script_fd14ae3fff7adbe2: /home/groups/CEDAR/goldmael/projects/wgs_test_files/data_files/new_test4.nf
May-13 11:47:04.039 [main] DEBUG nextflow.Session - Session aborted -- Cause: Process `mutect2` declares 5 input channels but 1 were specified
May-13 11:47:04.080 [main] DEBUG nextflow.Session - The following nodes are still active:
  [operator] map

May-13 11:47:04.096 [main] ERROR nextflow.cli.Launcher - Process `mutect2` declares 5 input channels but 1 were specified
nextflow.exception.ScriptRuntimeException: Process `mutect2` declares 5 input channels but 1 were specified
        at nextflow.script.ProcessDef.run(ProcessDef.groovy:174)
        at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
        at nextflow.script.ComponentDef.invoke_o(ComponentDef.groovy:40)
        at nextflow.extension.ChannelEx.or(ChannelEx.groovy:132)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.runtime.metaclass.ReflectionMetaMethod.invoke(ReflectionMetaMethod.java:54)
        at org.codehaus.groovy.runtime.metaclass.NewInstanceMetaMethod.invoke(NewInstanceMetaMethod.java:54)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1254)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1030)
        at groovy.runtime.metaclass.NextflowDelegatingMetaClass.invokeMethod(NextflowDelegatingMetaClass.java:64)
        at org.codehaus.groovy.runtime.callsite.PojoMetaClassSite.call(PojoMetaClassSite.java:44)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at Script_fd14ae3fff7adbe2$_runScript_closure3$_closure12.doCall(Script_fd14ae3fff7adbe2:46)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:77)
        at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.base/java.lang.reflect.Method.invoke(Method.java:568)
        at org.codehaus.groovy.reflection.CachedMethod.invoke(CachedMethod.java:107)
        at groovy.lang.MetaMethod.doMethodInvoke(MetaMethod.java:323)
        at org.codehaus.groovy.runtime.metaclass.ClosureMetaClass.invokeMethod(ClosureMetaClass.java:274)
        at groovy.lang.MetaClassImpl.invokeMethod(MetaClassImpl.java:1030)
        at groovy.lang.Closure.call(Closure.java:427)
        at groovy.lang.Closure.call(Closure.java:406)
        at nextflow.script.WorkflowDef.run0(WorkflowDef.groovy:204)
        at nextflow.script.WorkflowDef.run(WorkflowDef.groovy:188)
        at nextflow.script.BindableDef.invoke_a(BindableDef.groovy:51)
        at nextflow.script.IterableDef$invoke_a.call(Unknown Source)
        at org.codehaus.groovy.runtime.callsite.CallSiteArray.defaultCall(CallSiteArray.java:47)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:125)
        at org.codehaus.groovy.runtime.callsite.AbstractCallSite.call(AbstractCallSite.java:139)
        at nextflow.script.BaseScript.run0(BaseScript.groovy:183)
        at nextflow.script.BaseScript.run(BaseScript.groovy:192)
        at nextflow.script.ScriptParser.runScript(ScriptParser.groovy:236)
        at nextflow.script.ScriptRunner.run(ScriptRunner.groovy:242)
        at nextflow.script.ScriptRunner.execute(ScriptRunner.groovy:137)
        at nextflow.cli.CmdRun.run(CmdRun.groovy:372)
        at nextflow.cli.Launcher.run(Launcher.groovy:500)
        at nextflow.cli.Launcher.main(Launcher.groovy:672)
lbeckman314 commented 5 months ago

It looks like the tuple is being treated as a single parameter, as opposed to 5 separate parameters.

We can update the input section to handle this with the tuple qualifier:

```diff
// Define the Mutect2 process
process mutect2 {
    ...

    input:
-   val chrom
-   path mutect_idx
-   path tumor_bam_sorted
-   path normal_bam_sorted
-   path pon
+   tuple val(chrom),
+         path(mutect_idx),
+         path(tumor_bam_sorted),
+         path(normal_bam_sorted),
+         path(pon)
```

Can you try re-running the command with this change (included in the 'Full Script' below)?

Full Script

```nf
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// Define the list of chromosomes
chromosomes = (1..22).collect { it.toString() } + ['X']

// Create a channel emitting each chromosome
chrom_channel = Channel.from(chromosomes)

// Define the Mutect2 process
process mutect2 {
    memory '80 G'
    publishDir "${params.outdir}/svc", mode: 'copy'

    input:
    // Deconstruct the tuple into separate inputs
    tuple val(chrom), path(mutect_idx), path(tumor_bam_sorted), path(normal_bam_sorted), path(pon)

    output:
    path("${params.id}_chr${chrom}_unfiltered.vcf")
    path("${params.id}_chr${chrom}_f1r2.tar.gz")
    path("${params.id}_chr${chrom}_unfiltered.vcf.stats")

    script:
    """
    gatk Mutect2 \\
        -R ${mutect_idx} \\
        -I ${tumor_bam_sorted} \\
        -I ${normal_bam_sorted} \\
        --panel-of-normals ${pon} \\
        -normal ${normal_bam_sorted.baseName} \\
        -L ${chrom} \\
        -O ${params.id}_chr${chrom}_unfiltered.vcf \\
        --f1r2-tar-gz ${params.id}_chr${chrom}_f1r2.tar.gz \\
        -stats ${params.id}_chr${chrom}_unfiltered.vcf.stats
    """
}

// Run the process for each chromosome
workflow {
    chrom_channel
        .map { chrom ->
            tuple(chrom, params.mutect_idx, params.tumor_bam_sorted, params.normal_bam_sorted, params.pon)
        }
        | mutect2
}
```
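
For reference, an equivalent fix would be to keep the five separate `input:` declarations and instead pass five channels when the process is invoked; a minimal sketch (untested, reusing the names from the script above), with the single-file params wrapped as value channels so they are reused for every chromosome:

```nf
// Alternative workflow block: call mutect2 with five channels, matching the
// original five separate input declarations. The value channels repeat for
// each item emitted by chrom_channel.
workflow {
    mutect2(
        chrom_channel,
        Channel.value(file(params.mutect_idx)),
        Channel.value(file(params.tumor_bam_sorted)),
        Channel.value(file(params.normal_bam_sorted)),
        Channel.value(file(params.pon))
    )
}
```

Either approach works; the tuple form keeps the per-task inputs grouped in a single channel, which tends to be easier to extend later.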

elisabethgoldman commented 5 months ago

Thanks for your help @lbeckman314 ! That seems to have gotten me past the initial hurdle; a new error has come up.

Action taken: applied the `tuple` input change above and re-ran the workflow.

Error:

(nextflow) [goldmael@exanode-09-8 data_files]$ nextflow run new_test6.nf -params-file params.json
N E X T F L O W  ~  version 23.10.1
Launching new_test6.nf [fervent_volta] DSL2 - revision: 121c2d58d7
executor >  local (1)
[b9/74de23] process > mutect2 (2) [  0%] 0 of 23
ERROR ~ Error executing process > 'mutect2 (2)'

Caused by:
  Process `mutect2 (2)` terminated with an error exit status (1)

Command executed:

  gatk Mutect2 \
      -R GRCh38.d1.vd1.fa \
      -I DNX230201PS_T_H_51768_B1R1_S27_aligned_sorted_markdup_rg_fixed_rg_fixed_sorted.bam \
      -I DNX230201PS_G_H_51768_B1R1_S5_aligned_sorted_markdup_rg_fixed_rg_fixed_sorted.bam \
      --panel-of-normals gatk4_mutect2_4136_pon.vcf \
      -normal DNX230201PS_G_H_51768_B1R1_S5_aligned_sorted_markdup_rg_fixed_rg_fixed_sorted \
      -L 2 \
      -O H_51768_chr2_unfiltered.vcf \
      --f1r2-tar-gz H_51768_chr2_f1r2.tar.gz \
      -stats H_51768_chr2_unfiltered.vcf.stats

Command exit status:
  1

Command output:
  (empty)

Command error:
  Valid only if "ReadLengthReadFilter" is specified:
  --max-read-length <Integer>   Keep only reads with length at most equal to the specified value  Default value:
                                2147483647. 

  --min-read-length <Integer>   Keep only reads with length at least equal to the specified value  Default value: 30. 

  Valid only if "ReadNameReadFilter" is specified:
  --read-name <String>          Keep only reads with this read name  Required. 

  Valid only if "ReadStrandFilter" is specified:
  --keep-reverse-strand-only <Boolean>
                                Keep only reads on the reverse strand  Required. Possible values: {true, false} 

  Valid only if "ReadTagValueFilter" is specified:
  --read-filter-tag <String>    Look for this tag in read  Required. 

  --read-filter-tag-comp <Float>
                                Compare value in tag to this value  Default value: 0.0.

  --read-filter-tag-op <Operator>
                                Compare value in tag to value with this operator. If T is the value in the tag, OP is the
                                operation provided, and V is the value in read-filter-tag, then the read will pass the
                                filter iff T OP V is true.  Default value: EQUAL. Possible values: {LESS, LESS_OR_EQUAL,
                                GREATER, GREATER_OR_EQUAL, EQUAL, NOT_EQUAL}

  Valid only if "SampleReadFilter" is specified:
  --sample <String>             The name of the sample(s) to keep, filtering out all others  This argument must be
                                specified at least once. Required. 

  Valid only if "SoftClippedReadFilter" is specified:
  --invert-soft-clip-ratio-filter <Boolean>
                                Inverts the results from this filter, causing all variants that would pass to fail and
                                visa-versa.  Default value: false. Possible values: {true, false} 

  --soft-clipped-leading-trailing-ratio <Double>
                                Threshold ratio of soft clipped bases (leading / trailing the cigar string) to total bases
                                in read for read to be filtered.  Default value: null.  Cannot be used in conjunction with
                                argument(s) minimumSoftClippedRatio

  --soft-clipped-ratio-threshold <Double>
                                Threshold ratio of soft clipped bases (anywhere in the cigar string) to total bases in
                                read for read to be filtered.  Default value: null.  Cannot be used in conjunction with
                                argument(s) minimumLeadingTrailingSoftClippedRatio

  ***********************************************************************

  A USER ERROR has occurred: -stats is not a recognized option

  ***********************************************************************
  Set the system property GATK_STACKTRACE_ON_USER_EXCEPTION (--java-options '-DGATK_STACKTRACE_ON_USER_EXCEPTION=true') to print the stack trace.

Work dir:
  /home/groups/CEDAR/goldmael/projects/wgs_test_files/data_files/work/b9/74de23995e5b914a1cb6096d33da6d

Tip: you can try to figure out what's wrong by changing to the process work dir and showing the script file named `.command.sh`

 -- Check '.nextflow.log' file for details

Testing environment on exacloud (within the `nextflow` conda environment):

(nextflow) [goldmael@exanode-09-8 data_files]$ gatk -version
Using GATK jar /home/users/goldmael/miniconda3/envs/nextflow/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar
Running:
    java -Dsamjdk.use_async_io_read_samtools=false -Dsamjdk.use_async_io_write_samtools=true -Dsamjdk.use_async_io_write_tribble=false -Dsamjdk.compression_level=2 -jar /home/users/goldmael/miniconda3/envs/nextflow/share/gatk4-4.4.0.0-0/gatk-package-4.4.0.0-local.jar -version
The Genome Analysis Toolkit (GATK) v4.4.0.0
HTSJDK Version: 3.0.5
Picard Version: 3.0.0
(nextflow) [goldmael@exanode-09-8 data_files]$ conda info 

     active environment : nextflow
    active env location : /home/users/goldmael/miniconda3/envs/nextflow
            shell level : 1
       user config file : /home/users/goldmael/.condarc
 populated config files : /home/users/goldmael/.condarc
          conda version : 23.10.0
    conda-build version : not installed
         python version : 3.11.6.final.0
       virtual packages : __archspec=1=broadwell
                          __glibc=2.17=0
                          __linux=3.10.0=0
                          __unix=0=0
       base environment : /home/users/goldmael/miniconda3  (writable)
      conda av data dir : /home/users/goldmael/miniconda3/etc/conda
  conda av metadata url : None
           channel URLs : https://conda.anaconda.org/conda-forge/linux-64
                          https://conda.anaconda.org/conda-forge/noarch
                          https://conda.anaconda.org/bioconda/linux-64
                          https://conda.anaconda.org/bioconda/noarch
                          https://conda.anaconda.org/r/linux-64
                          https://conda.anaconda.org/r/noarch
          package cache : /home/users/goldmael/miniconda3/pkgs
                          /home/users/goldmael/.conda/pkgs
       envs directories : /home/users/goldmael/miniconda3/envs
                          /home/users/goldmael/.conda/envs
               platform : linux-64
             user-agent : conda/23.10.0 requests/2.31.0 CPython/3.11.6 Linux/3.10.0-1160.66.1.el7.x86_64 centos/7.9.2009 glibc/2.17 solver/libmamba conda-libmamba-solver/23.11.0 libmambapy/1.5.3
                UID:GID : 4733:3010
             netrc file : None
           offline mode : False
elisabethgoldman commented 5 months ago

Update:

  • Can't get around the same error using either of the attempts detailed below

Tried:

  • Removing -stats from the Mutect2 call to see whether the stats file would be output automatically; produces the same error
  • Using --stats; same error

rlancaster96 commented 5 months ago

I ran into this too! --stats is only a parameter for FilterMutectCalls, not Mutect2; I think it was committed to the mutect2 script in error and should only have been in the FilterMutectCalls script. I remember fixing this error during my run-through; the mutect2.nf script in the main branch is now fixed.
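
For completeness: Mutect2 writes its stats file automatically as `<output>.stats` next to the `-O` VCF, so the corrected script block presumably just drops the `-stats` flag while keeping the existing `.vcf.stats` output declaration. A minimal sketch, assuming the same file naming used earlier in this thread (not necessarily identical to the main-branch mutect2.nf):

```nf
    // Corrected script block: no -stats flag. Mutect2 emits
    // <output>.vcf.stats alongside the -O VCF automatically, so the
    // process's existing output declaration still resolves.
    script:
    """
    gatk Mutect2 \\
        -R ${mutect_idx} \\
        -I ${tumor_bam_sorted} \\
        -I ${normal_bam_sorted} \\
        --panel-of-normals ${pon} \\
        -normal ${normal_bam_sorted.baseName} \\
        -L ${chrom} \\
        -O ${params.id}_chr${chrom}_unfiltered.vcf \\
        --f1r2-tar-gz ${params.id}_chr${chrom}_f1r2.tar.gz
    """
```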