nf-core / nanoseq

Nanopore demultiplexing, QC and alignment pipeline
https://nf-co.re/nanoseq
MIT License

pipeline terminating early #264

Open nick-youngblut opened 6 months ago

nick-youngblut commented 6 months ago

Description of the bug

I just want to use the pipeline for QC'ing my nanopore data, but it prematurely terminates after the initial step of the pipeline:

[70/93224f] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (SampleSheet.csv) [100%] 1 of 1 ✔
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC                  -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_RENAME                                      -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS                     -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC
-[nf-core/nanoseq] Pipeline completed successfully-

Command used and terminal output

nextflow run main.nf \
  --input SampleSheet.csv \
  --outdir path/to/output/ \
  --protocol cDNA \
  --skip_demultiplexing \
  --skip_vc \
  --skip_sv \
  --skip_alignment \
  --skip_differential_analysis \
  --skip_quantification \
  --skip_modification_analysis \
  --skip_fusion_analysis \
  -profile docker

Relevant files

My SampleSheet.csv file:

group,replicate,barcode,input_file,fasta,gtf
sample1,1,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample1,2,17,/path/to/basecalling/output/basecalling/barcode17/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,
sample2,1,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_11240_0.fastq.gz,GRCh38,
sample2,2,18,/path/to/basecalling/output/basecalling/barcode18/fastq_runid_57875ca7c4726448f62a97db8456c62308842af6_10664_0.fastq.gz,GRCh38,

System information

nick-youngblut commented 6 months ago

It appears that the issue is due to --skip_demultiplexing. A simple reprex:

nextflow run main.nf   --outdir /home/nickyoungblut/projects/SspArc0008_10x_cDNA_longRead/data/SspArc0008_10x_cDNA_longRead/nanoseq_TEST/   --protocol cDNA   --skip_demultiplexing   -profile docker,test

[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv)      [100%] 1 of 1 ✔
executor >  local (1)
[53/2f0a9a] process > NFCORE_NANOSEQ:NANOSEQ:INPUT_CHECK:SAMPLESHEET_CHECK (samplesheet_nobc_dx.csv)      [100%] 1 of 1 ✔
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:NANOPLOT                             -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:QCFASTQ_NANOPLOT_FASTQC:FASTQC                               -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GET_CHROM_SIZES                               -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:GTF2BED                                       -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:PREPARE_GENOME:SAMTOOLS_FAIDX                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_INDEX                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:ALIGN_MINIMAP2:MINIMAP2_ALIGN                                -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_VIEW_BAM                    -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_SORT                        -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:SAMTOOLS_INDEX                       -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_STATS    -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_FLAGSTAT -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:BAM_SORT_INDEX_SAMTOOLS:BAM_STATS_SAMTOOLS:SAMTOOLS_IDXSTATS -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:CUSTOM_DUMPSOFTWAREVERSIONS                                  -
[-        ] process > NFCORE_NANOSEQ:NANOSEQ:MULTIQC                                                      -
-[nf-core/nanoseq] Pipeline completed successfully-

The QC steps (e.g., NanoPlot) appear to be tied directly to the demultiplexing section of the pipeline, instead of being applied to all demultiplexed files (whether provided by the user or demultiplexed by the pipeline):

    if (!params.skip_demultiplexing) {

        /*
         * MODULE: Demultiplexing using qcat
         */
        QCAT ( ch_input_path )
        ch_fastq = Channel.empty()
        QCAT.out.fastq
            .flatten()
            .map { it -> [ it, it.baseName.substring(0,it.baseName.lastIndexOf('.'))] }
            .join(ch_sample, by: 1) // join on barcode
            .map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] }
            .set { ch_fastq }
        ch_software_versions = ch_software_versions.mix(QCAT.out.versions.ifEmpty(null))
    } else {
        if (!params.skip_alignment) {
            ch_sample
                .map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
                .set { ch_fastq }
        } else {
            ch_fastq = Channel.empty()
        }
    }

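If both params.skip_demultiplexing and params.skip_alignment are set, then ch_fastq = Channel.empty() (and when only demultiplexing is skipped, inputs for which it[6].toString().endsWith('.gz') is false are not mapped to a fastq entry), so there are no fastq files to process further in the pipeline.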

It would greatly help to have the columns associated with the index values in:

.map { it -> [ it[2], it[1], it[3], it[4], it[5], it[6] ] } 

and:

.map { it -> if (it[6].toString().endsWith('.gz')) [ it[0], it[6], it[2], it[1], it[4], it[5] ] }
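
As an illustration of what named positions could look like, here is a minimal stand-alone Nextflow sketch. It is not the pipeline's code. The names col1 to col5 are placeholders because the thread does not say what those positions hold. Only `sample` (it[0]) and `input_file` (it[6]) are inferred from the snippets above, and that inference is itself an assumption.

```groovy
// Stand-alone Nextflow (DSL2) sketch; run with `nextflow run sketch.nf`.
workflow {
    // Hypothetical stand-in for ch_sample with seven positions per item.
    ch_sample = Channel.of(
        [ 'sample1_R1', 'p1', 'p2', 'p3', 'p4', 'p5', '/path/to/reads_1.fastq.gz' ]
    )

    // Naming the closure parameters replaces it[0] ... it[6].
    // col1..col5 are placeholder names, not the pipeline's real columns.
    ch_sample
        .map { sample, col1, col2, col3, col4, col5, input_file ->
            [ sample, input_file, col2, col1, col4, col5 ]
        }
        .view()
}
```

The reordering in the closure mirrors `[ it[0], it[6], it[2], it[1], it[4], it[5] ]`, so the behavior is unchanged and only the readability differs.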

nick-youngblut commented 6 months ago

Changing ch_fastq = Channel.empty() to ch_sample.map { it -> [ it[0], it[6] ] }.set { ch_fastq } enables the completion of NANOPLOT and FASTQC.
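
A minimal stand-alone sketch of that change, assuming ch_sample items carry the sample identifier at position 0 and the FASTQ path at position 6 (the other positions are placeholders here). The QCAT branch is left out, and this is an illustration rather than a tested patch for the pipeline:

```groovy
// Stand-alone Nextflow (DSL2) sketch; not the pipeline's actual code.
params.skip_demultiplexing = true
params.skip_alignment      = true

workflow {
    // Hypothetical stand-in for ch_sample; only it[0] and it[6] are used.
    ch_sample = Channel.of(
        [ 'sample1_R1', 'p1', 'p2', 'p3', 'p4', 'p5', '/path/to/reads_1.fastq.gz' ],
        [ 'sample2_R1', 'p1', 'p2', 'p3', 'p4', 'p5', '/path/to/reads_2.fastq.gz' ]
    )

    // When demultiplexing is skipped, forward the user-supplied FASTQ files
    // whether or not alignment is skipped, instead of using Channel.empty().
    // The demultiplexing (QCAT) branch is omitted and replaced by an empty channel.
    ch_fastq = params.skip_demultiplexing ?
        ch_sample.map { it -> [ it[0], it[6] ] } :
        Channel.empty()

    // Each emitted item is a [ sample, fastq ] pair for the QC steps.
    ch_fastq.view()
}
```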

Still, the MultiQC report is not generated, which seems to be due to an unmet dependency at:

        MULTIQC (
        ch_multiqc_config,
        ch_multiqc_custom_config.collect().ifEmpty([]),
        ch_fastqc_multiqc.ifEmpty([]),
        ch_samtools_multiqc.collect().ifEmpty([]),
        ch_featurecounts_gene_multiqc.ifEmpty([]),
        ch_featurecounts_transcript_multiqc.ifEmpty([]),
        CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
        ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
        )

With my edits (above), ch_fastqc_multiqc is not empty, so I would think that MULTIQC would run.

The following edit works:

        MULTIQC (
        ch_multiqc_config,
        ch_multiqc_custom_config.collect().ifEmpty([]),
        ch_fastqc_multiqc.collect().ifEmpty([])//,
        //ch_samtools_multiqc.collect().ifEmpty([]),
        //ch_featurecounts_gene_multiqc.ifEmpty([]),
        //ch_featurecounts_transcript_multiqc.ifEmpty([]),
        //CUSTOM_DUMPSOFTWAREVERSIONS.out.mqc_yml.collect(),
        //ch_workflow_summary.collectFile(name: 'workflow_summary_mqc.yaml')
        )

Note: I updated process MULTIQC accordingly.

Also note: I had to add .collect() to ch_fastqc_multiqc
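
For reference, a small stand-alone Nextflow sketch of what collect() changes. The file names are made up, and this only illustrates the operator, not why the original MULTIQC call did not run:

```groovy
// Stand-alone Nextflow (DSL2) sketch with hypothetical FastQC outputs.
workflow {
    ch_fastqc_multiqc = Channel.of('s1_fastqc.zip', 's2_fastqc.zip', 's3_fastqc.zip')

    // Without collect(): three separate emissions, one per sample.
    ch_fastqc_multiqc.view { it -> "item: ${it}" }

    // With collect(): a single emission holding all items as one list.
    // ifEmpty([]) supplies an empty list if nothing was emitted upstream.
    ch_fastqc_multiqc
        .collect()
        .ifEmpty([])
        .view { it -> "collected: ${it}" }
}
```

A process fed the uncollected channel would be invoked once per item, whereas the collected form hands all files to a single invocation, which is what a MultiQC step typically needs.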