nextflow-io / nextflow

A DSL for data-driven computational pipelines
http://nextflow.io
Apache License 2.0

Missing output file(s) error #4470

Closed setshabaTaukobong closed 10 months ago

setshabaTaukobong commented 10 months ago

Hi there, I am trying to run a genome assembly using several tools and the script below:

#!/usr/bin/env nextflow

nextflow.enable.dsl=2

/*
 *
========================================================================================
         GENASS: Genome Assembly Pipeline for Nanopore Sequencing Data 
========================================================================================

# Homepage / Documentation
 GitHub - DIPLOMICS [1KSA Genome Assembly project]
 # Authors
 Setshaba Taukobong <setshaba.taukobong@diplomics.org.za> <sc.taukobong@gmail.com>

---------------------------------------------------------------------------------------
 *
 */

/*
========================================================================================
                        Define parameters, channels and processes
========================================================================================
*/

/*
 * Define the default parameters
 */ 

params.podsF = '/home/staukobong/pod5/'
pods_ch = Channel.fromPath(params.podsF, checkIfExists: true)

/*
 * Basecall POD5 files using Dorado
 */

process BASECALL {

    debug true

    input:
    path sample_id

    output:
    path 'sample_id.bam' , emit: bamfiles_complete

    script:
    """
    dorado basecaller /home/staukobong/dna_r10.4.1_e8.2_400bps_hac@v4.2.0 $sample_id > sample_id.bam
    """
}

/*
 * Convert the basecalled BAM files to FASTQ files
 */

process CONVERT {

    debug true

    input:
    path sample_id

    output:
    path 'sample_id.fastq', emit: fastq_files

    script:
    """
    samtools bam2fq $sample_id > sample_id.fastq
    """
}

/*
 * Check quality of sequencing reads using FASTQC
 */

process FASTQC1 {

    debug true

    input:
    path sample_id

    output:
    path 'sample_id_fastqc.html', emit: fastqc_files

    script:
    """
    fastqc $sample_id -t 4
    """

}

/*
 * Trim FASTQ files after basecalling using NanoFilt
 */

process TRIM {

    debug true

    input:
    path sample_id

    output:
    path 'sample_id.trimmed.fastq', emit: trimmed_fastq

    script:
    """
    NanoFilt -l 200 -q 20 --headcrop 50 --tailcrop 6000 $sample_id > sample_id.trimmed.fastq
    """
}

/*
 * Check quality of the trimmed reads using FASTQC
 */

process FASTQC2 {

    debug true

    input:
    path sample_id2

    output:
    path 'sample_id2_fastqc.html', emit: fastqc_files2

    script:
    """
    fastqc $sample_id2 -t 4
    """

}

/*
 * Assemble the reads using FLYE
 */

process ASSEMBLY {

    debug true

    input:
    path sample_id

    output:
    path '*', emit: Assembly_files

    script:
    """
    flye --nano-raw $sample_id -i 3 -t 4
    """

}

/*
 * Mapping the reads using minimap2
 */

process MAPPINGS {

    debug true

    input:
    path sample_id

    output:
    path 'sample_id.sam', emit: Mapped_files

    script:
    """
    minimap2 -a -t 4 ${sample_id}.trimmed.fastq ${sample_id}.fasta > sample_id.sam
    """

}

/*
========================================================================================
                                Create default workflow
========================================================================================
*/

workflow {
    BASECALL(pods_ch)
    CONVERT(BASECALL.out.bamfiles_complete)
    FASTQC1(CONVERT.out.fastq_files)
    TRIM(CONVERT.out.fastq_files)
    FASTQC2(TRIM.out.trimmed_fastq)
    ASSEMBLY(TRIM.out.trimmed_fastq)
    MAPPINGS(TRIM.out.trimmed_fastq.combine(ASSEMBLY.out.Assembly_files))

}

The script works until it reaches the fifth process (FASTQC2). At that point it stops with the following error, even though an output file is generated:

ERROR ~ Error executing process > 'FASTQC2 (1)'

Caused by:
  Missing output file(s) `sample_id2_fastqc.html` expected by process `FASTQC2 (1)`

Command executed:

  fastqc sample_id.trimmed.fastq -t 4

Command exit status:
  0

Command output:
  Analysis complete for sample_id.trimmed.fastq

Command error:
  Started analysis of sample_id.trimmed.fastq
  Approx 5% complete for sample_id.trimmed.fastq
  Approx 10% complete for sample_id.trimmed.fastq
  Approx 20% complete for sample_id.trimmed.fastq
  Approx 25% complete for sample_id.trimmed.fastq
  Approx 35% complete for sample_id.trimmed.fastq
  Approx 40% complete for sample_id.trimmed.fastq
  Approx 50% complete for sample_id.trimmed.fastq
  Approx 55% complete for sample_id.trimmed.fastq
  Approx 65% complete for sample_id.trimmed.fastq
  Approx 70% complete for sample_id.trimmed.fastq
  Approx 80% complete for sample_id.trimmed.fastq
  Approx 85% complete for sample_id.trimmed.fastq
  Approx 95% complete for sample_id.trimmed.fastq

Work dir:
  /home/staukobong/work/89/9ecd8124e69bdb6ad772d423711c87

Tip: when you have fixed the problem you can continue the execution adding the option `-resume` to the run command line

 -- Check '.nextflow.log' file for details

Could I get some assistance with this? I am not sure what the problem might be. Thank you.

setshabaTaukobong commented 10 months ago

Hi, I have not solved this issue yet. I am not sure why it's closed.

pditommaso commented 10 months ago

Because it's unreadable. Consider posting on https://community.seqera.io/

bentsherman commented 10 months ago

Hi @setshabaTaukobong, I edited your post to be more readable. You just need to use triple backticks for code blocks.

bentsherman commented 10 months ago

Based on your error I think you need to fix your FASTQC2 process as follows:

process FASTQC2 {

    debug true

    input:
    path sample_id2

    output:
    path "${sample_id2}_fastqc.html", emit: fastqc_files2

    script:
    """
    fastqc $sample_id2 -t 4
    """

}
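
One possible caveat about the output name, offered as a sketch rather than a confirmed fix. The failing command was `fastqc sample_id.trimmed.fastq -t 4`. FastQC is assumed here to drop the `.fastq` extension before appending `_fastqc.html`, so the report would be named `sample_id.trimmed_fastqc.html`. The literal `'sample_id2_fastqc.html'` does not match that name, and `"${sample_id2}_fastqc.html"` would expand to `sample_id.trimmed.fastq_fastqc.html`. Listing the work directory shown in the error should confirm which file was actually written. Under that assumption, a glob avoids hard-coding the name. The `params.reads` path below is a hypothetical placeholder, used only so the process can be tried on its own:

```nextflow
#!/usr/bin/env nextflow

nextflow.enable.dsl=2

// Hypothetical input, for illustration only: point this at a trimmed FASTQ file
params.reads = 'sample_id.trimmed.fastq'

process FASTQC2 {

    debug true

    input:
    path sample_id2

    output:
    // Assumption: FastQC writes a report ending in "_fastqc.html" into the
    // task work directory; the glob matches it whatever the prefix is
    path "*_fastqc.html", emit: fastqc_files2

    script:
    """
    fastqc $sample_id2 -t 4
    """

}

workflow {
    // Run FASTQC2 on the single placeholder file
    FASTQC2(Channel.fromPath(params.reads, checkIfExists: true))
}
```

If a specific name is preferred over a glob, `"${sample_id2.baseName}_fastqc.html"` should resolve to the same file for an uncompressed `.fastq` input, since `baseName` removes only the last extension.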