phac-nml / mikrokondo

A simple pipeline for bacterial assembly and quality control
https://phac-nml.github.io/mikrokondo/
MIT License

COMBINE_DATA() fails to use symbolic link to staged fastq files #141

Open sgsutcliffe opened 3 weeks ago

sgsutcliffe commented 3 weeks ago

Description of the bug

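COMBINE_DATA() runs when a sample ID is repeated in the samplesheet. In that case the generated script does not seem to use the staged paths: it references the path portion of the input URLs instead, and the step fails.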

I have tested this on both main and inx_id branches.

For main, I supplied the following samplesheet:

sample,fastq_1,fastq_2,long_reads,assembly
S1,https://github.com/phac-nml/mikrokondo/raw/dev/tests/data/reads/campy-staph1.fq.gz,https://github.com/phac-nml/mikrokondo/raw/dev/tests/data/reads/campy-staph2.fq.gz,,
S1,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz,https://raw.githubusercontent.com/nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz,,
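Since COMBINE_DATA() is only triggered for repeated sample IDs, it can help to confirm which IDs repeat before running the pipeline. The following is a minimal sketch, not part of mikrokondo: `duplicated_samples` is a hypothetical helper, and it assumes a header row with the sample ID in the first comma-separated column, as in the samplesheet above.

```shell
# Hypothetical helper (not part of mikrokondo): list sample IDs that occur
# on more than one samplesheet row.
# Assumes a header row and the sample ID in the first comma-separated column.
duplicated_samples() {
    awk -F',' '
        NR > 1 && $1 != "" { count[$1]++ }                      # skip header, tally IDs
        END { for (s in count) if (count[s] > 1) print s }      # keep repeated IDs only
    ' "$1" | sort
}
```

Running `duplicated_samples add-sample-samplesheet.csv` on the samplesheet above should print `S1`, the only ID that appears twice.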

The bash script in the work directory looks like this:

#!/bin/bash -euo pipefail
mkdir out
    cat /phac-nml/mikrokondo/raw/dev/tests/data/reads/campy-staph1.fq.gz /nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R1.fastq.gz > out/S1_R1.merged.fastq.gz;
cat /phac-nml/mikrokondo/raw/dev/tests/data/reads/campy-staph2.fq.gz /nf-core/test-datasets/viralrecon/illumina/amplicon/sample1_R2.fastq.gz > out/S1_R2.merged.fastq.gz;
    touch out/S1_R1.merged.fastq.gz
    touch out/S1_R2.merged.fastq.gz
    touch out/S1.merged.fastq.gz
    touch out/S1.merged.fasta.gz
    cat <<-END_VERSIONS > versions.yml
    "MIKROKONDO:INPUT_CHECK:COMBINE_DATA":
        cat: $(echo $(cat --version 2>&1) | sed 's/^.*coreutils) //; s/ .*$//')
        touch: $(echo $(touch --version 2>&1) | sed 's/^.*coreutils) //; s/ .*$//')
    END_VERSIONS
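The paths passed to `cat` in the script above match the samplesheet URLs with the scheme and host removed. The sketch below only illustrates that observation; it is not the pipeline's code. `url_path` and `staged_name` are hypothetical helpers, and the base-name comparison assumes the staged link keeps the file's original name, which may not hold for every pipeline.

```shell
# Illustration only: compare the path component of a URL (what the failing
# script contains) with the file's base name (what a staged link would be
# called, assuming staging keeps the original file name).

# Strip "scheme://host" and keep the leading slash of the path.
url_path() {
    local no_scheme="${1#*://}"
    printf '/%s\n' "${no_scheme#*/}"
}

# Keep only the last path segment.
staged_name() {
    printf '%s\n' "${1##*/}"
}

url="https://github.com/phac-nml/mikrokondo/raw/dev/tests/data/reads/campy-staph1.fq.gz"
url_path "$url"      # /phac-nml/mikrokondo/raw/dev/tests/data/reads/campy-staph1.fq.gz
staged_name "$url"   # campy-staph1.fq.gz
```

The first line of output is exactly the first argument to `cat` in the failing script, an absolute path that does not exist inside the task environment.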

The symbolic links for the sequences are present, and the sequences themselves have been staged (i.e., they are in the work/staged.. folder).
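One way to confirm that the links in a task directory are present and resolve is sketched below. `check_links` is a hypothetical helper, not something shipped with Nextflow or mikrokondo, and the directory to inspect is whichever task directory failed.

```shell
# Hypothetical helper: print every symbolic link directly inside a directory
# and report whether its target exists.
check_links() {
    local dir="$1" link
    for link in "$dir"/*; do
        [ -L "$link" ] || continue          # skip anything that is not a symlink
        if [ -e "$link" ]; then             # -e follows the link to its target
            printf 'ok      %s -> %s\n' "${link##*/}" "$(readlink "$link")"
        else
            printf 'broken  %s -> %s\n' "${link##*/}" "$(readlink "$link")"
        fi
    done
}
```

Calling `check_links` on the failing task directory lists each link with its target. If every line starts with `ok`, staging worked and the problem lies in the paths written into the script.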

Command used and terminal output

nextflow run phac-nml/mikrokondo -r main -profile docker --input add-sample-samplesheet.csv --outdir results-test -c test-params.config --dehosting_idx tests/data/databases/campy.mmi --mash_sketch tests/data/databases/campy-staph-ecoli.msh --kraken2_db tests/data/kraken2/test

Relevant files

The config file (test-params.config) looks like this:

params {
    outdir = "results"

    platform = "illumina"

    mash_sketch = "tests/data/databases/campy-staph-ecoli.msh"
    mh_min_kmer = 1

    dehosting_idx = "$baseDir/tests/data/databases/campy.mmi"

    kraken2_db = "tests/data/kraken2/test"

    min_reads = 100

    skip_allele_calling = true

    QCReport {
        fallthrough {
            search = "No organism specific QC data available."
            raw_average_quality = 30
            min_n50 = null
            max_n50 = null
            min_nr_contigs = null
            max_nr_contigs = null
            fixed_genome_size = 1000
            min_length = null
            max_length = null
            max_checkm_contamination = 3.0
            min_average_coverage = 30
        }
    }

    skip_bakta = true
    skip_staramr = false
    skip_mobrecon = false
    skip_checkm = false
    skip_raw_read_metrics = false
    skip_polishing = false

    max_memory = "2.GB"
    max_cpus = 1
}

System information

No response

mattheww95 commented 2 weeks ago

Should be fixed in: https://github.com/phac-nml/mikrokondo/pull/140

Tests were added to verify that the bug no longer occurs.