nf-core / rnaseq

RNA sequencing analysis pipeline using STAR, RSEM, HISAT2 or Salmon with gene/isoform counts and extensive quality control.
https://nf-co.re/rnaseq
MIT License

SALMON_INDEX runs when using --aligner star_rsem even if samples have explicit strandedness #975

Closed by drpatelh 1 year ago

drpatelh commented 1 year ago

Description of the bug

nextflow run main.nf \
    --input https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet_test.csv \
    --fasta 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fasta' \
    --gtf 'https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf' \
    -profile docker \
    --aligner star_rsem \
    --outdir results

Even though strandedness is explicitly set to reverse for every sample in the samplesheet, the following process still runs. It creates the Salmon index that is used to subsample the reads and infer strandedness:

[3c/a91c08] process > NFCORE_RNASEQ:RNASEQ:FASTQ_SUBSAMPLE_FQ_SALMON:SALMON_INDEX (genome.transcripts.fa)             [100%] 1 of 1, cached: 1 ✔

This process shouldn't run. It is triggered because this logic evaluates to true: https://github.com/nf-core/rnaseq/blob/6e1e448f535ccf34d11cc691bb241cfd6e60a647/workflows/rnaseq.nf#L239

We need to extend that logic so that it also checks that these channels are empty: https://github.com/nf-core/rnaseq/blob/6e1e448f535ccf34d11cc691bb241cfd6e60a647/workflows/rnaseq.nf#L234

If you have no intention of running pseudo-alignment, the easiest workaround for now is to pass an existing dummy path via --salmon_index /my/random/existing/folder/. The pipeline won't use or validate this path anywhere else, and it avoids building the index.
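
For illustration, the workaround could look roughly like the sketch below. It reuses the arguments from the command above and adds --salmon_index. The placeholder directory name (dummy_salmon_index under the temp directory) is an assumption made up for this example; any directory that exists should serve the same purpose. The sketch only prints the command so it can be checked before launching.

```shell
# Hypothetical placeholder directory: the name is made up for this example.
# It only needs to exist; it is not expected to contain a real Salmon index.
dummy_index="${TMPDIR:-/tmp}/dummy_salmon_index"
mkdir -p "$dummy_index"

# Same arguments as in the report above, plus --salmon_index.
cmd="nextflow run main.nf \
    --input https://raw.githubusercontent.com/nf-core/test-datasets/rnaseq/samplesheet/samplesheet_test.csv \
    --fasta https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genome.fasta \
    --gtf https://github.com/nf-core/test-datasets/raw/rnaseq/reference/genes.gtf \
    -profile docker \
    --aligner star_rsem \
    --outdir results \
    --salmon_index $dummy_index"

# Dry run: print the assembled command for review.
# To launch it for real, run: eval "$cmd"
echo "$cmd"
```

This only applies when pseudo-alignment is not wanted, since a placeholder directory is not a usable index.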

drpatelh commented 1 year ago

Fixed in https://github.com/nf-core/rnaseq/pull/978