nf-core / nascent

Nascent Transcription Processing Pipeline
https://nf-co.re/nascent
MIT License

Issue with concatenation of technical replicates #143

Open: bug1303 opened this issue 5 months ago


Description of the bug

Given a samplesheet like this, in which a single-end sample has been re-sequenced:

sample,fastq_1,fastq_2
SAMPLE1,seqrun1_sample1_R1.trimmed.fq.gz,
SAMPLE1,seqrun2_sample1_R1.trimmed.fq.gz,

With -r 2.1.1, the pipeline creates pipeline_info/samplesheet.valid.csv containing:

sample,single_end,fastq_1,fastq_2
SAMPLE1_T1,True,seqrun1_sample1_R1.trimmed.fq.gz,
SAMPLE1_T2,True,seqrun2_sample1_R1.trimmed.fq.gz,

and runs bwa_mem separately for SAMPLE1_T1 and SAMPLE1_T2.
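For reference, the renaming that 2.1.1 applies can be mimicked with a few lines of awk. This is only a sketch that reproduces the output shown above (a per-sample counter appended as `_T<n>`, and `single_end` set to `True` when `fastq_2` is empty); it is not the pipeline's own samplesheet check, and the function name `validate_samplesheet` is made up for illustration. It shows that once the rows get distinct names, each run is treated as its own sample, which is consistent with the separate bwa_mem and dedup calls.

```bash
# Hypothetical helper: reproduces the renaming seen in 2.1.1's
# samplesheet.valid.csv. This is NOT the pipeline's own validation script.
validate_samplesheet() {
  awk -F',' -v OFS=',' '
    NR == 1 { print "sample", "single_end", "fastq_1", "fastq_2"; next }
    {
      n[$1]++                                  # per-sample row counter
      single = ($3 == "") ? "True" : "False"   # empty fastq_2 => single-end
      print $1 "_T" n[$1], single, $2, $3      # SAMPLE1 -> SAMPLE1_T1, SAMPLE1_T2, ...
    }' "$1"
}

tmp=$(mktemp -d)
cat > "$tmp/samplesheet.csv" <<'EOF'
sample,fastq_1,fastq_2
SAMPLE1,seqrun1_sample1_R1.trimmed.fq.gz,
SAMPLE1,seqrun2_sample1_R1.trimmed.fq.gz,
EOF
validate_samplesheet "$tmp/samplesheet.csv"
```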

(As a side note: it also runs umi_tools dedup separately for each replicate, although the replicates should be deduplicated together if they come from re-sequencing the same library.)

With the more recent -r 2.2.0, no samplesheet.valid.csv is created, and the pipeline calls:

bwa mem \
     \
    -t 8 \
    $INDEX \
    seqrun1_sample1_R1.trimmed.fq.gz seqrun2_sample1_R1.trimmed.fq.gz \
    | samtools view  --threads 8 -o SAMPLE1.bam -

This is how the call would look for paired-end data: given two FASTQ arguments, bwa mem treats the second file as the mates of the first. Hence it fails with:

[mem_sam_pe] paired reads have different names: ...
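BWA raises this error in paired-end mode when read i of the first file and read i of the second file do not carry the same name. The sketch below reproduces that comparison in plain shell/awk, so the mismatch between two re-sequencing runs can be seen without running BWA. The helper name `check_mate_names` and the exact name normalisation (dropping the leading `@`, any comment after whitespace, and a trailing `/1` or `/2`) are my own assumptions, not BWA's code.

```bash
# Hypothetical pre-flight check mirroring bwa mem's paired-end name check.
# Assumption: names are compared after dropping '@', any comment after the
# first whitespace, and a trailing /1 or /2.
check_mate_names() {
  local t1 t2 status
  t1=$(mktemp); t2=$(mktemp)
  # Keep only the header line of each 4-line FASTQ record (gz or plain).
  gzip -dcf "$1" | awk 'NR % 4 == 1' > "$t1"
  gzip -dcf "$2" | awk 'NR % 4 == 1' > "$t2"
  # Record i of file 1 is paired with record i of file 2.
  paste "$t1" "$t2" | awk -F'\t' '
    function norm(s) { sub(/^@/, "", s); sub(/[ \t].*$/, "", s); sub(/\/[12]$/, "", s); return s }
    norm($1) != norm($2) { printf "record %d: %s vs %s\n", NR, norm($1), norm($2); bad = 1; exit }
    END { exit bad }'
  status=$?
  rm -f "$t1" "$t2"
  return "$status"
}

tmp=$(mktemp -d)
# Two single-end runs of the same library: read names differ between runs.
printf '@run1_read1\nACGT\n+\nIIII\n' | gzip > "$tmp/seqrun1_sample1_R1.fq.gz"
printf '@run2_read1\nACGT\n+\nIIII\n' | gzip > "$tmp/seqrun2_sample1_R1.fq.gz"
check_mate_names "$tmp/seqrun1_sample1_R1.fq.gz" "$tmp/seqrun2_sample1_R1.fq.gz" \
  || echo "not mates, yet bwa mem would treat them as R1/R2"
```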

The usage docs (https://nf-co.re/nascent/2.2.0/docs/usage#multiple-runs-of-the-same-sample) say: "The pipeline will concatenate the raw reads before performing any downstream analysis." However, unlike in the rnaseq pipeline, I don't see an explicit call to CAT_FASTQ in the first place.
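Until this is fixed, one workaround I would consider (not tested against the pipeline itself) is to pre-merge the runs of each single-end sample before passing them in, similar in spirit to what I understand rnaseq's CAT_FASTQ step to do. Concatenating gzip files byte-wise with `cat` yields a valid multi-member gzip file. The function name `merge_runs` and the output layout (`merged/<sample>.merged.fq.gz`, `samplesheet.merged.csv`) are made up for this sketch, and it assumes paths without whitespace and an empty fastq_2 column.

```bash
# Hypothetical workaround (single-end samples only): merge all runs of a
# sample into one FASTQ so the pipeline sees one row per sample.
merge_runs() {
  local sheet=$1 outdir=$2
  mkdir -p "$outdir"
  echo "sample,fastq_1,fastq_2" > "$outdir/samplesheet.merged.csv"
  # Unique sample names in order of first appearance (header skipped).
  tail -n +2 "$sheet" | cut -d',' -f1 | awk '!seen[$0]++' |
  while IFS= read -r sample; do
    # Concatenate fastq_1 of every row of this sample, in samplesheet order.
    awk -F',' -v s="$sample" 'NR > 1 && $1 == s { print $2 }' "$sheet" |
      xargs cat > "$outdir/$sample.merged.fq.gz"
    echo "$sample,$outdir/$sample.merged.fq.gz," >> "$outdir/samplesheet.merged.csv"
  done
}

tmp=$(mktemp -d)
cd "$tmp" || exit 1
printf '@run1_read1\nACGT\n+\nIIII\n' | gzip > seqrun1_sample1_R1.trimmed.fq.gz
printf '@run2_read1\nTTTT\n+\nIIII\n' | gzip > seqrun2_sample1_R1.trimmed.fq.gz
cat > samplesheet.csv <<'EOF'
sample,fastq_1,fastq_2
SAMPLE1,seqrun1_sample1_R1.trimmed.fq.gz,
SAMPLE1,seqrun2_sample1_R1.trimmed.fq.gz,
EOF
merge_runs samplesheet.csv merged
cat merged/samplesheet.merged.csv
```

With one single-end row per sample, bwa mem would receive a single FASTQ, and umi_tools dedup would see all reads of the library together. Paired-end rows are not handled by this sketch.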

Command used and terminal output

No response

Relevant files

No response

System information

No response