Open edmundmiller opened 6 months ago
You mean concatenating FASTQs?
That's probably more clear about what's going on!
Might also be interesting to see the splitfastq logic in hic or sarek included.
We also do this in eager, taxprofiler, and mag too... 😅
Hot take... combining FASTQs is bad practice if you don't extract the read group data first. Otherwise you lose all the information about the separate FASTQs when you align.
In our workflow we extract the read group info from each "replicate" and push it into meta, after which we align. We only merge data post-alignment, so all read group info is still present.
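To illustrate the read group extraction described above, here is a minimal Python sketch. It is not the commenter's actual workflow code. It assumes Illumina Casava 1.8+ style FASTQ headers (`@instrument:run:flowcell:lane:...`), and the function names and the choice of `flowcell.lane` as the read group ID are hypothetical conventions chosen for illustration.

```python
def read_group_from_header(header: str, sample: str) -> dict:
    """Derive read group fields from the first header line of a FASTQ file.

    Assumes a Casava 1.8+ style header:
    @<instrument>:<run>:<flowcell>:<lane>:<tile>:<x>:<y> <read>:<filtered>:<control>:<index>
    """
    if not header.startswith("@"):
        raise ValueError("not a FASTQ header line")
    # Keep only the part before the first space, then split on ':'.
    fields = header[1:].split(" ")[0].split(":")
    if len(fields) < 4:
        raise ValueError("unexpected header layout")
    _instrument, _run, flowcell, lane = fields[:4]
    return {
        "ID": f"{flowcell}.{lane}",           # assumed ID convention
        "PU": f"{flowcell}.{lane}.{sample}",  # platform unit
        "SM": sample,                         # sample name supplied by the caller
        "PL": "ILLUMINA",
    }


def format_rg_line(rg: dict) -> str:
    """Format the fields as a tab-separated SAM @RG header line."""
    return "@RG\t" + "\t".join(f"{key}:{value}" for key, value in rg.items())


if __name__ == "__main__":
    # Hypothetical header, for demonstration only.
    example = "@A00123:45:HXXXXDSXX:2:1101:1000:2000 1:N:0:ACGT"
    print(format_rg_line(read_group_from_header(example, "sampleA")))
```

Each FASTQ would get its own read group this way and be aligned separately. The per-file alignments are merged only afterwards, so reads stay traceable to their source file.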
+1. I'm doing this in one of my workflows, but I'm also still outputting the individual FASTQs so I can do k-mer counting on them separately.
I think this snippet is in multiple pipelines and they're all starting to diverge.
https://github.com/nf-core/methylseq/pull/380#discussion_r1506715623
https://github.com/nf-core/methylseq/pull/380#discussion_r1506712974
https://github.com/nf-core/rnaseq/blob/ed917112c339dfca601895d0d3441763b63254b8/workflows/rnaseq/main.nf#L106-L141
https://github.com/nf-core/methylseq/pull/381
I think it's in chipseq and nascent as well.