Make a combine FastQs subworkflow

nf-core / modules

Repository to host tool-specific module files for the Nextflow DSL2 community!

https://nf-co.re/modules

MIT License


Make a combine FastQs subworkflow #5009

Open edmundmiller opened 6 months ago

edmundmiller commented 6 months ago

I think this snippet is duplicated across multiple pipelines, and the copies are all starting to diverge.

https://github.com/nf-core/methylseq/pull/380#discussion_r1506715623
https://github.com/nf-core/methylseq/pull/380#discussion_r1506712974
https://github.com/nf-core/rnaseq/blob/ed917112c339dfca601895d0d3441763b63254b8/workflows/rnaseq/main.nf#L106-L141
https://github.com/nf-core/methylseq/pull/381

I think it's in chipseq and nascent as well.
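For discussion, here is a minimal Python sketch of the shape such a shared "concatenate FASTQs" step could take. It is not the code from rnaseq or methylseq. It assumes each input record is `(sample_id, reads)`, where `reads` is `(r1,)` for single-end or `(r1, r2)` for paired-end data. The function names and the `<sample>_<mate>.merged.fastq.gz` output naming are hypothetical. Records are grouped per sample, samples with a single run pass through untouched, and samples with several runs have each mate concatenated separately.

```python
import shutil
from pathlib import Path


def group_by_sample(records):
    """Collect (sample_id, reads) records into {sample_id: [reads, ...]}.

    `reads` is a tuple of FASTQ paths: (r1,) single-end or (r1, r2) paired-end.
    Dicts keep insertion order, so samples and runs stay in input order.
    """
    groups = {}
    for sample_id, reads in records:
        groups.setdefault(sample_id, []).append(tuple(reads))
    return groups


def branch(groups):
    """Separate single-run samples (pass through) from multi-run samples (concatenate)."""
    single = {s: runs[0] for s, runs in groups.items() if len(runs) == 1}
    multiple = {s: runs for s, runs in groups.items() if len(runs) > 1}
    return single, multiple


def cat_fastqs(sample_id, runs, outdir):
    """Concatenate each mate across runs into <sample_id>_<mate>.merged.fastq.gz.

    Byte-level concatenation of gzip files yields a valid multi-member gzip
    stream, so the inputs never need to be decompressed.
    """
    n_mates = len(runs[0])
    if any(len(run) != n_mates for run in runs):
        raise ValueError(f"{sample_id}: mixed single-end and paired-end runs")
    outputs = []
    for mate in range(n_mates):
        out_path = Path(outdir) / f"{sample_id}_{mate + 1}.merged.fastq.gz"
        with open(out_path, "wb") as out:
            for run in runs:
                with open(run[mate], "rb") as src:
                    shutil.copyfileobj(src, out)
        outputs.append(out_path)
    return tuple(outputs)
```

The group/branch/concatenate split maps naturally onto Nextflow's `groupTuple` and `branch` channel operators followed by a concatenation process. The mixed single-end/paired-end check is one example of the kind of edge case on which diverging copies tend to disagree.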

jfy133 commented 6 months ago

You mean concatenating FASTQs?

edmundmiller commented 6 months ago

That's probably more clear about what's going on!

edmundmiller commented 6 months ago

It might also be worth including the splitfastq logic from hic or sarek.
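Splitting is the inverse operation: one FASTQ is cut into chunks so alignment can run in parallel. The sketch below is not the hic or sarek implementation; it only illustrates the core idea, assuming uncompressed 4-line FASTQ records. `read_fastq` and `split_fastq` are hypothetical names.

```python
from itertools import islice


def read_fastq(lines):
    """Yield FASTQ records as 4-line tuples from an iterable of lines."""
    it = iter(lines)
    while True:
        record = tuple(islice(it, 4))
        if not record:
            return
        if len(record) != 4 or not record[0].startswith("@") or not record[2].startswith("+"):
            raise ValueError(f"malformed FASTQ record: {record!r}")
        yield record


def split_fastq(lines, reads_per_chunk):
    """Group FASTQ records into chunks of at most reads_per_chunk reads.

    Records are never cut in half, so each chunk is a valid FASTQ on its own.
    For paired-end data, split R1 and R2 with the same chunk size so that
    mates end up in chunks with matching indices.
    """
    if reads_per_chunk < 1:
        raise ValueError("reads_per_chunk must be >= 1")
    chunk = []
    for record in read_fastq(lines):
        chunk.append(record)
        if len(chunk) == reads_per_chunk:
            yield chunk
            chunk = []
    if chunk:
        yield chunk
```

If split and combine live in the same subworkflow, the chunks of one sample can be realigned and merged back together without the pipeline reimplementing the bookkeeping.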

jfy133 commented 6 months ago

We also do this in eager, taxprofiler, and mag... 😅

matthdsm commented 6 months ago

Hot take: combining FASTQs is bad practice if you don't extract the read group data first. Otherwise, you lose all information about the separate FASTQs when you align.

In our workflow, we extract the read group info from each "replicate" and push it into meta, then align. We only merge data after alignment, so all read group info is still present.
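A rough sketch of the "extract read groups first" step, assuming Illumina CASAVA 1.8+ read names (`@instrument:run:flowcell:lane:tile:x:y read:filtered:control:index`). The ID/PU convention (`flowcell.lane` and `flowcell.lane.index`) loosely follows GATK's read group recommendations and is an assumption, not necessarily what the commenter's workflow uses. `ReadGroup` and `read_group_from_header` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReadGroup:
    id: str
    sample: str
    library: str
    platform_unit: str
    platform: str = "ILLUMINA"

    def sam_header(self):
        """Render a tab-separated @RG header line to pass to the aligner."""
        return "\t".join([
            "@RG", f"ID:{self.id}", f"SM:{self.sample}", f"LB:{self.library}",
            f"PU:{self.platform_unit}", f"PL:{self.platform}",
        ])


def read_group_from_header(header, sample, library):
    """Derive a read group from the first read header of one FASTQ.

    Assumes the Illumina 1.8+ layout; anything else is rejected rather than guessed.
    """
    name, _, comment = header.lstrip("@").partition(" ")
    fields = name.split(":")
    if len(fields) != 7:
        raise ValueError(f"not an Illumina 1.8+ read name: {header!r}")
    flowcell, lane = fields[2], fields[3]
    # The index (barcode) is the last field of the comment, when present.
    index = comment.split(":")[-1] if comment.count(":") == 3 else ""
    platform_unit = f"{flowcell}.{lane}" + (f".{index}" if index else "")
    return ReadGroup(id=f"{flowcell}.{lane}", sample=sample, library=library,
                     platform_unit=platform_unit)
```

Each FASTQ's `ReadGroup` would be stored in its meta (e.g. under a hypothetical `read_group` key) and handed to the aligner. The per-run BAMs are then merged per sample, so each run keeps its own `@RG` entry.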

mahesh-panchal commented 6 months ago

+1. I'm doing this in one of my workflows, but I also still output the individual FASTQs so I can do k-mer counting on them separately.
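Keeping the individual FASTQs makes per-run profiles cheap: counts can be taken per file, and the merged sample's profile is just their sum. The toy in-memory sketch below shows this with canonical k-mers. A real pipeline would use a dedicated k-mer counter on the files themselves. `count_kmers` and `per_file_and_merged` are hypothetical names, and sequences are assumed to be already extracted from the FASTQs.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def canonical(kmer):
    """Return the lexicographically smaller of a k-mer and its reverse complement."""
    return min(kmer, kmer.translate(_COMPLEMENT)[::-1])


def count_kmers(sequences, k):
    """Count canonical k-mers, skipping any k-mer that contains a non-ACGT base."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):
                counts[canonical(kmer)] += 1
    return counts


def per_file_and_merged(files, k):
    """Count k-mers per input FASTQ and for the merged sample.

    `files` maps a file label to its read sequences. The merged profile is
    the sum of the per-file counts, so the merged FASTQ is never recounted.
    """
    per_file = {name: count_kmers(seqs, k) for name, seqs in files.items()}
    merged = sum(per_file.values(), Counter())
    return per_file, merged
```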