maxibor closed this 8 months ago
nf-core lint overall result: Failed ❌
Posted for pipeline commit 7e3f119
✅ 183 tests passed
❔ 1 test was ignored
❌ 11 tests failed
I would say it is 1.1.5.
Getting back with a review later today :)
I'm somewhat concerned about this feature. If I remember correctly, FastQC and fastp use the first 50 and 75 bp, respectively, to judge read duplication. Using longer sequences would drive up memory requirements and runtime. So my first question is: are we truly only removing identical reads with this?
My second question comes from my inexperience with sequencing: if you have a dominant species in your metagenomic sample, how unlikely is it to get identical reads?
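To make the first question concrete, here is a minimal Python sketch (not FastQC's or fastp's actual code) contrasting a prefix-based duplicate count with an exact whole-read comparison. `count_duplicate_reads` and `prefix_len` are hypothetical names, and the 10-bp prefix is a toy stand-in for the 50/75 bp windows mentioned in the question.

```python
from collections import Counter


def count_duplicate_reads(reads, prefix_len=None):
    """Count reads whose comparison key was already seen in an earlier read.

    prefix_len=None compares whole sequences (exact duplicates only);
    an integer compares only the first prefix_len bases, so reads that
    differ after the prefix are still counted as duplicates.
    """
    keys = Counter(r if prefix_len is None else r[:prefix_len] for r in reads)
    # every occurrence beyond the first of a key counts as one duplicate
    return sum(n - 1 for n in keys.values())


reads = [
    "ACGTACGTAATTTT",
    "ACGTACGTAAGGGG",  # same first 10 bp as read 1, different tail
    "ACGTACGTAATTTT",  # exact copy of read 1
]
print(count_duplicate_reads(reads))                 # → 1 (whole-read comparison)
print(count_duplicate_reads(reads, prefix_len=10))  # → 2 (prefix-only comparison)
```

A prefix key is cheaper to store per read, but it over-counts relative to a strict "identical reads only" criterion, which is exactly the distinction the question is asking about.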
> Using longer sequences would drive up memory requirements and take longer. So the first question is, are we truly only removing identical reads with this?
Does it really? The README at least seems to imply it's some condensed hash of the whole read: https://github.com/OpenGene/fastp#duplication-rate-evaluation. That said, it's opt-in, so it's still up to the user to decide whether it's a suitable algorithm.
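The "condensed hash of the whole read" idea can be illustrated with a minimal sketch of hash-based exact deduplication of read pairs. This is an illustration under stated assumptions, not fastp's implementation: `read_key`, `dedup_pairs` and the choice of a BLAKE2 digest are all hypothetical choices made for this example.

```python
import hashlib


def read_key(seq1, seq2="", digest_size=8):
    """Condensed hash of the whole read (or read pair).

    A small digest_size saves memory but raises the chance that two
    different reads collide and one is wrongly dropped as a duplicate.
    """
    h = hashlib.blake2b(digest_size=digest_size)
    h.update(seq1.encode())
    h.update(b"\0")  # separator so ("AB", "C") and ("A", "BC") hash differently
    h.update(seq2.encode())
    return h.digest()


def dedup_pairs(pairs, digest_size=8):
    """Yield read pairs whose whole-sequence hash has not been seen before."""
    seen = set()
    for r1, r2 in pairs:
        key = read_key(r1, r2, digest_size)
        if key not in seen:
            seen.add(key)
            yield r1, r2


pairs = [("ACGT", "TTGA"), ("ACGT", "TTGA"), ("ACGT", "TTGC")]
print(list(dedup_pairs(pairs)))
```

Because only digests are stored, memory grows with the number of distinct reads times `digest_size` rather than with read length; the trade-off is that a shorter digest makes collisions, and therefore occasional removal of non-identical reads, more likely.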
> My second question comes from my inexperience with sequencing: If you have a dominant species in your metagenomic sample, how unlikely is it to have an identical read?
An absolutely exact duplicate is quite unlikely: fragmentation protocols should be random (with a slight preference for breakages around GCs, IIRC), and combined with (relatively) long reads, the resulting sequence diversity makes exact matches rare.
Exact duplicates are much more likely with lab-based amplicons, as they all use the same priming sequence, and given the number of amplification cycles, such copies are also much more likely to be artificial duplicates than naturally occurring ones. At least, that is the case for Illumina short-read protocols.
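As a rough check on the "how unlikely" question for a dominant species, here is a birthday-problem style estimate in Python. Everything in it is a toy-model assumption rather than a property of any real sample: uniformly random fragment start positions, both strands equally likely, a repeat-free genome, fixed read length, and (for paired-end data) a uniform spread of insert sizes. `expected_chance_duplicates` and the example genome size and read count are hypothetical.

```python
import math


def expected_chance_duplicates(n_reads, genome_len, n_insert_sizes=1):
    """Expected number of reads that exactly copy an earlier read purely by chance.

    Toy model (all assumptions): each read, or read pair, starts at a uniformly
    random position on either strand of a repeat-free genome; reads have a fixed
    length; for paired-end data the insert size is uniform over n_insert_sizes
    values. Two reads are identical only if they land in the same
    (start, strand, insert size) slot.
    """
    slots = 2 * genome_len * n_insert_sizes
    # expected occupied slots = slots * (1 - (1 - 1/slots) ** n_reads), computed stably
    distinct = slots * -math.expm1(n_reads * math.log1p(-1.0 / slots))
    return n_reads - distinct


# hypothetical dominant species: 5 Mb genome receiving one million reads
single_end = expected_chance_duplicates(1_000_000, 5_000_000)
paired_end = expected_chance_duplicates(1_000_000, 5_000_000, n_insert_sizes=200)
print(f"single-end: {single_end:.0f} reads, paired-end: {paired_end:.0f} reads")
```

Under these assumptions the single-end figure comes out at roughly 5% of reads, while the paired-end figure is only a few hundred reads, because both mates must coincide. Real genomes with repeats, GC-biased fragmentation or PCR amplification will deviate from this model.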
Thank you for your response, sounds good to me then. 👍🏼
This PR adds deduplication of reads with fastp
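For reference, newer fastp releases switch deduplication on with `--dedup`, and `--dup_calc_accuracy` (levels 1 to 6, with higher levels using more memory) controls how accurately duplicates are detected. Below is a hedged Python sketch of how a wrapper might add these flags only when the user opts in; `build_fastp_args` and its `dedup` parameter are hypothetical and not this pipeline's actual parameters or module code.

```python
import shlex


def build_fastp_args(r1, r2, out1, out2, dedup=False, dup_calc_accuracy=None):
    """Assemble a paired-end fastp command line (hypothetical wrapper).

    Deduplication stays opt-in: --dedup is appended only when dedup=True.
    """
    args = ["fastp", "-i", r1, "-I", r2, "-o", out1, "-O", out2]
    if dedup:
        args.append("--dedup")
    if dup_calc_accuracy is not None:
        # fastp documents levels 1-6; higher levels use more memory
        if not 1 <= dup_calc_accuracy <= 6:
            raise ValueError("dup_calc_accuracy must be between 1 and 6")
        args += ["--dup_calc_accuracy", str(dup_calc_accuracy)]
    return args


cmd = build_fastp_args("s_R1.fastq.gz", "s_R2.fastq.gz",
                       "s_R1.trimmed.fastq.gz", "s_R2.trimmed.fastq.gz", dedup=True)
print(shlex.join(cmd))
# → fastp -i s_R1.fastq.gz -I s_R2.fastq.gz -o s_R1.trimmed.fastq.gz -O s_R2.trimmed.fastq.gz --dedup
```

Keeping `--dedup` behind an explicit switch mirrors the point made in the discussion that the feature is opt-in, leaving it to the user to decide whether exact-duplicate removal suits their data.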
PR checklist
- `nf-core lint`
- `nextflow run . -profile test,docker --outdir <OUTDIR>`
- `nextflow run . -profile debug,test,docker --outdir <OUTDIR>`
- `CHANGELOG.md` is updated.