Open mikemc opened 1 week ago
After reading the new README, I realize this may be effectively handled by the second pass of clumpify following merging (at least in the taxonomic profiling workflow).
Yeah, this is currently done in the taxonomy subworkflow (which also then gets passed to the second half of the hv subworkflow), as this is where paired reads get merged into single reads.
I'd like to test the behaviour of this step more extensively, to make absolutely sure that it's handling RC duplicates as we expect. Once that's verified, I'm open to copying this process over to other parts of the pipeline as & when it makes sense.
Starting an issue to keep track of this limitation in our current implementations of deduplication and duplicate statistics.
Currently the CLUMPIFY_PAIRED process has the comment flag "NB: Will NOT handle reverse-complement duplicates". I also believe that duplication statistics are currently being generated from FASTQC, which also does not handle reverse-complement duplicates.
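
To make the limitation concrete, here is a minimal Python sketch of how a strand-unaware duplicate count can differ from a reverse-complement-aware one. It is only an illustration for single (e.g. merged) reads, not the pipeline's code and not how Clumpify or FastQC work internally; the function names (`reverse_complement`, `canonical`, `count_duplicates`) and the toy reads are made up for this example.

```python
# Hypothetical illustration only; not taken from the pipeline, Clumpify, or FastQC.

# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def canonical(seq: str) -> str:
    """Strand-independent key: the smaller of the sequence and its reverse complement."""
    seq = seq.upper()
    return min(seq, reverse_complement(seq))


def count_duplicates(reads, rc_aware: bool) -> int:
    """Count reads whose key was already seen earlier in the input."""
    seen = set()
    duplicates = 0
    for read in reads:
        key = canonical(read) if rc_aware else read.upper()
        if key in seen:
            duplicates += 1
        else:
            seen.add(key)
    return duplicates


# Toy input: read 2 is the reverse complement of read 1, read 3 is an exact copy.
reads = ["AAACCG", "CGGTTT", "AAACCG", "GGGTTA"]

naive = count_duplicates(reads, rc_aware=False)    # only the exact copy is caught
rc_aware = count_duplicates(reads, rc_aware=True)  # the RC copy is caught as well
print(f"exact-match duplicates: {naive}, RC-aware duplicates: {rc_aware}")
```

On this toy input the exact-match count misses the second read, so any duplication statistic computed that way would understate the duplicate rate whenever reverse-complement duplicates are present. A small synthetic input like this could also serve as a starting point for checking how a given deduplication step behaves.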