jsh58 / NGmerge

Merging paired-end reads and removing adapters
MIT License

feature request: use false positive rate instead of error rate? #17

Open eboyden opened 3 years ago

eboyden commented 3 years ago

Hi, I'm a big fan of this software, but I was wondering if it might make sense to provide an option to threshold on a false positive rate instead of an error rate (similar to what SeqPurge does with its binomial distribution calculation), since longer overlaps can tolerate higher error rates before a match becomes likely by chance. We've found that we get the best performance by piping multiple instances of NGmerge to roughly simulate this effect. For example, to simulate a 1E-6 FP threshold, we allow 8% errors for overlaps of 10-14 bp, 17% errors for overlaps of 15-19 bp, and 23% errors for overlaps of 20+ bp. This is still overly stringent for longer overlaps, though, and it is time-consuming.
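To make the idea concrete, here is a minimal sketch of the kind of binomial calculation I have in mind. It is not SeqPurge's or NGmerge's actual code: the function names `false_positive_prob` and `max_mismatches` are made up for illustration, and it assumes each base in a random overlap matches independently with probability 0.25 (uniform base composition), which real reads will not satisfy exactly.

```python
from math import comb


def false_positive_prob(overlap_len, mismatches, p_match=0.25):
    """Probability that a random overlap of this length shows at most
    `mismatches` mismatches, i.e. at least overlap_len - mismatches
    matches, under Binomial(overlap_len, p_match).

    p_match=0.25 is an assumption (uniform, independent bases).
    """
    min_matches = overlap_len - mismatches
    return sum(
        comb(overlap_len, k) * p_match**k * (1 - p_match) ** (overlap_len - k)
        for k in range(min_matches, overlap_len + 1)
    )


def max_mismatches(overlap_len, fp_threshold=1e-6, p_match=0.25):
    """Largest mismatch count whose chance probability stays at or below
    fp_threshold. Returns -1 if even a perfect match is too likely by chance.
    """
    allowed = -1
    for m in range(overlap_len + 1):
        # The probability only grows as more mismatches are allowed,
        # so we can stop at the first count that exceeds the threshold.
        if false_positive_prob(overlap_len, m, p_match) <= fp_threshold:
            allowed = m
        else:
            break
    return allowed


if __name__ == "__main__":
    for n in (10, 15, 20, 30, 50):
        m = max_mismatches(n)
        print(f"overlap {n} bp: up to {m} mismatches")
```

Under these assumptions the allowed mismatch count grows with overlap length, so a single FP threshold would replace the stepwise error-rate bins we currently emulate by chaining runs.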

jsh58 commented 3 years ago

Thanks for the question. This is an interesting topic that requires two separate answers, for the two modes of NGmerge:

eboyden commented 3 years ago

In any case, thanks for the response and the software. I understand that implementing feature requests is time-consuming and not always a high priority. I just wanted to let you know there's interest if you (or anyone) were inclined.