humburg / pirates

Improving the quality of deep sequencing data
MIT License

Filter candidate UIDs prior to merging #24

Closed humburg closed 7 years ago

humburg commented 7 years ago

Reduce the number of comparisons needed to find matches for UIDs that contain errors by excluding, up front, all candidate UIDs that cannot possibly be selected.

The primary strategy is a simple comparison of sequence composition. It excludes any UID that could not match the current UID within the maximum allowed number of differences, even if the bases of each sequence were rearranged to minimise the difference between the two. Because the check itself is cheap, the number of full comparisons can be reduced substantially with relatively little overhead.
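To illustrate the idea, here is a minimal sketch of a composition-based pre-filter. It is not the code from this pull request: it assumes that UIDs have equal length and that differences are counted as substitutions (Hamming distance), and the names `composition_lower_bound`, `hamming` and `candidate_matches` are hypothetical. Under those assumptions, the surplus of each base in one UID over the other is a lower bound on the number of mismatches, however the bases are arranged.

```python
from collections import Counter


def composition_lower_bound(a: str, b: str) -> int:
    """Lower bound on the Hamming distance of two equal-length sequences.

    Every base that is more frequent in `a` than in `b` must sit opposite
    a different base at least (surplus) times, whatever the arrangement.
    """
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    # Counter subtraction keeps only the positive differences.
    return sum((Counter(a) - Counter(b)).values())


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def candidate_matches(uid: str, others, max_diff: int):
    """Yield UIDs within `max_diff` mismatches of `uid`.

    The cheap composition check runs first, so the full comparison is
    only made for UIDs that could still be within range.
    """
    uid_counts = Counter(uid)
    for other in others:
        if len(other) != len(uid):
            continue  # assumption: only equal-length UIDs are compared
        if sum((uid_counts - Counter(other)).values()) > max_diff:
            continue  # composition alone rules this UID out
        if hamming(uid, other) <= max_diff:
            yield other
```

For example, `list(candidate_matches("AAAA", ["AAAT", "TTTT", "AATT"], 1))` compares position by position only against `"AAAT"`; the other two UIDs are rejected by the composition check. Note that the bound can be zero for sequences that differ everywhere (`"ACGT"` and `"TGCA"` have the same composition), so the filter only removes candidates and never replaces the full comparison.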

coveralls commented 7 years ago


Coverage increased (+7.6%) to 84.01% when pulling 5ef6b431fcb4becb83369f430c99ee0b34f1b886 on filter into f237d37a7874116dcae3b32d8f09cfc8ccf1fb0f on singletons.