kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License

Split out functionality of filter module #315

Closed by standage 5 years ago

standage commented 5 years ago

The filter module currently serves three purposes: merging interesting k-mer annotations from multiple files (such as when kevlar is run in banding mode), recomputing interesting k-mer abundances, and discarding masked k-mers and k-mers whose corrected abundances no longer meet required thresholds.

The first purpose can and should be separated from the other two. The current implementation of kevlar filter can consume hundreds of gigabytes, negating the recent performance improvements that came from masking at counting time and from error correction.
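To make the proposed split concrete, here is a minimal sketch of what the separation might look like. It is not kevlar's code or API: the function names `merge_annotations` and `filter_kmers`, the dict-of-sets data layout, and the plain `abundance` dict (standing in for real abundance recomputation) are all assumptions made for illustration.

```python
from collections import defaultdict


def merge_annotations(batches):
    """Purpose 1 (hypothetical): merge interesting k-mer annotations.

    Each batch maps a read ID to the set of interesting k-mers annotated
    on that read, e.g. one batch per band when running in banding mode.
    """
    merged = defaultdict(set)
    for batch in batches:
        for read_id, kmers in batch.items():
            merged[read_id].update(kmers)
    return dict(merged)


def filter_kmers(annotations, abundance, mask, min_abund):
    """Purposes 2 and 3 (hypothetical): re-check abundances, then discard.

    `abundance` is a plain dict standing in for recomputed k-mer counts;
    `mask` is a set of k-mers to discard unconditionally.
    """
    kept = {}
    for read_id, kmers in annotations.items():
        passing = {
            kmer for kmer in kmers
            if kmer not in mask and abundance.get(kmer, 0) >= min_abund
        }
        if passing:  # drop reads left with no interesting k-mers
            kept[read_id] = passing
    return kept


# Toy data: two bands annotating overlapping reads.
band1 = {"read1": {"ACGTA"}, "read2": {"TTGCA"}}
band2 = {"read1": {"CGTAC"}, "read3": {"GGGTT"}}

merged = merge_annotations([band1, band2])
abundance = {"ACGTA": 9, "CGTAC": 2, "TTGCA": 8, "GGGTT": 7}
mask = {"TTGCA"}
result = filter_kmers(merged, abundance, mask, min_abund=5)
print(result)
```

With the two steps decoupled like this, merging can run on its own without loading whatever the abundance re-check needs, which is the point of the separation.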

I suggest the following updates.

standage commented 5 years ago

Fixed in #316.