kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License

Skip likelihood score calculation for some filtered calls #324

Closed: standage closed this issue 5 years ago

standage commented 5 years ago

With regard to filtering out preliminary calls, my default approach so far has been to keep everything for as long as possible, in case the information proves useful. For example, the kevlar simlike module calculates likelihood scores for every preliminary variant it can, even those that have already been filtered out as passenger variants, unreliable alignments, and so on. The thinking is that if one of those filtered variants happens to be of interest to the end user, it's better to give the user more information rather than less.

For the sake of performance, it may be worth skipping the likelihood score calculation for some classes of filtered variants. If doing so gives us a 2x or 3x speedup, then it probably makes sense to skip at least those filtered variants that are extremely unlikely to be of interest to anyone.
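A minimal sketch of the proposal, assuming a hypothetical `Call` record, hypothetical filter labels (`passenger_variant`, `unreliable_alignment`, `low_coverage`), and a caller-supplied likelihood function. None of these names come from kevlar's actual API. Calls carrying a filter in the skip set still flow through to the output; they just never pay for the likelihood computation.

```python
from dataclasses import dataclass, field
from typing import Callable, Iterable, Iterator, Optional, Set

# Hypothetical filter labels; kevlar's real filter names may differ.
SKIP_SCORING: Set[str] = {"passenger_variant", "unreliable_alignment"}


@dataclass
class Call:
    """Hypothetical stand-in for a preliminary variant call."""
    name: str
    filters: Set[str] = field(default_factory=set)
    likelihood: Optional[float] = None


def score_calls(calls: Iterable[Call],
                likelihood_fn: Callable[[Call], float],
                skip: Set[str] = SKIP_SCORING) -> Iterator[Call]:
    """Score each call unless it carries at least one filter listed in `skip`.

    Skipped calls are yielded unchanged (likelihood stays None), so they
    remain in the output but cost nothing to process.
    """
    for call in calls:
        if call.filters & skip:
            yield call  # keep the call, skip the expensive computation
            continue
        call.likelihood = likelihood_fn(call)
        yield call


# Usage: record which calls actually reach the (stand-in) likelihood function.
evaluated = []


def fake_likelihood(call: Call) -> float:
    evaluated.append(call.name)
    return -2.5


calls = [Call("a"), Call("b", {"passenger_variant"}), Call("c", {"low_coverage"})]
scored = list(score_calls(calls, fake_likelihood))
```

Here only "a" and "c" are scored; "b" is passed through with no likelihood. One consequence of this design is that any downstream step that sorts or thresholds on the likelihood has to tolerate a missing score for skipped calls.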