kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License

New filter to drop high abundance outlier k-mers #350

Closed by standage 5 years ago

standage commented 5 years ago

This update adds a new filter to the simlike module that discards outlier high-abundance k-mers at the ends of a variant-spanning window. As described in #349, a single bad k-mer with high abundance in all three samples can throw off the likelihood score even when the remaining k-mers show good separation (high abundance in the proband, zero abundance in the parents).

This filter is disabled by default because in some cases it substantially increases the number of false positives.
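
To illustrate the idea, here is a minimal sketch of end trimming. It is not the code added to `kevlar/simlike.py`: the function name `trim_end_outliers`, the `threshold` parameter, and the tuple layout (proband, mother, father) are all hypothetical, and the fixed threshold is an assumed stand-in for whatever outlier criterion the module actually applies.

```python
def trim_end_outliers(window, threshold=10):
    """Drop k-mers at either end of a window that look like outliers.

    `window` is a list of (proband, mother, father) abundance tuples,
    one per k-mer, in window order. A k-mer counts as an outlier here
    (an assumption for this sketch) when its abundance meets or exceeds
    `threshold` in every sample. Only the ends are trimmed; interior
    k-mers are left alone.
    """
    def is_outlier(abundances):
        return all(a >= threshold for a in abundances)

    start, end = 0, len(window)
    # Walk inward from the left end while k-mers are outliers.
    while start < end and is_outlier(window[start]):
        start += 1
    # Walk inward from the right end while k-mers are outliers.
    while end > start and is_outlier(window[end - 1]):
        end -= 1
    return window[start:end]


# Made-up abundances: the first k-mer is abundant in all three samples,
# the rest are present only in the proband.
example = [(250, 240, 260), (12, 0, 0), (11, 0, 0), (13, 0, 0)]
trimmed = trim_end_outliers(example)
```

With these made-up values, `trimmed` keeps only the three proband-specific k-mers, so a likelihood score computed over the window is no longer dominated by the shared high-abundance k-mer. Restricting the trim to the ends is a deliberate choice in this sketch: it matches the description above and avoids splitting the window.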

Closes #349.

codecov[bot] commented 5 years ago

Codecov Report

Merging #350 into master will increase coverage by 0.03%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #350      +/-   ##
==========================================
+ Coverage   95.06%   95.09%   +0.03%     
==========================================
  Files          48       48              
  Lines        3039     3055      +16     
  Branches      570      574       +4     
==========================================
+ Hits         2889     2905      +16     
  Misses        108      108              
  Partials       42       42

Impacted Files           Coverage          Δ
kevlar/simlike.py        96.48% <100%>     (+0.25%)
kevlar/cli/simlike.py    100% <100%>       (ø)

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update f0c5942...68943ee.