kevlar-dev / kevlar

Reference-free variant discovery in large eukaryotic genomes
https://kevlar.readthedocs.io
MIT License

Filter based on proband k-mer abundance #339

Closed: standage closed 5 years ago

standage commented 5 years ago

It is often unreasonable to expect every k-mer spanning a variant in the proband to be high abundance, so kevlar has never required that all spanning k-mers be "interesting" (putatively novel). Even filtering on the number of "interesting" spanning k-mers has been problematic, since some true variants have few spanning k-mers that meet the required thresholds.

However, kevlar has never filtered calls based on what is not expected among the k-mers spanning the variant in the proband. Isolated low-abundance k-mers are common, but long stretches of low-abundance k-mers spanning a true variant are very rare.

This update introduces a new filter that discards a variant prediction if 5 or more low-abundance k-mers span the variant call. Based on observations, this eliminates many false positives while having little to no effect on true calls.
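The filter logic described above could be sketched roughly as follows. This is an illustration only, not kevlar's actual implementation: the function names, the `low_cutoff` value that defines "low abundance", and the example abundance lists are all assumptions made for the sketch. Only the limit of 5 low-abundance k-mers comes from the description.

```python
from typing import Optional, Sequence


def count_low_abundance_kmers(abundances: Sequence[int], low_cutoff: int) -> int:
    """Count spanning k-mers whose proband abundance is below low_cutoff."""
    return sum(1 for abund in abundances if abund < low_cutoff)


def should_discard(
    abundances: Sequence[int],
    low_cutoff: int = 8,                 # hypothetical definition of "low abundance"
    max_low_kmers: Optional[int] = 5,    # limit stated in the PR description
) -> bool:
    """Return True if the call has max_low_kmers or more low-abundance k-mers.

    Passing max_low_kmers=None turns the filter off (an assumption of this sketch).
    """
    if max_low_kmers is None:
        return False
    return count_low_abundance_kmers(abundances, low_cutoff) >= max_low_kmers


# Made-up proband abundances of the k-mers spanning two hypothetical calls.
likely_true_call = [21, 19, 3, 22, 20, 18, 2, 23]    # two intermittent low k-mers
likely_false_call = [20, 3, 2, 1, 4, 2, 19, 21]      # five low k-mers

print(should_discard(likely_true_call))
print(should_discard(likely_false_call))
```

In this sketch a call with only a couple of intermittent low-abundance k-mers is kept, while a call with five or more is discarded.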

codecov[bot] commented 5 years ago

Codecov Report

Merging #339 into master will increase coverage by 0.04%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #339      +/-   ##
==========================================
+ Coverage   95.01%   95.05%   +0.04%     
==========================================
  Files          48       48              
  Lines        3005     3029      +24     
  Branches      563      570       +7     
==========================================
+ Hits         2855     2879      +24     
  Misses        108      108              
  Partials       42       42

| Impacted Files | Coverage Δ |
|---|---|
| kevlar/call.py | 88.32% <ø> (ø) |
| kevlar/simlike.py | 96.35% <100%> (+0.31%) |
| kevlar/cli/simlike.py | 100% <100%> (ø) |
| kevlar/cli/call.py | 100% <100%> (ø) |
| kevlar/vcf.py | 95.96% <100%> (+0.01%) |
| kevlar/varmap.py | 99.02% <100%> (+0.01%) |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update d9f0838...4536b99.

standage commented 5 years ago

Also in this PR: some recently added hard-coded filters are now configurable options that can also be disabled.
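As a rough illustration of what turning a hard-coded filter into a configurable, disableable option can look like, here is a minimal `argparse` sketch. The flag name `--max-low-kmers`, its default, and the convention that a non-positive value disables the filter are hypothetical and are not taken from kevlar's CLI.

```python
import argparse
from typing import Optional


def build_parser() -> argparse.ArgumentParser:
    """Build a parser with one hypothetical filter option."""
    parser = argparse.ArgumentParser(description="Hypothetical filter options")
    parser.add_argument(
        "--max-low-kmers",
        type=int,
        default=5,
        metavar="N",
        help="discard calls with N or more low-abundance spanning k-mers; "
             "set to 0 to disable the filter (default: 5)",
    )
    return parser


def resolve_threshold(value: int) -> Optional[int]:
    """Map the raw option value to a threshold, or None if the filter is off."""
    return None if value <= 0 else value


# Parse explicit argument lists so the example does not depend on sys.argv.
default_args = build_parser().parse_args([])
disabled_args = build_parser().parse_args(["--max-low-kmers", "0"])

print(resolve_threshold(default_args.max_low_kmers))
print(resolve_threshold(disabled_args.max_low_kmers))
```

Keeping the "disabled" state as `None` lets downstream code skip the filter with a single check instead of relying on a sentinel number.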