standage commented 5 years ago

Testing out a complete rewrite of the varfilter module. Should drastically reduce memory requirements. Run time comparison in progress.

codecov[bot] commented 5 years ago

Codecov Report

Merging #354 into master will decrease coverage by 0.08%. The diff coverage is 100%.

@@            Coverage Diff             @@
##           master     #354      +/-   ##
==========================================
- Coverage   97.13%   97.05%   -0.08%     
==========================================
  Files          48       48              
  Lines        2894     2886       -8     
  Branches      532      533       +1     
==========================================
- Hits         2811     2801      -10     
- Misses         51       52       +1     
- Partials       32       33       +1
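The Coverage row can be checked against the counts beneath it. Codecov's documented coverage ratio is hits / (hits + misses + partials), so partially covered lines count against the total just like misses. Plugging in the numbers from the diff reproduces the reported 97.13%, 97.05%, and -0.08%. The helper name `coverage_percent` is only for illustration.

```python
def coverage_percent(hits: int, misses: int, partials: int) -> float:
    """Coverage as a percentage: hits over all tracked lines.

    Partial lines are in the denominator but not the numerator,
    so they lower coverage the same way misses do.
    """
    total = hits + misses + partials
    return 100.0 * hits / total


# Counts copied from the Coverage Diff for master and PR #354.
base = coverage_percent(hits=2811, misses=51, partials=32)  # master, 2894 lines
head = coverage_percent(hits=2801, misses=52, partials=33)  # #354, 2886 lines

print(f"master: {base:.2f}%")          # master: 97.13%
print(f"#354:   {head:.2f}%")          # #354:   97.05%
print(f"delta:  {head - base:+.2f}%")  # delta:  -0.08%
```

Removing 8 lines while losing 10 hits is what pulls the ratio down, even though every line in the diff itself is covered.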

Impacted Files	Coverage Δ
kevlar/cli/varfilter.py	`100% <ø> (ø)`	:arrow_up:
kevlar/intervalforest.py	`77.42% <100%> (-4.06%)`	:arrow_down:
kevlar/vcf.py	`96% <100%> (+0.03%)`	:arrow_up:
kevlar/varfilter.py	`100% <100%> (ø)`	:arrow_up:


Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data. Last update af7ae4e...ff64a40.

standage commented 5 years ago

The difference in runtime is tremendous: 3 minutes now vs. 30+ minutes before.

I don't know whether the biggest factor is the difference between building and querying an interval tree, Python's object overhead, or something else. In any case, keeping a much smaller amount of data in memory and streaming the "big" data is always the better idea and should have been my first thought.
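The "keep the small data in memory, stream the big data" pattern can be sketched in plain Python. This is not kevlar's code: it assumes the small dataset is a set of filter regions (0-based, half-open, as in BED) and the big one is a VCF-style stream of variant calls, and `load_intervals`, `overlaps`, and `filter_variants` are hypothetical names. Merged, sorted per-chromosome interval lists searched with `bisect` stand in for kevlar's interval forest.

```python
import bisect
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# chrom -> (sorted starts, matching ends) of disjoint half-open [start, end) intervals
IntervalIndex = Dict[str, Tuple[List[int], List[int]]]


def load_intervals(records: Iterable[Tuple[str, int, int]]) -> IntervalIndex:
    """Load the small dataset (filter regions, 0-based half-open like BED)."""
    by_chrom: Dict[str, List[Tuple[int, int]]] = defaultdict(list)
    for chrom, start, end in records:
        by_chrom[chrom].append((start, end))

    index: IntervalIndex = {}
    for chrom, ivs in by_chrom.items():
        ivs.sort()
        # Merge overlapping/adjacent intervals so one binary search per query suffices.
        merged: List[List[int]] = []
        for start, end in ivs:
            if merged and start <= merged[-1][1]:
                merged[-1][1] = max(merged[-1][1], end)
            else:
                merged.append([start, end])
        index[chrom] = ([s for s, _ in merged], [e for _, e in merged])
    return index


def overlaps(index: IntervalIndex, chrom: str, pos0: int) -> bool:
    """True if the 0-based position pos0 lies inside a loaded interval."""
    if chrom not in index:
        return False
    starts, ends = index[chrom]
    i = bisect.bisect_right(starts, pos0) - 1  # last interval starting at or before pos0
    return i >= 0 and pos0 < ends[i]


def filter_variants(lines: Iterable[str], index: IntervalIndex) -> Iterator[str]:
    """Stream the big dataset, yielding lines that fall outside every interval.

    Only one line is held at a time. Header lines ('#...') pass through.
    Assumes VCF-style tab-separated columns with CHROM and 1-based POS first,
    and treats each variant as a single position (REF length is ignored).
    """
    for line in lines:
        if line.startswith("#"):
            yield line
            continue
        chrom, pos = line.split("\t", 2)[:2]
        if not overlaps(index, chrom, int(pos) - 1):
            yield line


regions = load_intervals([("chr1", 100, 200), ("chr1", 150, 300), ("chr2", 0, 50)])
vcf_lines = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t120\t.\tA\tG\n",  # 0-based 119, inside merged chr1 [100, 300): dropped
    "chr1\t301\t.\tC\tT\n",  # 0-based 300, just past the merged end: kept
    "chr2\t60\t.\tG\tA\n",   # 0-based 59, outside chr2 [0, 50): kept
]
kept = list(filter_variants(vcf_lines, regions))
print(len(kept))  # 4
```

Because `filter_variants` is a generator, passing it an open file handle instead of a list keeps memory proportional to the number of filter regions rather than the number of variant calls, which is the trade-off described above.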

Reimplement varfilter module #354 (kevlar-dev/kevlar)
