sstadick / perbase

Per-base per-nucleotide depth analysis
MIT License

WIP: Mate fix efficient #65

Closed brentp closed 1 year ago

brentp commented 1 year ago

I inadvertently built this on top of #61. It tries to do the expensive operation of mate tracking only when there's a possibility that the mates overlap. If we know a given read's end falls strictly before its mate's start, then they can't overlap. If overlap is possible, the reads are pushed into a hashmap; there's no sorting or groupby.

Still need to benchmark this against the existing impl, but I suspect it should be faster given many overlaps.
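For illustration, the idea described above could be sketched roughly as follows. This is not the code from this PR: `Read`, `mate_may_overlap`, and `split_overlapping_mates` are hypothetical names, `Read` is a simplified stand-in rather than a rust-htslib record, and the sketch assumes coordinate-sorted input with half-open `[start, end)` intervals.

```rust
use std::collections::HashMap;

/// Simplified stand-in for an aligned read (hypothetical; not a rust-htslib type).
#[derive(Debug, Clone, PartialEq)]
struct Read {
    qname: String,
    tid: i32,        // reference sequence id
    start: i64,      // 0-based, inclusive
    end: i64,        // 0-based, exclusive
    mate_tid: i32,   // reference sequence id of the mate
    mate_start: i64, // start position of the mate
}

/// Cheap check made when a read is seen before its mate: the mate can only
/// overlap if it starts on the same reference, inside this read's span.
/// If this read ends at or before the mate's start, no tracking is needed.
fn mate_may_overlap(r: &Read) -> bool {
    r.tid == r.mate_tid && r.mate_start >= r.start && r.mate_start < r.end
}

/// Walk coordinate-sorted reads once. Reads whose mate may overlap are parked
/// in a hashmap keyed by read name until the mate arrives; everything else
/// skips the map entirely. No sorting or grouping is performed.
fn split_overlapping_mates(reads: &[Read]) -> (Vec<(Read, Read)>, Vec<Read>) {
    let mut pending: HashMap<String, Read> = HashMap::new();
    let mut pairs = Vec::new();
    let mut singles = Vec::new();

    for read in reads {
        if let Some(first) = pending.remove(&read.qname) {
            // The earlier mate was parked, so this read starts inside its span.
            pairs.push((first, read.clone()));
        } else if mate_may_overlap(read) {
            pending.insert(read.qname.clone(), read.clone());
        } else {
            singles.push(read.clone());
        }
    }

    // Parked reads whose mate never showed up (e.g. filtered out) are singles.
    // Their relative order is unspecified because it follows hashmap iteration.
    singles.extend(pending.into_values());
    (pairs, singles)
}
```

In this sketch only the pairs returned in the first vector would need overlap handling when counting depth; reads in the second vector can be counted directly.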

brentp commented 1 year ago

Welp. In my test run, this is slower than master, so probably not worth pursuing.

brentp commented 1 year ago

I wonder if it is worth a PR to rust-htslib to expose bam_mplp_init_overlaps. It seems that in perbase, enabling overlap detection results in a greater than 10X slowdown. It might be closer to 100X.

sstadick commented 1 year ago

Yeah, barring a bam_mplp_init_overlaps, I'm not sure such a tradeoff is worth it at the moment.