Open brentp opened 1 year ago
mosdepth uses such a check (if read is strictly less than mate) and if not, it saves the read into a hashtable. I think that can work here, but I'm not sure if I miss something about the implementation.
Nope, you aren't missing anything! That is how it does it and it for sure slows things down. I do like the idea in your PR if the functions from htslib fall into place.
Hi Seth, if I understand correctly, from_pileup_mate_aware is run for each column in a pileup. This means it is grouping, sorting, grouping, etc for each position, often with the same sets of reads for each consecutive column.
Do I misread? If not then this will be an extreme bottleneck as it's O(n^2) or worse. We can mitigate by checking if the end of the left-most read is less than the start of the right-most read. Then at least the cost could be minimized to only the percent of reads that overlap.