richarddurbin / pbwt

Implementation of Positional Burrows-Wheeler Transform for genetic data

Bug in algorithm ReportLongMatches? #33

Open pd3 opened 9 years ago

pd3 commented 9 years ago

Algorithm 3 (pbwt -longWithin 5) does not report some long matches. For example, given the following haplotypes:

1:1010001 001100
4:1010001 001100
0:0110000 101010
2:0011000 110010
5:0011001 000010
3:1011001 100100

I would expect a match between haplotypes 5 and 3 to be reported at k=6. Or is my interpretation of a "long match" incorrect?

EDIT: or perhaps 5 and 2 at k=6 and 5 and 3 at k=7, depending on whether we report at the last matching position or immediately after it.
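
One way to check this expectation independently of pbwt is a brute-force pairwise scan. The sketch below is not pbwt's algorithm and does not call pbwt. It rests on several assumptions that are not stated in the report: the number before each colon is the haplotype index, the space inside each row is only a visual separator, sites are numbered from 0, and a match is reported as a half-open interval [start, end) where end is the first mismatching site (or the sequence length). The names HAPS and long_matches are hypothetical.

```python
from itertools import combinations

# Haplotypes copied from the example above, keyed by the number before the colon.
# Assumption: the space in each row is only a visual separator, so it is dropped.
HAPS = {
    1: "1010001001100",
    4: "1010001001100",
    0: "0110000101010",
    2: "0011000110010",
    5: "0011001000010",
    3: "1011001100100",
}


def long_matches(haps, min_len):
    """Brute-force reference: every maximal pairwise run of identical sites
    with length >= min_len, as (i, j, start, end) with end exclusive.
    Assumes all haplotype strings have the same length."""
    found = []
    for i, j in combinations(sorted(haps), 2):
        a, b = haps[i], haps[j]
        start = 0  # first site of the current run of agreement
        for k in range(len(a) + 1):
            # A run ends at a mismatch or at the end of the sequences.
            if k == len(a) or a[k] != b[k]:
                if k - start >= min_len:
                    found.append((i, j, start, k))
                start = k + 1
    return found


if __name__ == "__main__":
    for i, j, start, end in long_matches(HAPS, 5):
        print(f"haplotypes {i} and {j}: sites [{start}, {end}), length {end - start}")
```

Under these assumptions, the scan lists haplotypes 2 and 5 matching over sites [0, 6) and haplotypes 3 and 5 matching over sites [1, 7), which agrees with the reading in the EDIT above. Comparing this list against the output of pbwt -longWithin 5 on the same input would show which matches, if any, are missing.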

adrianodemarino commented 1 year ago

I have observed a similar issue: matchLongWithin2 appears to miss some matches. Is there an alternative implementation that addresses this?