taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License

Error in output list from find_near_matches_with_ngrams() #3

Closed: kevinrue closed this issue 10 years ago

kevinrue commented 10 years ago

Hi Tal,

Your updated find_near_matches_with_ngrams() systematically omits the first match from the output list, and the last match is systematically duplicated. Can you please fix that?
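To pin down the two symptoms, here is a small sketch that compares a returned list of matches against an expected one and reports which matches are missing and which are duplicated. `Match` is a local namedtuple that only mirrors the repr printed in the examples; it is not imported from fuzzysearch, and `diff_matches` is a hypothetical helper. The "actual" list is the output reported for the first example. The position of the omitted first match (19 to 29) is an assumption, inferred by counting characters in that example's input string.

```python
from collections import Counter, namedtuple

# Local stand-in mirroring the Match(start=..., end=..., dist=...) repr in the
# report; this is NOT fuzzysearch's own Match class.
Match = namedtuple("Match", ["start", "end", "dist"])


def diff_matches(expected, actual):
    """Hypothetical helper: return (missing, duplicated) for two match lists."""
    actual_counts = Counter(actual)
    # Expected matches that never appear in the actual output.
    missing = [m for m in expected if actual_counts[m] == 0]
    # Matches that appear more than once in the actual output.
    duplicated = sorted(m for m, n in actual_counts.items() if n > 1)
    return missing, duplicated


# Output reported for the first example (one exact match, two near matches).
actual = [Match(42, 52, 1), Match(99, 109, 0), Match(99, 109, 0)]
# Assumed correct output; the 19-29 entry is inferred, not taken from the report.
expected = [Match(19, 29, 1), Match(42, 52, 1), Match(99, 109, 0)]

missing, duplicated = diff_matches(expected, actual)
print("missing:", missing)
print("duplicated:", duplicated)
```

With these inputs the helper lists the 19-29 near match as missing and the 99-109 exact match as duplicated, matching the two symptoms described above.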

See examples:

One exact match and two near matches (one mismatch each) in the string:

fuzzysearch.find_near_matches_with_ngrams("GGGTTLTTSS", "XXXXXXXXXXXXXXXXXXXGGGTTVTTSSAAAAAAAAAAAAAGGGTTVTTSSAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBGGGTTLTTSS", 1)
Out[9]: [Match(start=42, end=52, dist=1), Match(start=99, end=109, dist=0), Match(start=99, end=109, dist=0)]

Two exact matches and one near match (one mismatch) in the string:

fuzzysearch.find_near_matches_with_ngrams("GGGTTLTTSS", "XXXXXXXXXXXXXXXXXXXGGGTTVTTSSAAAAAAAAAAAAAGGGTTLTTSSAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBGGGTTLTTSS", 1)
Out[8]: [Match(start=42, end=52, dist=0), Match(start=42, end=52, dist=0), Match(start=99, end=109, dist=0)]

One exact match and one near match (one mismatch) in the string:

fuzzysearch.find_near_matches_with_ngrams("GGGTTLTTSS", "XXXXXXXXXXXXXXXXXXXGGGTTVTTSSAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBBBBBBBBBBBBBBBBBBBBBGGGTTLTTSS", 1)
Out[4]: [Match(start=89, end=99, dist=0), Match(start=89, end=99, dist=0)]
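For reference, the correct output for inputs like these can be reproduced with a brute-force scan that slides a window of the pattern's length along the text and keeps every window with at most `max_dist` substituted characters (Hamming distance). This is a minimal sketch, not fuzzysearch's n-gram algorithm: it ignores insertions and deletions, which should be enough here because every near match in these examples differs from the pattern by a single substitution. `find_hamming_matches` and the local `Match` namedtuple are hypothetical names. The test string only mimics the first example's layout, with segment lengths chosen so the later matches land at the reported positions 42 and 99.

```python
from collections import namedtuple

# Local stand-in mirroring the Match repr in the report; NOT fuzzysearch's class.
Match = namedtuple("Match", ["start", "end", "dist"])


def find_hamming_matches(pattern, text, max_dist):
    """Brute-force reference: every window of len(pattern) characters that
    differs from pattern in at most max_dist positions. Insertions and
    deletions are not considered."""
    n = len(pattern)
    matches = []
    for start in range(len(text) - n + 1):
        window = text[start:start + n]
        # Count substituted characters between the pattern and this window.
        dist = sum(a != b for a, b in zip(pattern, window))
        if dist <= max_dist:
            matches.append(Match(start, start + n, dist))
    return matches


# Assumed layout mimicking the first example: two near matches ("V" instead
# of "L") followed by one exact match at the end.
text = ("X" * 19 + "GGGTTVTTSS" + "A" * 13 + "GGGTTVTTSS"
        + "A" * 22 + "B" * 25 + "GGGTTLTTSS")
print(find_hamming_matches("GGGTTLTTSS", text, 1))
```

On this string the scan returns the near match at 19 to 29 in addition to the two matches shown in the report, each exactly once.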

Thanks!

taleinat commented 10 years ago

Thanks for the bug report!

Fixed in v0.2.1.

kevinrue commented 10 years ago

You're welcome!

Although I love biology, I've always enjoyed software development tools. I'm happy I can help bring life to your repository :)