taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.

Imperfect matches aren't returned completely #30

Closed heshamwhite closed 4 years ago

heshamwhite commented 4 years ago

We are facing a case where the returned Match object isn't complete (for texts longer than 8 characters).

For example, when you call find_near_matches as follows:

find_near_matches('aaaaa', 'aaaxx', max_l_dist=2)
[Match(start=0, end=5, dist=2, matched='aaaxx')]

it returns the expected result. Note that it correctly includes the 'xx' part of the text as well. But for the following call:

find_near_matches('aaaaaaaaa', 'aaaaaaaxx', max_l_dist=2)
[Match(start=0, end=8, dist=2, matched='aaaaaaax')]

it drops the second 'x' at the end, even though max_l_dist is 2.

This issue began to appear in version 0.6; before that, it worked correctly. I traced the error to changes made to the _py_expand_short function in levenshtein_ngram.py, but couldn't pinpoint exactly where.

Thanks and Regards

Setup: Python 3.7.7, fuzzysearch 0.7.1
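
The two candidate matches from the second call can be compared independently of fuzzysearch. The sketch below is a plain textbook Levenshtein distance, not the library's own implementation; the helper name levenshtein and the variables full and truncated are illustrative only. It computes the distance from the pattern to the full text 'aaaaaaaxx' and to the truncated 'aaaaaaax' that 0.7.1 returned.

```python
def levenshtein(a: str, b: str) -> int:
    """Textbook edit distance: insert, delete and substitute each cost 1."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


pattern = 'aaaaaaaaa'
text = 'aaaaaaaxx'

full = levenshtein(pattern, text)           # the span the report expects
truncated = levenshtein(pattern, text[:8])  # 'aaaaaaax', the span 0.7.1 returned

print(full, truncated)
```

If both values come out within max_l_dist, the reported dist is consistent for either span, which would suggest the problem lies in how far the match is extended rather than in the distance calculation itself.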

taleinat commented 4 years ago

Hi @heshamwhite,

Yes, that's a bug, thanks for the report! I'm looking into it.

taleinat commented 4 years ago

Fixed! This will be included in the next release.

heshamwhite commented 4 years ago

Great! Thank you for the amazing support. If possible, may I ask roughly when the next release will be?

taleinat commented 4 years ago

I've just released v0.7.2.