taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License

Match includes more chars than expected if only deletions are specified #18

Closed: georgh closed this issue 4 years ago

georgh commented 4 years ago

Hello,

Thank you for your great work. I noticed that

from fuzzysearch import find_near_matches
find_near_matches("TESTabc", "TEST123", max_deletions=5, max_substitutions=0, max_insertions=0)

returns [Match(start=0, end=5, dist=3, matched='TEST1')]. That's not what I would expect in this case: with only deletions allowed, the match should end at index 4, i.e. matched='TEST'.

Is this somehow intended?

taleinat commented 4 years ago

Hi @georgh,

That's not intended; it is indeed a bug! Thanks for reporting it :)

While there's an inherent ambiguity regarding what is best to return in many such cases, in this case "TEST" is indeed what fuzzysearch is intended to return.
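To illustrate why "TEST" is the expected answer: with substitutions and insertions disabled, every matched slice must be the subsequence with some characters deleted, so it can only contain characters of "TESTabc", in their original order. The brute-force Python sketch below enumerates every such slice of "TEST123". The helper names are hypothetical, and this is neither fuzzysearch's API nor its search algorithm. "TEST1" never qualifies because the "1" would have to be an insertion, and the best match is "TEST" with a distance of 3 (the deleted "abc").

```python
def is_subsequence(short: str, long: str) -> bool:
    """True if `short` can be obtained from `long` by deleting characters."""
    it = iter(long)
    # `ch in it` advances the iterator, so characters must appear in order.
    return all(ch in it for ch in short)


def deletion_only_matches(subsequence: str, sequence: str, max_deletions: int):
    """Brute-force list of (start, end, dist) for every non-empty slice
    sequence[start:end] that equals `subsequence` with at most
    `max_deletions` characters deleted (no substitutions or insertions)."""
    matches = []
    for start in range(len(sequence)):
        for end in range(start + 1, len(sequence) + 1):
            candidate = sequence[start:end]
            dist = len(subsequence) - len(candidate)  # number of deletions
            if dist < 0:
                break  # longer slices would require insertions
            if dist <= max_deletions and is_subsequence(candidate, subsequence):
                matches.append((start, end, dist))
    return matches


matches = deletion_only_matches("TESTabc", "TEST123", max_deletions=5)
best = min(matches, key=lambda m: m[2])  # lowest distance wins
print(best)  # (0, 4, 3): "TEST", never "TEST1"
```

This runs in roughly cubic time and is only meant to pin down the expected result for small inputs like the one in this report, not to replace a real fuzzy search.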

taleinat commented 4 years ago

That was a nasty bug!

I've just fixed it, and will soon make a new release including this fix.

georgh commented 4 years ago

Wow, you are fast! Thanks a lot for your work =)

taleinat commented 4 years ago

Fix available in version 0.7.1.

taleinat commented 4 years ago


Thanks for the kind words! I'm happy you find this useful :)