taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License

Enhancement: Prevent matching of neighbour characters #20

Closed georgh closed 4 years ago

georgh commented 4 years ago

I currently have a case where I would like to fuzzy-match with possibly a few errors, but only if those errors are not next to each other.

So, a simple example: the pattern "I love you" should be found in "And he saId: L luve yuu", but it should not match "And he sald: I hate you".

So some kind of option to prevent fuzzy matches on neighbouring characters (or to limit their number) would be cool. That might actually be a slightly bigger extension, so I would maybe look into it myself if you say that's reasonable to achieve with the current codebase.

Thanks a lot for your great work!

taleinat commented 4 years ago

Hi @georgh,

A simple way to achieve this would be to just run a fuzzy search as usual (e.g. using find_near_matches()), and then have a second step which filters only matches that meet additional criteria. This should be fast while giving a high level of flexibility.
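As a rough illustration of that two-step approach, here is a minimal sketch. It assumes only substitutions are allowed (so every match has the same length as the pattern), which makes "errors next to each other" easy to define as adjacent mismatched positions. The helper names `mismatch_positions`, `has_adjacent_errors` and `find_non_adjacent_matches` are made up for this example and are not part of fuzzysearch.

```python
def mismatch_positions(pattern, candidate):
    """Return the indices at which two equal-length strings differ."""
    if len(pattern) != len(candidate):
        raise ValueError("substitution-only comparison needs equal lengths")
    return [i for i, (p, c) in enumerate(zip(pattern, candidate)) if p != c]


def has_adjacent_errors(pattern, candidate):
    """True if any two mismatched positions are direct neighbours."""
    positions = mismatch_positions(pattern, candidate)
    return any(b - a == 1 for a, b in zip(positions, positions[1:]))


def find_non_adjacent_matches(pattern, text, max_substitutions):
    """Step 1: ordinary fuzzy search. Step 2: drop matches with adjacent errors.

    Hypothetical helper; requires the third-party fuzzysearch package.
    """
    from fuzzysearch import find_near_matches  # imported lazily on purpose

    # Substitutions only, so each match is exactly len(pattern) long.
    matches = find_near_matches(
        pattern,
        text,
        max_substitutions=max_substitutions,
        max_insertions=0,
        max_deletions=0,
    )
    return [
        m for m in matches
        if not has_adjacent_errors(pattern, text[m.start:m.end])
    ]
```

With the example from the question, "L luve yuu" differs from "I love you" at positions 0, 3 and 8, so it passes the filter, while "I hate you" differs at positions 2, 3 and 4 and is rejected. Something like `find_non_adjacent_matches("I love you", text, max_substitutions=3)` would then run both steps. If insertions and deletions also need to be allowed, the filter would have to work on an actual alignment instead of a position-by-position comparison.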

There are other fuzzy search libraries out there which offer a higher level of customization, such as the regex library, though using them will require more learning and could limit you later if you need further refinements which they do not support.

I don't intend to expand the capabilities of this library to directly support such search criteria at the moment, so I'm closing this issue.