taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License

find_near_matches() uses substitution for distance calculation even if max_substitutions=0 is set. #41

Closed markussteindl closed 2 months ago

markussteindl commented 2 years ago

Reproduction:

from fuzzysearch import find_near_matches

find_near_matches('Hello world', 'Hello babab', max_substitutions=0, max_l_dist=5)
# [Match(start=0, end=11, dist=5, matched='Hello babab')]

Is this intended behavior? Without substitutions, turning 'world' into 'babab' takes 5 deletions plus 5 insertions, so the distance should be 10, not 5. The call above should therefore return no matches.

taleinat commented 2 years ago

Hey @Stonatus,

This is currently the intended behavior, yes. The "dist" attribute of matches describes the Levenshtein / edit distance, which is indeed 5 in this case.
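
To make the two numbers in this thread concrete, here is a minimal pure-Python sketch that computes both distances for the strings from the reproduction. The helper `edit_distance` and its `allow_substitutions` flag are hypothetical names made up for this illustration; they are not part of fuzzysearch and say nothing about how the library computes `dist` internally.

```python
def edit_distance(a, b, allow_substitutions=True):
    """Dynamic-programming edit distance between strings a and b.

    With allow_substitutions=False only insertions and deletions are
    counted, so replacing one character costs 2 (delete + insert).
    Illustrative helper only; not a fuzzysearch function.
    """
    # prev[j] holds the distance between a[:i-1] and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance between a[:i] and the empty string
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                best = prev[j - 1]  # characters match, no extra cost
            else:
                # deletion from a, or insertion into a
                best = min(prev[j], curr[j - 1]) + 1
                if allow_substitutions:
                    best = min(best, prev[j - 1] + 1)
            curr.append(best)
        prev = curr
    return prev[-1]


print(edit_distance('Hello world', 'Hello babab'))         # 5
print(edit_distance('Hello world', 'Hello babab', False))  # 10
```

The first value is the Levenshtein distance reported in `dist`; the second is the insertion/deletion-only count that the original question expected.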

I can see that it could be useful to see the number of allowed changes needed given the input parameters. I'll leave this open while I consider if there's a neat way to implement this given the existing design.

taleinat commented 2 months ago

I think this is too niche a use-case and not something that would be very simple to implement, so I'm going to close this request.