taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.
MIT License
301 stars, 26 forks

Ignoring insertions, deletions or replacements of a certain set of characters #45

Closed: levitation closed this issue 2 months ago

levitation commented 8 months ago

Would it be possible to assign zero distance cost to certain user-specified characters?

For example, when calculating a distance in rapidfuzz I can specify weights=(0, 1, 1), but I cannot specify particular characters to ignore. My real use case, however, requires string search, not just distance calculation, which is why I am using fuzzysearch. Instead of setting global weights for (insertion, deletion, substitution), I need to set those weights to zero for certain characters only, while keeping the rest of the find_near_matches functionality intact.

Right now I am mostly interested in ignoring deletions and possibly replacements of certain characters.
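
To make the requested behavior concrete, here is a minimal sketch in plain Python. It does not use fuzzysearch or rapidfuzz and is not how either library works internally. It is a textbook approximate-substring dynamic program in which skipping a character from a user-supplied set costs nothing. The names `best_match` and `ignored` are hypothetical and exist only for this illustration. Substitutions keep their normal cost of 1 here, and only skipped characters (extra in the text or missing from it) are free.

```python
def best_match(pattern, text, ignored=frozenset()):
    """Return (cost, end) for the cheapest match of `pattern` inside `text`.

    `end` is the exclusive index in `text` where that match ends.
    Skipping a character that is in `ignored` costs 0; every other
    insertion, deletion or substitution costs 1.
    Hypothetical helper for illustration only, not a fuzzysearch API.
    """
    def skip_cost(ch):
        return 0 if ch in ignored else 1

    # prev[i]: cheapest cost of aligning pattern[:i] with a substring of
    # `text` that ends at the current text position (initially position 0).
    prev = [0]
    for pc in pattern:
        prev.append(prev[-1] + skip_cost(pc))

    best = (prev[-1], 0)
    for end, tc in enumerate(text, 1):
        cur = [0]  # a match may start at any position in the text
        for i, pc in enumerate(pattern, 1):
            cur.append(min(
                prev[i - 1] + (0 if pc == tc else 1),  # match or substitution
                prev[i] + skip_cost(tc),               # extra character in text
                cur[i - 1] + skip_cost(pc),            # pattern character missing
            ))
        if cur[-1] < best[0]:
            best = (cur[-1], end)
        prev = cur
    return best


print(best_match("foobar", "xx foo-bar yy"))                 # cost 1: "-" counts
print(best_match("foobar", "xx foo-bar yy", ignored={"-"}))  # (0, 10): "-" is free
```

With `ignored={"-"}` the hyphen in `foo-bar` no longer contributes to the distance, which is the effect asked for above. A full solution would also need to report match start positions and honor a maximum distance, as find_near_matches does.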

taleinat commented 2 months ago

Hi @levitation,

No, that is not currently possible, and it is out of scope for this library.

levitation commented 2 months ago

Thank you for your response!