Guidance/documentation which parameters lead to long runtime

taleinat / fuzzysearch

Find parts of long text or data, allowing for some changes/typos.

MIT License

301 stars 26 forks source link

Firstly, thanks for the great tool! I came across this corner-case: the program below runs for longer than I expect. It took ~2.5mins to complete:

import fuzzysearch
from devtools import debug

pattern = "old pond\nfrog leaps in"
s = "frog leaps in\nold pond\nwater's sound"

debug(
    fuzzysearch.find_near_matches(
        pattern,
        s,
        max_substitutions=int(0.2 * len(pattern)),  # = 4
        max_deletions=int(0.5 * len(pattern)),  # = 11
        max_insertions=int(0.5 * len(pattern)),  # = 11
    )
)

The result is:

% time python old_pond.py
old_pond.py:7 <module>
    fuzzysearch.find_near_matches( pattern, s, ...): [
        Match(start=0, end=13, dist=9, matched='frog leaps in'),
    ] (list) len=1
python old_pond.py  137.45s user 2.34s system 99% cpu 2:20.39 total

Environment: System: MacOS 11.4 Python 3.9.5 fuzzysearch 0.7.3 (compiled)

taleinat / fuzzysearch

Guidance/documentation which parameters lead to long runtime #37