kylebgorman / EditTransducer

Python implementation of Levenshtein distance and Levenshtein automata matching
MIT License
27 stars 5 forks source link

Timing of Levenshtein Automaton #1

Open funderburkjim opened 6 years ago

funderburkjim commented 6 years ago

Hi! I carried through a test of the Levenshtein Automaton matching for a lexicon of about 200k Sanskrit words. The work is in this repository, specifically in the ejf07 directory.

The automaton usage works fine, and is elegant. However, the efficiency was less than I had hoped.

Does the 5+ sec per closest match surprise you? Is there some optimization step that would drastically reduce this?

kylebgorman commented 6 years ago

When the lexicon is reasonably large I am not shocked that matching could take up to a few seconds. A few things that might help:

I hope at some point to write that method up.

On Fri, Aug 17, 2018 at 4:44 PM funderburkjim notifications@github.com wrote:

Hi! I carried through a test of the Levenshtein Automaton matching for a lexicon of about 200k Sanskrit words. The work is in this repository https://github.com/funderburkjim/pynini-learn, specifically in the ejf07 directory.

The automaton usage works fine, and is elegant. However, the efficiency was less than I had hope.

Does the 5+ sec per closest match surprise you? Is there some optimization step that would drastically reduce this?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/kylebgorman/EditTransducer/issues/1, or mute the thread https://github.com/notifications/unsubscribe-auth/AAJuOe54SaLL7pLk6RB31htg4_GFo08Hks5uRyshgaJpZM4WCK8x .