Open funderburkjim opened 6 years ago
When the lexicon is reasonably large, I am not shocked that matching could take up to a few seconds. A few things that might help:
- Use `closest_match` rather than `closest_matches`, so as to avoid the pruning step.

I hope at some point to write that method up.
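The difference between the two calls can be illustrated with a plain-Python sketch (this is not edittransducer's pynini/FST implementation — the brute-force `levenshtein` and the tiny lexicon are stand-ins): `closest_match` only needs a single minimum-cost word, while `closest_matches` must find the minimum *and* collect every word tied at it, which is the extra pruning work.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (stand-in for the FST)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def closest_match(query, lexicon):
    # One winner: a single minimization, no tie tracking.
    return min(lexicon, key=lambda w: levenshtein(query, w))

def closest_matches(query, lexicon):
    # Must find the minimum *and* gather everything tied at it --
    # the extra pass that closest_match skips.
    best = min(levenshtein(query, w) for w in lexicon)
    return {w for w in lexicon if levenshtein(query, w) == best}

lexicon = ["karma", "dharma", "drama"]
print(closest_match("darma", lexicon))    # one of the distance-1 words
print(closest_matches("darma", lexicon))  # {'karma', 'dharma'}
```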
Hi! I carried out a test of Levenshtein automaton matching for a lexicon of about 200k Sanskrit words. The work is in this repository (https://github.com/funderburkjim/pynini-learn), specifically in the ejf07 directory.
The automaton usage works fine, and is elegant. However, the efficiency was less than I had hoped.
Does the 5+ sec per closest match surprise you? Is there some optimization step that would drastically reduce this?
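On the optimization question, one well-known approach outside the FST framework entirely (so this is only a general suggestion, not something edittransducer provides) is to index the lexicon once in a BK-tree, which uses the triangle inequality to skip most of the lexicon on each query. A minimal stdlib sketch:

```python
def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

class BKTree:
    """Metric tree over edit distance; built once, queried many times."""

    def __init__(self, words):
        it = iter(words)
        self.root = (next(it), {})  # (word, {distance: child node})
        for word in it:
            self._add(word)

    def _add(self, word):
        node_word, children = self.root
        while True:
            d = levenshtein(word, node_word)
            if d == 0:
                return  # duplicate word
            if d not in children:
                children[d] = (word, {})
                return
            node_word, children = children[d]

    def search(self, query, max_dist):
        """Return all (distance, word) pairs within max_dist of query."""
        results, stack = [], [self.root]
        while stack:
            node_word, children = stack.pop()
            d = levenshtein(query, node_word)
            if d <= max_dist:
                results.append((d, node_word))
            # Triangle inequality: a child stored at edge-distance k can
            # only contain matches if |k - d| <= max_dist.
            for k, child in children.items():
                if abs(k - d) <= max_dist:
                    stack.append(child)
        return sorted(results)

tree = BKTree(["karma", "dharma", "drama", "mantra"])
print(tree.search("darma", 1))  # [(1, 'dharma'), (1, 'karma')]
```

Whether this beats the automaton approach at 200k words depends on the alphabet and how tight the distance threshold is, but the build cost is paid once rather than per query.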