Closed by ronaldtse 3 years ago
One sample use case is correcting the transliteration entries in the GNDB. Quite a number of the transliteration pairs there do not seem to use the correct system.
An idea: what if there is more than one input:
The return value could be something like a Hash[String -> Float], where each String is a map that was tested and each Float is a similarity score (which could be a Levenshtein distance, the percentage of matching characters, or some other kind of string distance). We could also use the #transliterate_each method to consider all possible transliterations.
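A minimal sketch of what such a return value could look like, assuming a normalised Levenshtein similarity as the score. The `detect` method, the `candidates` hash of callables, and the toy map names are hypothetical stand-ins for illustration, not the Interscript API:

```ruby
# Levenshtein edit distance, computed row by row with dynamic programming.
def levenshtein(a, b)
  b_chars = b.chars
  prev = (0..b_chars.length).to_a
  a.chars.each_with_index do |ca, i|
    curr = [i + 1]
    b_chars.each_with_index do |cb, j|
      cost = ca == cb ? 0 : 1
      # deletion, insertion, substitution
      curr << [prev[j + 1] + 1, curr[j] + 1, prev[j] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Turn the distance into a similarity between 0.0 (nothing shared) and 1.0 (identical).
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# candidates: hypothetical Hash of map name => callable that transliterates a String.
# Returns a Hash[String -> Float], best-scoring map first.
def detect(source, target, candidates)
  scores = candidates.map do |name, converter|
    [name, similarity(converter.call(source), target)]
  end
  scores.sort_by { |_, score| -score }.to_h
end

# Toy converters standing in for real transliteration maps (assumption).
TOY_CANDIDATES = {
  "toy-reverse" => ->(s) { s.reverse },
  "toy-upcase"  => ->(s) { s.upcase },
}.freeze

result = detect("abc", "ABC", TOY_CANDIDATES)
puts result.inspect
```

Sorting by score keeps the most likely map first, so a caller that only wants the best guess can take the first key.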
The idea as presented in this post could be easy to implement only if we know the input.
Agree with the return data type of the hash.
In 3 you probably mean “conversion system selection”, as a way to decide which systems to try.
We could also implement several string distance scores for users to choose from.
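One way this could be structured is a small registry of scoring functions that the caller selects by name. This is only a sketch: `SCORERS`, `score`, and the metric names are assumptions made for illustration, and an edit-distance scorer could be registered in the same way.

```ruby
# Hypothetical registry of scorers; each takes two Strings and returns a Float in 0.0..1.0.
SCORERS = {
  # Fraction of positions that hold the same character in both strings.
  matching_chars: ->(a, b) {
    longest = [a.length, b.length].max
    return 1.0 if longest.zero?
    a.chars.zip(b.chars).count { |x, y| x == y }.to_f / longest
  },
  # Dice coefficient over the sets of character bigrams.
  bigram_dice: ->(a, b) {
    return 1.0 if a == b
    grams_a = a.chars.each_cons(2).to_a.uniq
    grams_b = b.chars.each_cons(2).to_a.uniq
    return 0.0 if grams_a.empty? || grams_b.empty?
    2.0 * (grams_a & grams_b).length / (grams_a.length + grams_b.length)
  },
}.freeze

# Score two strings with the chosen metric; an unknown metric raises KeyError.
def score(a, b, metric: :matching_chars)
  SCORERS.fetch(metric).call(a, b)
end

puts score("night", "nacht", metric: :bigram_dice)
puts score("abc", "abd")
```

Using `fetch` makes a mistyped metric name fail loudly instead of silently falling back to a default.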
Thanks @webdev778 ! Can you also help add documentation for interscript.org?
PR #731
Input: a string.
Output: the transliteration systems whose output matches this phrase, listing exact matches and close matches (based on edit distance?).
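The split into exact and close matches could be sketched as below, assuming each system has already been given a similarity score between 0.0 and 1.0. The `classify_matches` method, the `close_threshold` value of 0.8, and the system names are hypothetical:

```ruby
# scores: Hash of system name => similarity in 0.0..1.0 (1.0 means identical output).
# Returns the exact matches, plus the close matches ordered from best to worst.
def classify_matches(scores, close_threshold: 0.8)
  exact, rest = scores.partition { |_, s| s >= 1.0 }
  close = rest.select { |_, s| s >= close_threshold }.sort_by { |_, s| -s }
  { exact: exact.map(&:first), close: close.map(&:first) }
end

# Made-up scores for four placeholder systems.
sample = { "system-a" => 1.0, "system-b" => 0.9, "system-c" => 0.85, "system-d" => 0.3 }
puts classify_matches(sample).inspect
```

The threshold is a tuning choice; a stricter value returns fewer close matches.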