A fuzzy matching string distance library for Scala and Java that includes Levenshtein distance, Jaro distance, Jaro-Winkler distance, Dice coefficient, N-Gram similarity, Cosine similarity, Jaccard similarity, Longest common subsequence, Hamming distance, and more.
The N-gram implementation (https://github.com/vickumar1981/stringdistance/blob/master/src/main/scala/com/github/vickumar1981/stringdistance/impl/NGramImpl.scala)
can allow access to the tokens, so that a call such as:
Ngram.tokens("something", 2)
will result in an Array[String]
which contains the n-grams themselves. The function to do this is here: https://github.com/vickumar1981/stringdistance/blob/master/src/main/scala/com/github/vickumar1981/stringdistance/interfaces/NGramTokenizer.scala#L9
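
To make the expected result concrete, here is a minimal standalone sketch of n-gram tokenization in Scala. It does not reproduce the library's code: the object name NGramSketch and its behavior for inputs shorter than n are assumptions made only for illustration, and the linked NGramTokenizer may handle such edge cases differently.

```scala
// Hypothetical sketch, not the library's implementation.
object NGramSketch {
  // Split a string into its overlapping character n-grams.
  def tokens(s: String, n: Int): Array[String] = {
    require(n > 0, "n must be positive")
    // Assumption: a string shorter than n yields no tokens.
    // Without this guard, sliding would return the whole short string once.
    if (s.length < n) Array.empty[String]
    else s.sliding(n).toArray
  }
}
```

Under these assumptions, NGramSketch.tokens("something", 2) gives the eight bigrams "so", "om", "me", "et", "th", "hi", "in", "ng".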