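Soundex is a phonetic algorithm -> http://en.wikipedia.org/wiki/Soundex
I used the key collision method with the metaphone3 keying function, a phonetic algorithm in the same family as Soundex. It transforms tokens into a representation of how they are pronounced, so values that sound alike end up with the same key.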
Example:
Parque nacional de gama
Parque nacional do iguacu
Parque nacional do itatiaia
Parque Nacional de Itatiaia
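A minimal sketch of phonetic key collision, to make the idea concrete. Metaphone3 is not in the Python standard library, so this sketch swaps in classic Soundex as the keying function; the helper names (soundex, phonetic_key, key_collision_clusters) are made up for illustration, and a real metaphone3 keyer may group values differently.

```python
from collections import defaultdict

# Standard Soundex consonant classes; vowels, h, w and y carry no digit.
SOUNDEX_CODES = {
    **dict.fromkeys("bfpv", "1"),
    **dict.fromkeys("cgjkqsxz", "2"),
    **dict.fromkeys("dt", "3"),
    "l": "4",
    **dict.fromkeys("mn", "5"),
    "r": "6",
}


def soundex(token):
    """Return the 4-character Soundex code of one token ("" if it has no letters)."""
    letters = [c for c in token.lower() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = SOUNDEX_CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "hw":  # h and w do not separate two equal codes
            prev = digit
    return (code + "000")[:4]


def phonetic_key(value):
    """Key for a whole string: the Soundex code of each whitespace-separated token."""
    return " ".join(soundex(token) for token in value.split())


def key_collision_clusters(values):
    """Group values whose keys collide; only groups with 2+ members are returned."""
    buckets = defaultdict(list)
    for value in values:
        buckets[phonetic_key(value)].append(value)
    return [group for group in buckets.values() if len(group) > 1]


names = [
    "Parque nacional de gama",
    "Parque nacional do iguacu",
    "Parque nacional do itatiaia",
    "Parque Nacional de Itatiaia",
]
print(key_collision_clusters(names))
# [['Parque nacional do itatiaia', 'Parque Nacional de Itatiaia']]
```

With this stand-in keyer, "do" and "de" both become D000, so the two Itatiaia spellings share one key and fall into the same cluster.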
Kolmogorov complexity -> http://en.wikipedia.org/wiki/Kolmogorov_complexity
An estimate of Kolmogorov complexity is used to measure 'similarity' between strings. This approach has been widely applied to the comparison of strings originating from DNA sequencing.
Example:
Podocarpus National Park, Cajanuma at Casa de Pedesur
Podocarpus National Park, Cajanuma, at Casa de predesur
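Kolmogorov complexity itself cannot be computed, so in practice it is approximated with a compressor. The sketch below uses the normalized compression distance with zlib as the compressor. Both choices are assumptions made for illustration, not necessarily what the tool used here does, and the function names are hypothetical.

```python
import zlib


def compressed_size(text):
    """Length in bytes of the zlib-compressed UTF-8 encoding of text."""
    return len(zlib.compress(text.encode("utf-8"), 9))


def ncd(x, y):
    """Normalized compression distance: near 0 for similar strings, near 1 for unrelated ones."""
    cx, cy = compressed_size(x), compressed_size(y)
    cxy = compressed_size(x + y)
    return (cxy - min(cx, cy)) / max(cx, cy)


a = "Podocarpus National Park, Cajanuma at Casa de Pedesur"
b = "Podocarpus National Park, Cajanuma, at Casa de predesur"
c = "Parque nacional do itatiaia"

# The near-duplicate pair should score clearly lower than the unrelated pair.
print(ncd(a, b), ncd(a, c))
```

The intuition: if b is almost a copy of a, compressing a + b costs little more than compressing a alone, so the distance is small. On very short strings the fixed overhead of the compressor makes the absolute values noisy, so only the relative ordering is meaningful.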
Levenshtein -> http://en.wikipedia.org/wiki/Levenshtein_distance
Nearest neighbor, distance function
Example:
Salto Iguazu
Salto tguazu
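As a sketch of the distance function behind this nearest neighbor method, here is a plain dynamic-programming Levenshtein distance. The neighbors helper and its radius parameter are hypothetical names for illustration; a real clustering tool typically adds blocking to avoid comparing every pair.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the row buffer as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution or match
            ))
        previous = current
    return previous[-1]


def neighbors(value, candidates, radius=1):
    """Candidates other than value itself that lie within radius edits of it."""
    return [c for c in candidates if c != value and levenshtein(value, c) <= radius]


print(levenshtein("Salto Iguazu", "Salto tguazu"))  # 1
print(neighbors("Salto Iguazu", ["Salto tguazu", "Salto Angel"]))  # ['Salto tguazu']
```

The two example values differ by one substitution ("I" versus "t"), so they are neighbors at radius 1.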