VerticaPy is a Python library that exposes scikit-like functionality for conducting data science projects on data stored in Vertica, taking advantage of Vertica's speed and built-in analytics and machine learning capabilities.
The Jaro-Winkler distance is implemented in the Jellyfish library, and we would find it interesting to add it to Vertica and/or VerticaPy.
This method is expensive to execute on a single node, because the calculation has to find all matches and transpositions between the two strings.
We know Vertica already has the Levenshtein distance, but Jaro-Winkler also gives good results, and its result is normalized between 0 and 1, which makes comparison and interpretation easier.
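To make the matching and transposition steps concrete, here is a minimal pure-Python sketch of the textbook Jaro-Winkler formula. It is not the Jellyfish or Vertica implementation; the function names and the `prefix_weight` and `max_prefix` parameters are chosen for illustration, and the score is expressed as a similarity (1.0 means identical strings). Some implementations only apply the prefix bonus above a minimum Jaro score; this sketch always applies it.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Jaro similarity in [0, 1]; 1.0 means the strings are identical."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters count as a match only if their positions
    # are no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        start = max(0, i - window)
        end = min(len2, i + window + 1)
        for j in range(start, end):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Walk both strings over their matched characters, in order, and
    # count the positions where they disagree; half of that count is
    # the number of transpositions.
    out_of_order = 0
    j = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[j]:
                j += 1
            if s1[i] != s2[j]:
                out_of_order += 1
            j += 1
    transpositions = out_of_order // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler_similarity(
    s1: str, s2: str, prefix_weight: float = 0.1, max_prefix: int = 4
) -> float:
    """Jaro similarity with a bonus for a shared prefix (up to max_prefix chars)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1.0 - jaro)
```

The nested loop over the match window is what makes the computation costly when every pair of values in a large column has to be compared.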
Jaro-Winkler is used in several use cases to compare two strings, for example:
Detecting duplicate values (such as mistyped names...)
Replacing strings with normalized strings (such as company names...), which makes it possible to join with external reference datasets such as INSEE
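As a rough illustration of the normalization use case, the sketch below maps a raw value to the closest entry of a reference list when the score passes a threshold. Everything here is hypothetical: the `normalize` helper, the `0.85` threshold, and the sample company names are made up for the example. The scorer is passed in as a parameter; `stand_in_similarity` uses Python's `difflib` only to keep the example runnable without extra dependencies, and it is not Jaro-Winkler. In practice one would pass a Jaro-Winkler function instead, for instance `jellyfish.jaro_winkler_similarity` if Jellyfish is installed.

```python
from difflib import SequenceMatcher
from typing import Callable, Iterable, Optional


def stand_in_similarity(a: str, b: str) -> float:
    """Placeholder scorer in [0, 1]. NOT Jaro-Winkler, only a stand-in."""
    return SequenceMatcher(None, a, b).ratio()


def normalize(
    value: str,
    references: Iterable[str],
    similarity: Callable[[str, str], float],
    threshold: float = 0.85,  # assumed cut-off, to be tuned on real data
) -> Optional[str]:
    """Return the reference string closest to value, or None if none is close enough."""
    key = value.strip().lower()
    best, best_score = None, threshold
    for ref in references:
        score = similarity(key, ref.strip().lower())
        if score >= best_score:
            best, best_score = ref, score
    return best


# Hypothetical reference list, standing in for an external referential.
reference_names = ["Acme Corporation", "Globex", "Initech"]

cleaned = normalize("Acme Corporaton", reference_names, stand_in_similarity)
unmatched = normalize("Zzz", reference_names, stand_in_similarity)
```

Once each raw value has been mapped to a reference name, the normalized column can be used as a join key; values that return `None` are left for manual review.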
Hi,
In several projects, we would like to use the Jaro-Winkler distance.