vertica / VerticaPy

VerticaPy is a Python library that exposes scikit-like functionality to conduct data science projects on data stored in Vertica, thus taking advantage of Vertica’s speed and built-in analytics and machine learning capabilities.
https://www.vertica.com/python/
Apache License 2.0

Jaro-Winkler distance #341

Closed gaetan-dion closed 1 year ago

gaetan-dion commented 2 years ago

Hi,

In several projects, we would like to use the Jaro-Winkler distance:

This method is implemented in the Jellyfish library, and we think it would be interesting to add it to Vertica and/or VerticaPy.
This method is expensive to execute on a single node, because the calculation has to find all matches and transpositions between the two strings.
We know Vertica already has a Levenshtein distance, but Jaro-Winkler also gives good results. Furthermore, its result is normalized between 0 and 1, which makes comparison and interpretation easier.
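
For readers unfamiliar with the metric, here is a minimal pure-Python sketch of the textbook Jaro-Winkler similarity (the distance is usually taken as 1 minus this value). It is only an illustration of the matches-and-transpositions computation described above. The function names, the prefix weight of 0.1, and the 4-character prefix cap are the commonly cited defaults assumed here, and the results are not guaranteed to match Jellyfish or any Vertica implementation in every edge case.

```python
def jaro_similarity(s1: str, s2: str) -> float:
    """Textbook Jaro similarity in [0, 1] (illustrative sketch)."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0

    # Two equal characters count as a match only if their positions
    # are no farther apart than this window.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo = max(0, i - window)
        hi = min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0

    # Transpositions: matched characters that appear in a different order,
    # counted in pairs.
    seq1 = [c for c, m in zip(s1, matched1) if m]
    seq2 = [c for c, m in zip(s2, matched2) if m]
    transpositions = sum(a != b for a, b in zip(seq1, seq2)) // 2

    return (
        matches / len1
        + matches / len2
        + (matches - transpositions) / matches
    ) / 3.0


def jaro_winkler_similarity(
    s1: str, s2: str, prefix_weight: float = 0.1, max_prefix: int = 4
) -> float:
    """Jaro similarity boosted for a shared prefix (assumed default parameters)."""
    jaro = jaro_similarity(s1, s2)
    prefix = 0
    for a, b in zip(s1[:max_prefix], s2[:max_prefix]):
        if a != b:
            break
        prefix += 1
    return jaro + prefix * prefix_weight * (1.0 - jaro)


if __name__ == "__main__":
    print(round(jaro_winkler_similarity("MARTHA", "MARHTA"), 4))
```

The nested loop over the match window is what makes the metric costly when applied to every pair of rows, which is the motivation for running it inside the database rather than on a single client node.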

Jaro-Winkler is used in several use cases to compare two strings, for:

oualib commented 2 years ago

Jaro-Winkler is on its way. It should be available soon.