pckhoi / datamatch

Utilities for data matching
MIT License

Jaro-Winkler similarity #4

Closed: ayyubibrahimi closed this 2 months ago

ayyubibrahimi commented 2 months ago

Hey @pckhoi

Just tried to open a PR for this bug:

Traceback (most recent call last):
      File "/Users/ayyub/Desktop/llead/processing/match/new_orleans_pd.py", line 573, in <module>
        cprr23 = match_cprr23_to_pprr(cprr, pprr)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
      File "/Users/ayyub/Desktop/llead/processing/match/new_orleans_pd.py", line 534, in match_cprr23_to_pprr
        matcher = ThresholdMatcher(
                  ^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/lib/python3.12/site-packages/datamatch/matchers.py", line 101, in __init__
        self._score_all_pairs()
      File "/opt/homebrew/lib/python3.12/site-packages/datamatch/matchers.py", line 138, in _score_all_pairs
        sim = max(
              ^^^^
      File "/opt/homebrew/lib/python3.12/site-packages/datamatch/matchers.py", line 139, in <genexpr>
        self._scorer.score(ser_a, ser_b)
      File "/opt/homebrew/lib/python3.12/site-packages/datamatch/scorers.py", line 67, in score
        sim_vec[k] = scls.sim(a[k], b[k])
                     ^^^^^^^^^^^^^^^^^^^^
      File "/opt/homebrew/lib/python3.12/site-packages/datamatch/similarities.py", line 59, in sim
        return jaro_winkler(unidecode(a), unidecode(b), self._prefix_weight)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    TypeError: jaro_winkler() takes 2 positional arguments but 3 were given

The fix I implemented was to change this line:

return jaro_winkler(unidecode(a), unidecode(b), self._prefix_weight)

to:

return jaro_winkler(unidecode(a), unidecode(b), prefix_weight=self._prefix_weight)
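
To illustrate the difference between the two calls, here is a minimal sketch. The functions `old_style` and `new_style` are hypothetical stand-ins that only mimic two call signatures (weight passed positionally vs. weight passed by keyword only); they are not the Levenshtein library's code and do not compute a similarity. `call_with_weight` is likewise a made-up helper showing one way a caller could tolerate both signatures, assuming the older release accepts the weight only positionally.

```python
def old_style(a, b, prefix_weight=0.1, /):
    """Hypothetical stand-in: the weight is accepted positionally only."""
    return ("positional", prefix_weight)


def new_style(a, b, *, prefix_weight=0.1):
    """Hypothetical stand-in: the weight is accepted by keyword only."""
    return ("keyword", prefix_weight)


def call_with_weight(func, a, b, prefix_weight):
    """Try the keyword form first, then fall back to the positional form."""
    try:
        return func(a, b, prefix_weight=prefix_weight)
    except TypeError:
        # Caveat: this also catches a TypeError raised inside func itself,
        # so a real implementation may want a narrower check.
        return func(a, b, prefix_weight)


# Passing the weight positionally to a keyword-only signature fails with
# the same kind of TypeError as in the traceback above.
try:
    new_style("martha", "marhta", 0.1)
except TypeError as exc:
    print("positional call rejected:", exc)

# The helper works with either signature.
print(call_with_weight(new_style, "martha", "marhta", 0.1))
print(call_with_weight(old_style, "martha", "marhta", 0.1))
```

The try/except fallback is only one option; pinning the dependency to a single known version avoids the need for it altogether.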

pckhoi commented 2 months ago

@ayyubibrahimi what is the version of your Levenshtein package?

ayyubibrahimi commented 2 months ago

0.25.1

pckhoi commented 2 months ago

datamatch uses version 0.12, so no wonder it broke. Why did you upgrade to 0.25.1, though?
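
For anyone hitting the same error, a quick way to check which version is installed is sketched below using only the standard library. The distribution names `Levenshtein` and `python-Levenshtein` are assumptions about how the package is published in a given environment, and `installed_version` and `major_minor` are hypothetical helpers written for this example.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version string, or None if it is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def major_minor(version_string):
    """Parse the first two numeric components, e.g. '0.25.1' -> (0, 25)."""
    parts = version_string.split(".")
    return int(parts[0]), int(parts[1])


# Assumed distribution names; either, both, or neither may be installed.
for name in ("Levenshtein", "python-Levenshtein"):
    print(name, installed_version(name))

# Comparing the two versions mentioned in this thread.
print(major_minor("0.25.1") > major_minor("0.12"))
```

`major_minor` is deliberately simple and assumes a plain dotted numeric version string like the two quoted here.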

ayyubibrahimi commented 2 months ago

Hm. After some investigation, it looks like no other package in my local environment depends on the Levenshtein package anymore, so the upgrade is no longer necessary.