jamesturk / jellyfish

🪼 a python library for doing approximate and phonetic matching of strings.
https://jamesturk.github.io/jellyfish/
MIT License
2.04k stars, 157 forks

Match Rating comparison doesn't give any output (missing value) #155

Closed: ammubharatram closed this issue 2 years ago

ammubharatram commented 2 years ago

Reproducible example:

print(match_rating_codex("MARIE HELENE"))

Output: MRHLN

print(match_rating_codex("MARIA RIO"))

Output: MR

When I run match_rating_comparison on these two names, it doesn't return any output:

match_rating_comparison('MARIE HELENE','MARIA RIO')

However, if I remove the word 'HELENE' from the first name, I get a boolean output. I can't tell when it returns a missing value and when it doesn't. On my dataset of 7000 names, it returned 4 missing values. Any idea why?

Other trials to reproduce:

match_rating_comparison('MARIE RELENE','MARIA RIO')

Output: True

After multiple trials, the H at the beginning of 'HELENE' seems to be the problem. This looks like a bug with H to me. Any thoughts on this? Thanks in advance!

jamesturk commented 2 years ago

Just added a test case for this. I'll try to take a look when I get a chance; if someone else wants to get to it first, the problem is likely in cjellyfish/mra.c.
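
One possible explanation for the missing value, offered as a sketch and not as the library's actual code: the published Match Rating Approach makes no comparison when the two codex lengths differ by 3 or more. The helper names `mra_codex` and `comparison_is_skipped` below are hypothetical. Dropping a consonant when it equals the last kept letter is an assumption, chosen because it reproduces the codex outputs quoted in this issue.

```python
VOWELS = "AEIOU"


def mra_codex(name: str) -> str:
    """Hypothetical helper: encode a name roughly along the MRA codex rules."""
    codex = []
    for ch in name.upper():
        if not ch.isalpha():
            continue  # ignore spaces and punctuation
        if codex and ch in VOWELS:
            continue  # a vowel is kept only as the very first letter
        if codex and ch == codex[-1]:
            continue  # assumption: drop a repeat of the last kept letter
        codex.append(ch)
    if len(codex) > 6:
        # Long codexes are reduced to their first three and last three letters.
        codex = codex[:3] + codex[-3:]
    return "".join(codex)


def comparison_is_skipped(name1: str, name2: str) -> bool:
    """True when the MRA length rule says no comparison should be made."""
    return abs(len(mra_codex(name1)) - len(mra_codex(name2))) >= 3


print(mra_codex("MARIE HELENE"))   # MRHLN
print(mra_codex("MARIA RIO"))      # MR
print(mra_codex("MARIE RELENE"))   # MRLN
print(comparison_is_skipped("MARIE HELENE", "MARIA RIO"))  # True
print(comparison_is_skipped("MARIE RELENE", "MARIA RIO"))  # False
```

Under these assumptions, 'MARIE HELENE' (MRHLN, length 5) and 'MARIA RIO' (MR, length 2) differ by 3, so no comparison is made. 'MARIE RELENE' encodes to MRLN (length 4), so it is compared, which would be consistent with the H only mattering because it adds a letter to the codex. Whether cjellyfish/mra.c actually behaves this way would need to be checked against the source.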