seatgeek / fuzzywuzzy

Fuzzy String Matching in Python
http://chairnerd.seatgeek.com/fuzzywuzzy-fuzzy-string-matching-in-python/
GNU General Public License v2.0

Fuzzywuzzy process.extract - Nan output #250

Open kapil3sh opened 4 years ago

kapil3sh commented 4 years ago

Please find the code below:

import pandas as pd
from fuzzywuzzy import process

df1 = pd.read_excel("some excel.xlsx")

df1["Full Name"] = df1["First Name"] + " " + df1["Last Name"]
str2match = "John Smith"

strOptions = df1["Full Name"].tolist()

c_matches = process.extract(str2match, strOptions)
print(c_matches)

Output: [(nan, 72), (nan, 72), (nan, 72), (nan, 72), (nan, 72)]

Note: I'm pretty sure that the DataFrame isn't empty and that every value in the DataFrame and in the list is a string.

Please help, or point me to the solution if this has already been resolved.

I'm using Python 3.7.4, fuzzywuzzy 0.17.0, and pandas 0.25.1.
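One possible cause, offered as an assumption because the spreadsheet itself is not shown: if any row is missing its First Name or Last Name, pandas string concatenation yields NaN for that row, and those float NaN values end up in strOptions. Below is a minimal sketch of filtering them out before matching. The helper name clean_choices and the sample data are hypothetical, not part of the original post.

```python
import math


def clean_choices(options):
    """Keep only non-empty strings; drop NaN, None and other non-string values.

    Hypothetical helper for illustration, not part of fuzzywuzzy or pandas.
    """
    return [o for o in options if isinstance(o, str) and o.strip()]


# Made-up data standing in for df1["Full Name"].tolist()
strOptions = ["John Smith", math.nan, "Jon Smyth", None, "Jane Doe"]

cleaned = clean_choices(strOptions)
print(cleaned)  # ['John Smith', 'Jon Smyth', 'Jane Doe']
```

With the DataFrame from the post, the equivalent would be to pass clean_choices(df1["Full Name"].tolist()) to process.extract, or to call df1["Full Name"].dropna() before converting to a list.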

Argent02 commented 4 years ago

The correct string combination is: (java port). Use Levenshtein distance to calculate the distance between the strings before applying the set ratio.
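For reference, the edit distance mentioned above can be sketched in plain Python. This is the textbook dynamic-programming formulation, written here for illustration only; it is not claimed to be how fuzzywuzzy computes its scores internally, and the function name levenshtein is an assumption.

```python
def levenshtein(a: str, b: str) -> int:
    """Return the minimum number of single-character insertions,
    deletions and substitutions needed to turn a into b."""
    # Keep the shorter string in the inner loop to use less memory.
    if len(a) < len(b):
        a, b = b, a

    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(
                min(
                    previous[j] + 1,         # delete ca
                    current[j - 1] + 1,      # insert cb
                    previous[j - 1] + cost,  # substitute (or match)
                )
            )
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

A smaller distance means the two strings are closer, so it can serve as a quick sanity check on individual candidate pairs before relying on a combined score.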