jsoma / fuzzy_pandas

Fuzzy matches and merging of datasets in pandas using csvmatch
MIT License

object of type 'float' has no len() fuzzy_merge #5

Open ZhihaoMa opened 2 years ago

ZhihaoMa commented 2 years ago

Hi, I am matching two Chinese firm databases using this package. Here is my code:

import pandas as pd
import fuzzy_pandas as fpd
import dask.dataframe as dd

company_names = 'C:/Users/acemec/Documents/firm_data/company_annual.csv'
new_companies_name = 'C:/Users/acemec/Documents/firm_data/Pat_firm_list.csv'

mylist = []
for chunk in pd.read_csv(company_names, on_bad_lines='skip', encoding='Latin-1',
                         dtype=object, low_memory=False, chunksize=200000):
    mylist.append(chunk)
companies = pd.concat(mylist, axis=0)
del mylist

mylist = []
for chunk in pd.read_csv(new_companies_name, on_bad_lines='skip', encoding='Latin-1',
                         dtype=object, low_memory=False, chunksize=200000):
    mylist.append(chunk)
new_companies = pd.concat(mylist, axis=0)
del mylist

match = fpd.fuzzy_merge(new_companies, companies,
                        left_on=['assignee'],
                        right_on=['company_name'],
                        keep_left=['assignee'],
                        keep_right=['company_name', 'tyc_id', 'company_id'],
                        method='levenshtein',
                        threshold=0.85)

df = pd.DataFrame(match)
df.to_csv('C:/Users/acemec/Documents/firm_data/match_reslts.csv', encoding='utf-8')

When I run it, I get the following error:

object of type 'float' has no len() fuzzy_merge

Could you give me some suggestions? Thanks.

kanlancb commented 2 years ago

+1
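
A possible cause, offered as an assumption because the thread contains no traceback: even with dtype=object, pd.read_csv stores empty cells as NaN, which is a Python float. Any code that calls len() on such a value raises exactly the reported message. The sketch below reproduces that with pandas alone and shows two possible workarounds. The sample data is hypothetical, and whether fuzzy_merge fails at this point internally has not been confirmed here.

```python
import io
import pandas as pd

# Hypothetical sample data: the second row has an empty 'assignee' cell.
csv_text = "assignee,city\nAcme Ltd,Beijing\n,Shanghai\nBeta Co,Shenzhen\n"

# dtype=object (as in the report) still turns empty cells into float NaN.
left = pd.read_csv(io.StringIO(csv_text), dtype=object)
missing = left["assignee"].iloc[1]
print(type(missing))

# Calling len() on that NaN produces the message from the report.
try:
    len(missing)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)

# Possible workaround 1: drop rows whose match column is missing.
left_clean = left.dropna(subset=["assignee"])

# Possible workaround 2: keep empty cells as empty strings instead of NaN.
left_str = pd.read_csv(io.StringIO(csv_text), dtype=object, keep_default_na=False)
```

If this is indeed the cause, applying dropna(subset=['assignee']) to new_companies and dropna(subset=['company_name']) to companies before calling fuzzy_merge might avoid the error. Dropping the rows loses no matches, since a missing name cannot be matched anyway.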