maxharlow / csvmatch

🔎 Finds fuzzy matches between CSV files

descriptor 'union' of 'set' object needs an argument #21

Closed r3ptar closed 5 years ago

r3ptar commented 6 years ago

I am receiving "descriptor 'union' of 'set' object needs an argument" when using --fuzzy. This happens after I press f to finish matching records for the machine learning. Below is my command:

csvmatch file1.csv file2.csv --fields1 'Employee#' 'Last Name' 'First Name' --fields2 'Employee#' 'Last Name' 'First Name' --fuzzy > newfile.csv
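
For readers wondering what the message itself means: in Python, this TypeError is what you get when `set.union` is called on the class with no arguments, typically by unpacking an empty list into it. Whether that is what happens inside csvmatch here is not established in this thread. The snippet below is only a minimal sketch of that general failure mode, and the names `groups`, `union_unsafe` and `union_safe` are hypothetical.

```python
def union_unsafe(groups):
    # With an empty list this becomes set.union() with no arguments,
    # which raises TypeError (the exact wording varies by Python version).
    return set.union(*groups)


def union_safe(groups):
    # Starting from an empty set instance makes the empty case well-defined.
    return set().union(*groups)


groups = []  # hypothetical: an earlier step produced no groups at all

try:
    union_unsafe(groups)
except TypeError as error:
    print("TypeError:", error)

print(union_safe(groups))
print(union_safe([{1, 2}, {2, 3}]))
```

If the cause is of this kind, it would point at an earlier step yielding no data rather than at the final union call.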

maxharlow commented 6 years ago

Which version are you using? Find out with csvmatch -v.

Could you also try running it with --fuzzy levenshtein and see if you get the same error?

r3ptar commented 6 years ago

Version is 1.17

Running with levenshtein seemed to work just fine, but the results are incredibly wrong. Is this limited to comparing two documents, or could I do 3 or 4?

maxharlow commented 6 years ago

It seems it's a problem with the default machine-learning Bilenko method of fuzzy matching. If you need a workaround for now you could try either adjusting the Levenshtein match threshold using -r, or using the other fuzzy matching methods.
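
To illustrate why adjusting a threshold changes which rows count as matches, here is a minimal sketch of Levenshtein matching with a cutoff. It is not csvmatch's implementation: the thread does not say how -r is scaled, so this sketch assumes a similarity score between 0 and 1 and accepts a pair when the score reaches the threshold. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insertions, deletions, substitutions)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # deletion
                current[j - 1] + 1,      # insertion
                previous[j - 1] + cost,  # substitution
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Normalise the distance to a 0..1 score, where 1.0 means identical."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1 - levenshtein(a, b) / longest


def is_match(a, b, threshold):
    # Assumed rule: accept the pair when the score reaches the threshold.
    return similarity(a, b) >= threshold


print(is_match("Johnson", "Jonson", 0.8))
print(is_match("Johnson", "Jonson", 0.9))
```

Under this assumed scoring, "Johnson" and "Jonson" differ by one edit over seven characters, so they pass a 0.8 cutoff but fail a 0.9 one. A cutoff that is too loose would be one plausible reason for results that look very wrong.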

Could you try upgrading to 1.18 and see if you still get the same error?

CSV Match can only match between two spreadsheets -- if you needed to match between more than that you could use the output from matching files one and two against the third file, and so on for further files.
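
The chaining idea can be sketched in a few lines of Python. This is an illustration of the workflow only: `match_pair` is a hypothetical stand-in that does exact, case-insensitive matching on one field, not the matching csvmatch performs, and the sample rows are made up.

```python
from functools import reduce


def match_pair(left_rows, right_rows, field):
    """Stand-in for one two-file match: keep left rows whose `field`
    also appears in the right rows (exact, case-insensitive)."""
    right_values = {row[field].strip().lower() for row in right_rows}
    return [row for row in left_rows if row[field].strip().lower() in right_values]


def match_all(tables, field):
    """Match table 1 against 2, then feed that result against 3, and so on."""
    return reduce(lambda matched, table: match_pair(matched, table, field), tables)


# Made-up sample data standing in for three spreadsheets.
file1 = [{"Last Name": "Smith"}, {"Last Name": "Jones"}, {"Last Name": "Lee"}]
file2 = [{"Last Name": "smith"}, {"Last Name": "Lee"}]
file3 = [{"Last Name": "Lee "}, {"Last Name": "Patel"}]

print(match_all([file1, file2, file3], "Last Name"))
```

Each step only ever compares two tables, so the two-file limit is respected. With real files, the column names in the intermediate output would need checking before it is used as input to the next match.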

r3ptar commented 6 years ago

Upgraded to 1.18. Now it doesn't do anything; it just displays > in the terminal.

maxharlow commented 6 years ago

Oh. Is that just with the default fuzzy matching, or with anything? Are you on Mac or Windows? Which version of Python?

r3ptar commented 6 years ago

Yeah, that's just with the default fuzzy matching. Normal matches work just fine. I'm running this on CentOS 7 using Python 2.7.5.

maxharlow commented 6 years ago

Ok -- does it give that error for any files you try and match, or just a specific one?

r3ptar commented 6 years ago

Any files.

maxharlow commented 6 years ago

Hmm.. I'm not sure how I can reproduce this. Perhaps try with it installed with Python 3?

r3ptar commented 5 years ago

I just downgraded my csv version and then reinstalled Python, and that seems to have resolved things for me. I was able to run my match successfully.

maxharlow commented 5 years ago

Excellent! 👍