hmnd closed 5 years ago
@maxharlow would you rather I leave out the style changes?
Hi @hmnd -- sorry for taking so long to get back to you. Would you mind removing the style changes?
To explain: up until now this has been a personal project -- albeit one I've encouraged others to use -- so it's written in my own idiosyncratic Python style, which I prefer to PEP 8. However, if I start getting more pull requests of substance like yours, I'll reconsider this to ease such contributions.
No worries :) I've reverted the styling changes.
Ok, I've just released v1.18, which refactors the way matchings work. It doesn't include weightings, but it does let you specify a different threshold for each field -- does that work for your use case?
Sorry for the delay in my reply. Different thresholds per field are still different from weightings. Weightings allow you to create a balance of a number of fields to account for known inaccuracies in data. For instance, in one project, I'm matching on name and address. Since addresses change and may not be correct, I have a balance of 85%/15% between name/address, so a person may not match based on address, but will still match on name. Is that a bit more clear?
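To make the weighting idea concrete, here is a minimal sketch of an 85%/15% name/address balance. It is not csvmatch's implementation: `similarity`, `weighted_score`, the sample records, and the overall cutoff of 0.8 are all hypothetical, and `difflib.SequenceMatcher` stands in for whatever fuzzy comparison is actually used.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def weighted_score(rec1: dict, rec2: dict, weights: dict) -> float:
    """Combine per-field similarities into one score using the given weights."""
    return sum(w * similarity(rec1[f], rec2[f]) for f, w in weights.items())


# Hypothetical weights: name counts for 85% of the score, address for 15%.
weights = {"name": 0.85, "address": 0.15}

# Hypothetical records: same person, address has changed.
a = {"name": "Jane Smith", "address": "12 High Street"}
b = {"name": "Jane Smith", "address": "40 Park Lane"}

score = weighted_score(a, b, weights)
is_match = score >= 0.8  # assumed overall cutoff, chosen for illustration
```

Because the name alone contributes 0.85 when it matches exactly, the pair clears the assumed cutoff no matter how different the addresses are.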
I appreciate that the two are different concepts, but does it let you do what you need here?
For the project you described, that might be something like:
$ csvmatch -1 name address -2 name address -t 0.8 0.1 first.csv second.csv
Apologies if there's some important nuance that I've missed!
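For comparison, per-field thresholds could be sketched roughly as follows. This assumes that every field must independently meet its own threshold, which is a guess at the semantics rather than a description of how v1.18 works; `similarity` and `matches_per_field` are hypothetical names, and `difflib.SequenceMatcher` is only a stand-in comparison.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two strings (case-insensitive)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def matches_per_field(rec1: dict, rec2: dict, thresholds: dict) -> bool:
    """Assumed rule: a pair matches only if every field meets its own threshold."""
    return all(similarity(rec1[f], rec2[f]) >= t for f, t in thresholds.items())


# Thresholds mirroring the command above: 0.8 for name, 0.1 for address.
thresholds = {"name": 0.8, "address": 0.1}

# Hypothetical records for illustration.
a = {"name": "Jane Smith", "address": "12 High Street"}
b = {"name": "Jane Smith", "address": "40 Park Lane"}

result = matches_per_field(a, b, thresholds)
```

Under this assumed rule, a very low address threshold lets most address changes through, but an address scoring below it would still block the match, which is one place where thresholds and weightings can diverge.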
setup.py
to latest versions