Closed Eugeneifes closed 8 years ago
This might have to do with what database (backend) you're using. Is Duke searching only fields which have no values? Try setting lookup=true on the relevant properties as described at the bottom here https://github.com/larsga/Duke/wiki/XMLConfig to see if that helps.
Thanks a lot! Setting lookup=true/false really works for my issue!
Was I right when I guessed that empty fields contribute some weight to the overall probability?
If so, I still can't understand why empty fields aren't skipped automatically.
Was I right when I guessed that empty fields contribute some weight to the overall probability?
I hope not. That would be a bug. Seriously, I don't see any reason to assume that.
If so, I still can't understand why empty fields aren't skipped automatically.
Matching proceeds in two steps: (1) get candidate records from the database, (2) match properties. What happened here was that Duke never found any candidates to match, so the property matching never happened at all. The empty fields were never weighed, because step 2 was never reached.
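To make the two steps concrete, here is a rough Python sketch. It is not Duke's actual code: the function names, the exact-equality candidate lookup, the 0.5 starting probability, and the high/low values are all assumptions made for illustration (the real backend does a search against an index rather than a plain equality check). It only shows why a lookup on empty fields yields no candidates, and why an empty field leaves the combined probability untouched once candidates are found.

```python
def bayes_combine(p1, p2):
    """Combine two probabilities naive-Bayes style (illustrative formula)."""
    return (p1 * p2) / (p1 * p2 + (1.0 - p1) * (1.0 - p2))


def find_candidates(record, database, lookup_fields):
    """Step 1: collect records sharing a non-empty value in a lookup field.

    Hypothetical stand-in for the database lookup; empty values never
    produce a hit, so lookup fields that are all empty give no candidates.
    """
    candidates = []
    for other in database:
        if other is record:
            continue
        for field in lookup_fields:
            value = record.get(field)
            if value and other.get(field) == value:
                candidates.append(other)
                break
    return candidates


def match_probability(a, b, field_probs):
    """Step 2: compare properties, skipping any field empty on either side."""
    prob = 0.5  # assumed neutral starting point
    for field, (high, low) in field_probs.items():
        va, vb = a.get(field), b.get(field)
        if not va or not vb:
            continue  # an empty field contributes nothing
        prob = bayes_combine(prob, high if va == vb else low)
    return prob


# Two obvious duplicates in a sparse dataset (made-up data).
r1 = {"id": 1, "name": "acme", "email": "a@x.org", "fax": ""}
r2 = {"id": 2, "name": "acme", "email": "a@x.org", "fax": ""}
database = [r1, r2]
field_probs = {"name": (0.8, 0.3), "email": (0.9, 0.2), "fax": (0.7, 0.4)}

# Looking up only on the empty field finds nothing, so step 2 never runs.
no_hits = find_candidates(r1, database, ["fax"])

# Looking up on a filled field finds the duplicate, and step 2 can score it.
hits = find_candidates(r1, database, ["name"])
score = match_probability(r1, hits[0], field_probs)
```

In this sketch the empty "fax" field changes nothing in `score`; the only thing that matters is whether step 1 returns the other record at all.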
Now I see! Thank you again
Np. Does this mean we can close the issue?
Sure
I have obvious duplicates in my database, but Duke constantly returns "NO MATCH FOR".
I also tried running Duke in debug mode (with the obvious duplicates as input). Duke shows that these two records are the same with high probability (as I expected, overall probability = 0.7), which is higher than my threshold.
I should mention that my database has many columns (250 attributes) and most values are missing (empty), but according to the documentation these fields should be skipped and have no impact on the final probability: https://github.com/larsga/Duke/wiki/HowItWorks
I also tried running Duke on only the filled data (about 2-5 filled fields instead of 250 with missing values), and Duke works well. So I conclude that Duke assigns weight to the missing fields (although this should not happen according to the documentation). I tried running Duke in both deduplication and record linkage modes, but it didn't help.
How should I run Duke to make it show me the duplicates in my sparse dataset?