Why are there repeated matches?

larsga / Duke

Duke is a fast and flexible deduplication engine written in Java

Apache License 2.0

Hello, I can run the deduplication code with Duke successfully, but each pair of matched records appears twice in the match result. Why? For example, if the records ID1 and ID2 match, two matches are reported: the first is ID1 and ID2, and the second is ID2 and ID1. They are the same match, just in a different order. I am looking forward to your reply. Thank you!

Here is the XML configuration code:

`<object class="no.priv.garshol.duke.comparators.Levenshtein" name="StringComparator">`

with the following values, in order:

- 0.8
- ID
- URL: StringComparator, 0.49, 0.9
- TITLE: StringComparator, 0.3, 0.7
- SOURCE: StringComparator, 0.49, 0.51
- TIME
- CONTENT: StringComparator, 0.2, 0.9
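
As a possible workaround for the symmetric pairs described above, the match result can be post-processed so that (ID1, ID2) and (ID2, ID1) count as one match. The sketch below is plain Java and does not use any Duke API. The class name `SymmetricMatchFilter`, its methods, and the representation of a match as a two-element `String[]` of record IDs are all assumptions made for illustration. It does not explain why both orderings are reported; it only filters them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper, not part of Duke: collapses (A, B) and (B, A) into one match.
public class SymmetricMatchFilter {

    // Builds an order-independent key by putting the smaller ID first.
    public static List<String> pairKey(String id1, String id2) {
        return id1.compareTo(id2) <= 0
                ? Arrays.asList(id1, id2)
                : Arrays.asList(id2, id1);
    }

    // Keeps only the first occurrence of each unordered pair of IDs.
    public static List<String[]> dedupe(List<String[]> matches) {
        Set<List<String>> seen = new HashSet<>();
        List<String[]> unique = new ArrayList<>();
        for (String[] m : matches) {
            // Set.add returns false when the key was already present.
            if (seen.add(pairKey(m[0], m[1]))) {
                unique.add(m);
            }
        }
        return unique;
    }

    public static void main(String[] args) {
        List<String[]> matches = Arrays.asList(
                new String[] {"ID1", "ID2"},
                new String[] {"ID2", "ID1"},   // same match, reversed order
                new String[] {"ID1", "ID3"});
        for (String[] m : dedupe(matches)) {
            System.out.println(m[0] + " <-> " + m[1]);
        }
        // prints:
        // ID1 <-> ID2
        // ID1 <-> ID3
    }
}
```

The same key could be applied inside whatever callback collects the matches, so that the reversed pair is dropped as soon as it arrives instead of after the run.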

Why are there repeated matches? #238