idilhaq / duke

Automatically exported from code.google.com/p/duke

Duke not working #142

Closed. GoogleCodeExporter closed this issue 9 years ago

GoogleCodeExporter commented 9 years ago
I tried Duke 1.1 on Windows 7 64-bit, using link() for record linkage. 
The two input CSV files each contain 3 fields (columns), and different 
algorithms were used for matching.
The second input CSV has 3 columns but a value in only one of them, say 
the first.
I expected to see a difference in probability when the input data are quite 
similar, but the result shows only exact matches, although the exact 
comparator was not used. I used the LowerCaseNormalize cleaner for data cleansing.

Also, how can search be implemented using comparators in Duke, given that it uses Lucene?

Original issue reported on code.google.com by brindhac...@gmail.com on 20 Jan 2014 at 6:48

GoogleCodeExporter commented 9 years ago
This has to do with how Duke finds candidate matches before doing the detailed 
matching. This is done by the Database component, and the default Lucene 
database requires at least one token to match exactly. If you set (just inside 
the root element):
  <param name="database-implementation" value="in-memory"/>

all records will be returned as candidates, and matching will work. However, 
this database is very slow for large data sets. In Duke 1.2 I have added two 
database implementations that can produce inexact matches and which are also 
faster than the Lucene backend. 1.2 will be released very soon.
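
To illustrate why the candidate step hides near matches, here is a minimal Java sketch. It is not Duke's code: the class `CandidateSketch` and its methods are hypothetical, and plain whitespace tokenization stands in for whatever the Lucene backend actually does. It only shows the general difference between requiring a shared exact token and returning every record as a candidate.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; none of these names exist in Duke.
final class CandidateSketch {

    // Split a value into lower-cased, whitespace-separated tokens.
    static Set<String> tokens(String value) {
        return new HashSet<>(Arrays.asList(value.toLowerCase().trim().split("\\s+")));
    }

    // Token-based lookup: a record is a candidate only if it shares
    // at least one exact token with the query.
    static List<String> tokenCandidates(String query, List<String> records) {
        Set<String> queryTokens = tokens(query);
        List<String> result = new ArrayList<>();
        for (String record : records) {
            Set<String> shared = tokens(record);
            shared.retainAll(queryTokens);
            if (!shared.isEmpty()) {
                result.add(record);
            }
        }
        return result;
    }

    // Exhaustive lookup: every record is a candidate. Comparators then see
    // all pairs, which costs time proportional to the number of records.
    static List<String> allCandidates(String query, List<String> records) {
        return new ArrayList<>(records);
    }

    public static void main(String[] args) {
        List<String> records = Arrays.asList("Jonathan Smith", "Jonathon Smyth", "Mary Jones");
        System.out.println(tokenCandidates("jonathan smith", records));
        System.out.println(allCandidates("jonathan smith", records));
    }
}
```

In this sketch "Jonathon Smyth" shares no exact token with the query, so the token-based lookup never hands it to the comparators, while the exhaustive lookup does.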

Original comment by lar...@gmail.com on 20 Jan 2014 at 7:00

GoogleCodeExporter commented 9 years ago
Thanks for your response. It is working. One quick question: can you tell me 
how, and in which class, the threshold, high, and low values are used to 
compute and display the overall probability?

Every comparator returns a probability. Where exactly is that probability 
transformed into a different value using the high and low values?

Original comment by brindhac...@gmail.com on 21 Jan 2014 at 9:41

GoogleCodeExporter commented 9 years ago
The logic for combining low+high with comparator similarity is in 
PropertyImpl.compare. I'm not really satisfied with that, but that's where it 
is for now.
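
As a rough illustration of the idea, here is a minimal Java sketch. It is not copied from the Duke source: the class `ProbabilitySketch` and its method names are hypothetical, and both the quadratic scaling toward `high` and the Bayes-style combination across properties are assumptions made for the example. Check PropertyImpl.compare for what Duke really does.

```java
// Hypothetical illustration only; see PropertyImpl.compare for the real logic.
final class ProbabilitySketch {

    // Map a comparator similarity in [0, 1] to a per-property probability.
    // Assumption: similar values scale between 0.5 and 'high';
    // dissimilar values simply yield 'low'.
    static double toProbability(double sim, double low, double high) {
        if (sim >= 0.5) {
            return (high - 0.5) * (sim * sim) + 0.5;
        }
        return low;
    }

    // Assumption: per-property probabilities are combined pairwise
    // with a naive Bayes style update.
    static double combine(double p1, double p2) {
        return (p1 * p2) / ((p1 * p2) + (1.0 - p1) * (1.0 - p2));
    }

    public static void main(String[] args) {
        double overall = 0.5; // neutral starting point
        overall = combine(overall, toProbability(0.95, 0.3, 0.9)); // very similar values
        overall = combine(overall, toProbability(0.20, 0.4, 0.8)); // dissimilar values
        // The overall probability would then be compared against the threshold.
        System.out.println(overall);
    }
}
```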

Glad it's working!

Original comment by lar...@gmail.com on 21 Jan 2014 at 1:06