larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0

Support for deduplicating data directly from Lucene/Solr/ElasticSearch #132

Open larsga opened 10 years ago

larsga commented 10 years ago

From lar...@gmail.com on August 23, 2013 12:02:34

A simple data source might be enough, but this needs a little more thought.

Original issue: http://code.google.com/p/duke/issues/detail?id=131

larsga commented 10 years ago

From gjpv...@gmail.com on September 25, 2013 05:37:20

It would be a great feature: the deduplication functionality could be integrated with Apache Solr and exposed as yet another REST API.

larsga commented 10 years ago

From lar...@gmail.com on September 25, 2013 11:22:35

For ElasticSearch there is actually a module for this: https://github.com/YannBrrd/elasticsearch-entity-resolution

It might be an idea to make something similar for Solr. Or one could approach it some other way.