larsga / Duke

Duke is a fast and flexible deduplication engine written in Java
Apache License 2.0

ElasticSearchDatabase implementation #193

Closed: fabriziofortino closed this 9 years ago

fabriziofortino commented 9 years ago

Elasticsearch (v1.4.4) can now be used as a Database implementation. The configuration can either create a new embedded ES node (with DISK or MEMORY indexes) or connect to an existing ES cluster through a transport client. The latter mode is especially useful when we need to deduplicate or link records that are already stored in an existing ES instance.
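To make the two configuration modes concrete, here is a minimal Java sketch. It does not use the PR's actual classes or the Elasticsearch client API: `EsDatabaseConfig`, `Mode`, `Storage` and `describe()` are hypothetical names that only model the choice between an embedded node (with a DISK or MEMORY index) and a transport-client connection to an existing cluster. Port 9300 is Elasticsearch's default transport port.

```java
import java.util.Objects;

public class EsDatabaseModeSketch {

    // Hypothetical: models the two modes described in the PR, not its real classes.
    public enum Mode { EMBEDDED_NODE, TRANSPORT_CLIENT }

    // Index storage for an embedded node (DISK or MEMORY).
    public enum Storage { DISK, MEMORY }

    public static final class EsDatabaseConfig {
        final Mode mode;
        final Storage storage;     // used only in EMBEDDED_NODE mode
        final String clusterName;  // used only in TRANSPORT_CLIENT mode
        final String host;
        final int port;

        private EsDatabaseConfig(Mode mode, Storage storage,
                                 String clusterName, String host, int port) {
            this.mode = mode;
            this.storage = storage;
            this.clusterName = clusterName;
            this.host = host;
            this.port = port;
        }

        // Embedded mode: start a new local node with the given index storage.
        public static EsDatabaseConfig embedded(Storage storage) {
            return new EsDatabaseConfig(Mode.EMBEDDED_NODE,
                    Objects.requireNonNull(storage), null, null, -1);
        }

        // Transport mode: connect to an existing cluster, e.g. to dedupe its records.
        public static EsDatabaseConfig transport(String clusterName, String host, int port) {
            if (port <= 0 || port > 65535) {
                throw new IllegalArgumentException("invalid port: " + port);
            }
            return new EsDatabaseConfig(Mode.TRANSPORT_CLIENT, null,
                    Objects.requireNonNull(clusterName), Objects.requireNonNull(host), port);
        }

        public String describe() {
            switch (mode) {
                case EMBEDDED_NODE:
                    return "embedded node, " + storage + " index";
                case TRANSPORT_CLIENT:
                    return "transport client -> " + clusterName + "@" + host + ":" + port;
                default:
                    throw new AssertionError(mode);
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(EsDatabaseConfig.embedded(Storage.MEMORY).describe());
        System.out.println(EsDatabaseConfig.transport("my-cluster", "localhost", 9300).describe());
    }
}
```

In a real implementation the embedded branch would start a local node and the transport branch would build a client for the remote cluster. Using one factory method per mode keeps invalid combinations, such as a storage type on a transport connection, from being constructed at all.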

See #132

fabriziofortino commented 9 years ago

Travis fails with JDK 6: Elasticsearch requires Java 7 for both building and running. Any chance of moving Duke 2.0 to Java 7?
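One way to surface this requirement early, rather than as an obscure failure, is a startup guard on the running JVM. The following is a hypothetical sketch (`JavaVersionCheck` and `majorVersion` are illustrative names, not part of Duke or the PR). It reads the standard `java.specification.version` property, which is "1.6", "1.7", "1.8" up to Java 8 and "9", "10", "11" and so on from Java 9 onwards.

```java
public class JavaVersionCheck {

    // Normalizes "1.7" -> 7, "1.8" -> 8, "11" -> 11, "17.0" -> 17.
    public static int majorVersion(String spec) {
        if (spec.startsWith("1.")) {
            spec = spec.substring(2);
        }
        int dot = spec.indexOf('.');
        if (dot >= 0) {
            spec = spec.substring(0, dot);
        }
        return Integer.parseInt(spec);
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        // Elasticsearch 1.4.x needs Java 7 or later, as noted in this thread.
        if (major < 7) {
            System.err.println("Java " + major + " detected; Elasticsearch requires Java 7+.");
            System.exit(1);
        }
        System.out.println("Java " + major + " is sufficient.");
    }
}
```

This only checks the runtime; the build-time side of the requirement would still be handled by the compiler source/target settings and the JDK versions configured on Travis.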

larsga commented 9 years ago

Thank you: this is good work! We probably should abandon JDK 1.6 now (it was released in 2006) and instead switch to 1.7 and 1.8. I'll deal with that later on.

drazzib commented 9 years ago

Thanks @fabriziofortino for this nice Elasticsearch module! I just posted a few small comments on individual commits.

fabriziofortino commented 9 years ago

@drazzib Thanks for reviewing it! I will update the PR as soon as possible