snowcloud / engineclub

the ALISS engine code
http://engineclub.org
BSD 3-Clause "New" or "Revised" License

Deduplication Work #18

d0ugal closed 11 years ago

d0ugal commented 12 years ago

Hi,

This contains my first draft of the work, so I'm opening it for you to review and give feedback on before it's merged. The work is fairly self-contained, but one area touches other code: I've started storing and re-using the Solr connection. This has been working fine, but it's something you may want to take note of or review. I also removed a few unused imports in one file.
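To illustrate what "storing and re-using the connection" means, here is a minimal sketch, not the code in this branch. `FakeSolrConnection`, `get_solr_connection`, and the URL are hypothetical stand-ins; the real code would construct whatever Solr client object the project already uses.

```python
from functools import lru_cache


class FakeSolrConnection:
    """Hypothetical stand-in for a real Solr client object."""

    instances = 0  # counts how many connections were actually created

    def __init__(self, url):
        FakeSolrConnection.instances += 1
        self.url = url


@lru_cache(maxsize=None)
def get_solr_connection(url="http://localhost:8983/solr"):
    """Create the connection on first use, then return the cached object.

    The default URL is an assumption for illustration only.
    """
    return FakeSolrConnection(url)


first = get_solr_connection()
second = get_solr_connection()
# Both names refer to the same object; only one connection was created.
```

The thing to review with this pattern is that every caller now shares one object, so any state held on the connection is shared too.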

The matching hooks into resource adding and adds a very simple UI for it, so I'm sure you will want to work on it further, or otherwise throw away the small bit I added after using it to test.

The hard part here is picking a good cutoff value for matches. I suggest getting it live fairly soon in a simple form, then adding an analytics hook to record various values, and using that data to improve the cutoff.
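As a rough sketch of the analytics idea, assuming each candidate match arrives as a `(title, score)` pair: log every score against the cutoff in force, so the cutoff can be tuned from real data later. The function name, the logger name, and the `0.8` default are all hypothetical placeholders, and the scale of the scores is an assumption.

```python
import logging

logger = logging.getLogger("dedup.scores")  # hypothetical logger name


def filter_matches(candidates, cutoff=0.8):
    """Return candidates scoring at or above cutoff, logging every score.

    `candidates` is an iterable of (title, score) pairs. The 0.8 default
    is a placeholder, not a recommended value.
    """
    kept = []
    for title, score in candidates:
        is_match = score >= cutoff
        # Record rejected candidates too: they show where the cutoff bites.
        logger.info(
            "candidate=%r score=%.3f cutoff=%.2f match=%s",
            title, score, cutoff, is_match,
        )
        if is_match:
            kept.append((title, score))
    return kept


likely_duplicates = filter_matches([("Food bank", 0.91), ("Youth club", 0.42)])
```

Logging the rejected candidates as well as the accepted ones is the point: the near misses are what tell you whether the cutoff is too strict.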

Dougal