Make overall structure for deduplication algorithm

tanussingh / Big-Data-Management-Analytics-Project

Final Project for CS 6350.001 - Large Scale Data Collection and preprocessing in Spark


Status: Open. ishansharma opened this issue 5 years ago.

ishansharma commented 5 years ago

Issues to figure out:

What to compare on?
Order of comparison? Right now, we plan to look at NER output first, then UDPipe, and then doc2vec vector similarity, measured with Jaccard/cosine similarity.
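One way to read this ordering is as a filter cascade: cheap comparisons run first and reject obviously different pairs, so the more expensive vector comparison only runs on the pairs that survive. The sketch below assumes the NER entities, UDPipe lemmas, and doc2vec vectors have already been computed elsewhere and are passed in as plain Python values. The dict keys (`entities`, `lemmas`, `vector`), the thresholds, and the choice of Jaccard for the set-valued features and cosine for the dense vectors are all hypothetical and not taken from the project. Jaccard is defined on sets, so it fits entity and lemma sets more naturally than doc2vec vectors.

```python
import math
from typing import Dict, Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity |A & B| / |A | B|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical thresholds; real values would need tuning on labelled pairs.
ENTITY_MIN = 0.5
LEMMA_MIN = 0.5
VECTOR_MIN = 0.9


def is_duplicate(doc_a: Dict, doc_b: Dict) -> bool:
    """Staged comparison: set overlaps first, dense-vector similarity last.

    Each doc is a dict with hypothetical keys:
      "entities": set of named-entity strings (e.g. from an NER pass)
      "lemmas":   set of lemmas (e.g. from a UDPipe parse)
      "vector":   list of floats (e.g. an inferred doc2vec vector)
    """
    # Stage 1: named-entity overlap rejects clearly unrelated documents early.
    if jaccard(doc_a["entities"], doc_b["entities"]) < ENTITY_MIN:
        return False
    # Stage 2: lemma overlap from the linguistic parse.
    if jaccard(doc_a["lemmas"], doc_b["lemmas"]) < LEMMA_MIN:
        return False
    # Stage 3: only pairs that passed both filters reach the vector comparison.
    return cosine(doc_a["vector"], doc_b["vector"]) >= VECTOR_MIN


if __name__ == "__main__":
    a = {"entities": {"Dallas", "UTD"}, "lemmas": {"student", "win", "award"},
         "vector": [0.1, 0.9, 0.2]}
    b = {"entities": {"Dallas", "UTD"}, "lemmas": {"student", "win", "award", "city"},
         "vector": [0.12, 0.88, 0.21]}
    c = {"entities": {"Paris"}, "lemmas": {"student", "win", "award"},
         "vector": [0.1, 0.9, 0.2]}
    print(is_duplicate(a, b))  # near-identical on every stage
    print(is_duplicate(a, c))  # rejected at the entity stage
```

In a Spark job, comparing every pair of documents would be quadratic, so a cascade like this would usually sit behind some blocking or candidate-generation step; that step is outside what this sketch shows.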
