sjyk / sampleclean-async

http://sampleclean.org
Apache License 2.0

Changes juan #27

sanchez575 closed 9 years ago

sanchez575 commented 9 years ago

Added Edit Distance.

tracyhenry commented 9 years ago

The thresholdLCS function in /clean/deduplication/PassJoin.scala returns threshold + 1 when the edit distance between two strings exceeds the threshold, which differs from the behavior of the thresholdLevenshtein function.

I'm not sure whether this difference causes any problems.
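
To make the "returns threshold + 1" convention concrete, here is a minimal sketch of a thresholded edit distance. It is not the code from PassJoin.scala: the object name ThresholdEditDistance and the method name thresholdedLevenshtein are hypothetical, and the sketch assumes plain Levenshtein distance with unit costs for insert, delete, and substitute.

```scala
// Hypothetical illustration; not the thresholdLCS or thresholdLevenshtein
// implementation from PassJoin.scala.
object ThresholdEditDistance {

  /** Levenshtein distance between s and t, capped at threshold + 1.
    * Any pair whose true distance exceeds threshold yields threshold + 1.
    */
  def thresholdedLevenshtein(s: String, t: String, threshold: Int): Int = {
    require(threshold >= 0, "threshold must be non-negative")
    val tooFar = threshold + 1

    // A length gap larger than the threshold already rules the pair out.
    if (math.abs(s.length - t.length) > threshold) return tooFar

    // prev(j) holds the distance between the first i-1 chars of s
    // and the first j chars of t.
    var prev = Array.range(0, t.length + 1)
    var i = 1
    while (i <= s.length) {
      val curr = new Array[Int](t.length + 1)
      curr(0) = i
      var rowMin = curr(0)
      var j = 1
      while (j <= t.length) {
        val cost = if (s.charAt(i - 1) == t.charAt(j - 1)) 0 else 1
        curr(j) = math.min(
          math.min(prev(j) + 1, curr(j - 1) + 1), // delete / insert
          prev(j - 1) + cost                      // match / substitute
        )
        rowMin = math.min(rowMin, curr(j))
        j += 1
      }
      // Row minima never decrease, so once every cell exceeds the
      // threshold the final distance must exceed it too.
      if (rowMin > threshold) return tooFar
      prev = curr
      i += 1
    }
    math.min(prev(t.length), tooFar)
  }
}
```

With this convention, a caller that only checks whether the result is at most the threshold gets the same answer as it would from the exact distance. The capped value would matter only if a caller relied on the exact distance for pairs beyond the threshold.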

jnwang commented 9 years ago

It seems that "thresholdLevenshtein" has never been used. I think we should remove the "thresholdLevenshtein" function from PassJoin.scala.