twitter / algebird

Abstract Algebra for Scala
https://twitter.github.io/algebird
Apache License 2.0
2.29k stars 345 forks source link

Implement randomized LSH according to Chazelle's paper #435

Open perryzheng opened 9 years ago

perryzheng commented 9 years ago

http://courses.cs.washington.edu/courses/cse521/15sp/slides/LSH-CACM.pdf

See bottom of page 119 for LSH pseudocode and see section 4 for a new implementation.

My implementation might not be the same as section 4 but a variation of it.

Let me know if that's a good idea

MansurAshraf commented 9 years ago

How does it compare to MinHash as we already have minhash impl? Other things to consider is whether its associative, commutative etc

avibryant commented 9 years ago

MinHash is for jaccard distance, this is for euclidean distance. However, it's not clear to me whether this belongs in algebird or scalding...

perryzheng commented 9 years ago

Hey @avibryant , based on this paper, http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p253-datar.pdf section 3, it uses say gaussian distribution as a sketch for the high dimensional vector. 1 important question.. This could be expressed in terms of monoids and thus could be part of algebird, right? I'm honestly not very clear

avibryant commented 9 years ago

Gaussian distributions form a monoid, yes. (And actually I would love to see Gaussian added to Algebird, just in itself).

perryzheng commented 9 years ago

Submitted a pull request at https://github.com/twitter/algebird/pull/449