Open perryzheng opened 9 years ago
How does it compare to MinHash as we already have minhash impl? Other things to consider is whether its associative, commutative etc
MinHash is for jaccard distance, this is for euclidean distance. However, it's not clear to me whether this belongs in algebird or scalding...
Hey @avibryant , based on this paper, http://www.cs.princeton.edu/courses/archive/spr05/cos598E/bib/p253-datar.pdf section 3, it uses say gaussian distribution as a sketch for the high dimensional vector. 1 important question.. This could be expressed in terms of monoids and thus could be part of algebird, right? I'm honestly not very clear
Gaussian distributions form a monoid, yes. (And actually I would love to see Gaussian added to Algebird, just in itself).
Submitted a pull request at https://github.com/twitter/algebird/pull/449
http://courses.cs.washington.edu/courses/cse521/15sp/slides/LSH-CACM.pdf
See bottom of page 119 for LSH pseudocode and see section 4 for a new implementation.
My implementation might not be the same as section 4 but a variation of it.
Let me know if that's a good idea