We didn't use the asymmetric LSH function in the code. SRP (signed random projection, i.e. cosine similarity) performs very well in practice and is more suitable for GPUs. MIPS is equivalent to NNS when the data points have unit norm. In our setting, the layer's inputs and weights have stable norms: batch normalization keeps the input norms stable, and the weight norms stay stable because unstable weight norms would show up as exploding or shrinking gradients.
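For what it's worth, here is a minimal sketch of SRP (SimHash) hashing in a NumPy setting; the names are hypothetical and this is not the repository's code:

```python
# Sketch of signed random projection (SRP / SimHash) for cosine-similarity LSH.
# Illustration only; not the code used in this repository.
import numpy as np

def srp_hash(x, projections):
    """Return a K-bit SimHash fingerprint: the sign pattern of K random projections."""
    # projections: (K, d) matrix with i.i.d. Gaussian entries
    return (projections @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, K = 128, 16                                # input dimension, bits per fingerprint
projections = rng.standard_normal((K, d))

# When the weight vectors are normalized to unit norm, maximizing the inner
# product (MIPS) is the same as maximizing cosine similarity (NNS), so SRP
# buckets collect the neurons with the largest activations.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
print(srp_hash(w, projections))
```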
Okay, thanks!
Hello,
I have read your KDD'17 paper. It is interesting to see the possibilities it shows for highly sparse neural networks!
I tried to reproduce the results of your paper, but I have questions about the following points:
If you have any answers, please let me know.
There is a correlation between cosine similarity and the active neurons. You can make a simple plot of neuron activation against cosine similarity. Asymmetric LSH handles MIPS in the general case, so it has more robust theoretical guarantees.
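For example, a quick way to visualize that correlation (a sketch using random stand-in weights and inputs, not values from the paper or repository):

```python
# Sketch: scatter-plot neuron activation against cosine similarity for one input.
# Weights and input are random stand-ins, purely for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d, n_neurons = 256, 1000
W = rng.standard_normal((n_neurons, d))   # one row per neuron
x = rng.standard_normal(d)

activation = np.maximum(W @ x, 0.0)       # ReLU activation of each neuron
cosine = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))

plt.scatter(cosine, activation, s=4)
plt.xlabel("cosine similarity")
plt.ylabel("neuron activation")
plt.show()
```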
Multi-probe LSH searches nearby buckets to account for the randomness in the LSH fingerprint. For SimHash, you can randomly flip bits in the LSH fingerprint (i.e. flip a single bit at a random position).
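A minimal sketch of such probing for SimHash, assuming the hash table is a plain Python dict keyed by fingerprint tuples (names here are hypothetical):

```python
# Sketch of multi-probe lookups for SimHash: probe the query's own bucket,
# then buckets whose fingerprints differ by one randomly flipped bit.
import random

def probe_fingerprints(fingerprint, n_probes):
    """Yield the original K-bit fingerprint plus n_probes single-bit perturbations."""
    yield tuple(fingerprint)
    K = len(fingerprint)
    for _ in range(n_probes):
        pos = random.randrange(K)      # flip a single bit at a random position
        probe = list(fingerprint)
        probe[pos] ^= 1
        yield tuple(probe)

def multi_probe_lookup(table, fingerprint, n_probes=3):
    """Union of the candidates found in the original and perturbed buckets."""
    candidates = set()
    for fp in probe_fingerprints(fingerprint, n_probes):
        candidates.update(table.get(fp, ()))
    return candidates
```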
Hello,
I was trying to understand your code after reading the paper. I notice that in the paper you mention that asymmetric LSH functions are used (Sec. 5.1). However, in the code, random projection is used when building the hash tables, and when retrieving the nodes for forward propagation, the hash values of the data are used to do a near-neighbor search directly to look for the "near" nodes (or weights). I guess the following code does this job.
So my question is: where has the asymmetric LSH function been used? [38] mentions that some parameters need to be set for asymmetric LSH, like m=3, U=0.83, r=2.5, but I can't find how these are used/set in this code.
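For reference, my understanding of the asymmetric transformation in [38] (where m, U, and r would appear) is roughly the following sketch; the function names are mine and none of this is taken from your repository:

```python
# Sketch of the asymmetric (L2) ALSH transform from [38]; m=3, U=0.83, r=2.5
# are the recommended values quoted above. Names are hypothetical.
import numpy as np

def preprocess_data(X, m=3, U=0.83):
    """P(x): scale data so the max norm is U, then append ||x||^2, ||x||^4, ..., ||x||^(2^m)."""
    X = np.asarray(X, dtype=float)
    scale = U / np.max(np.linalg.norm(X, axis=1))
    Xs = X * scale
    norms = np.linalg.norm(Xs, axis=1, keepdims=True)
    extras = np.concatenate([norms ** (2 ** i) for i in range(1, m + 1)], axis=1)
    return np.concatenate([Xs, extras], axis=1)

def preprocess_query(q, m=3):
    """Q(q): normalize the query to unit norm and append m constant 1/2 entries."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    return np.concatenate([q, np.full(m, 0.5)])

def l2_hash(z, a, b, r=2.5):
    """Standard L2-LSH on the transformed vector: a is Gaussian, b ~ Uniform[0, r)."""
    return int(np.floor((a @ z + b) / r))
```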
Thanks,