We didn't use the asymmetric LSH function in the code. SRP (signed random projection, i.e. cosine similarity) performs very well in practice and is more suitable for GPUs. MIPS is equivalent to NNS when the data points have unit norm. In our setting, the layer's inputs and weights have stable norms: batch normalization keeps the input norms stable, and the weight norms stay stable because unstable weight norms would show up as exploding or shrinking gradients.
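For what it's worth, here is a minimal sketch of SRP (SimHash) hashing in a NumPy setting; the names are hypothetical and this is not the repository's code:

```python
# Sketch of signed random projection (SRP / SimHash) for cosine-similarity LSH.
# Illustration only; not the code used in this repository.
import numpy as np

def srp_hash(x, projections):
    """Return a K-bit SimHash fingerprint: the sign pattern of K random projections."""
    # projections: (K, d) matrix with i.i.d. Gaussian entries
    return (projections @ x >= 0).astype(np.uint8)

rng = np.random.default_rng(0)
d, K = 128, 16                                # input dimension, bits per fingerprint
projections = rng.standard_normal((K, d))

# When the weight vectors are normalized to unit norm, maximizing the inner
# product (MIPS) is the same as maximizing cosine similarity (NNS), so SRP
# buckets collect the neurons with the largest activations.
w = rng.standard_normal(d)
w /= np.linalg.norm(w)
print(srp_hash(w, projections))
```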
Okay, thanks!
Hello,
I have read your KDD'17 paper. It is interesting to see the possibilities it shows for highly sparse neural networks!
I tried to reproduce the results of your paper, but I have questions about the following points:
If you have any answers, please let me know.
There is a correlation between cosine similarity and the active neurons. You can make a simple plot of neuron activation against cosine similarity. Asymmetric LSH handles MIPS in the general case, so it has more robust theoretical guarantees.
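For example, a quick way to visualize that correlation (a sketch using random stand-in weights and inputs, not values from the paper or repository):

```python
# Sketch: scatter-plot neuron activation against cosine similarity for one input.
# Weights and input are random stand-ins, purely for illustration.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
d, n_neurons = 256, 1000
W = rng.standard_normal((n_neurons, d))   # one row per neuron
x = rng.standard_normal(d)

activation = np.maximum(W @ x, 0.0)       # ReLU activation of each neuron
cosine = (W @ x) / (np.linalg.norm(W, axis=1) * np.linalg.norm(x))

plt.scatter(cosine, activation, s=4)
plt.xlabel("cosine similarity")
plt.ylabel("neuron activation")
plt.show()
```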
Multi-probe LSH searches nearby buckets to account for the randomness in the LSH fingerprint. For SimHash, you can randomly flip bits in the LSH fingerprint (i.e. flip a single bit at a random position).
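A minimal sketch of such probing for SimHash, assuming the hash table is a plain Python dict keyed by fingerprint tuples (names here are hypothetical):

```python
# Sketch of multi-probe lookups for SimHash: probe the query's own bucket,
# then buckets whose fingerprints differ by one randomly flipped bit.
import random

def probe_fingerprints(fingerprint, n_probes):
    """Yield the original K-bit fingerprint plus n_probes single-bit perturbations."""
    yield tuple(fingerprint)
    K = len(fingerprint)
    for _ in range(n_probes):
        pos = random.randrange(K)      # flip a single bit at a random position
        probe = list(fingerprint)
        probe[pos] ^= 1
        yield tuple(probe)

def multi_probe_lookup(table, fingerprint, n_probes=3):
    """Union of the candidates found in the original and perturbed buckets."""
    candidates = set()
    for fp in probe_fingerprints(fingerprint, n_probes):
        candidates.update(table.get(fp, ()))
    return candidates
```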
Hello,
I was trying to understand your code after reading the paper. I notice that in the paper you mention that asymmetric LSH functions are used (Sec. 5.1). However, in the code, random projection is used when building the hash tables, and when retrieving the nodes for forward propagation, the hash values of the data are used to do a near-neighbor search directly to look for the "near" nodes (or weights). I guess the following code does this job.
So my question is: where has the asymmetric LSH function been used? [38] mentions that some parameters need to be set for asymmetric LSH, like m=3, U=0.83, r=2.5, but I can't find how these are used/set in this code.
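For reference, my understanding of the asymmetric transformation in [38] (where m, U, and r would appear) is roughly the following sketch; the function names are mine and none of this is taken from your repository:

```python
# Sketch of the asymmetric (L2) ALSH transform from [38]; m=3, U=0.83, r=2.5
# are the recommended values quoted above. Names are hypothetical.
import numpy as np

def preprocess_data(X, m=3, U=0.83):
    """P(x): scale data so the max norm is U, then append ||x||^2, ||x||^4, ..., ||x||^(2^m)."""
    X = np.asarray(X, dtype=float)
    scale = U / np.max(np.linalg.norm(X, axis=1))
    Xs = X * scale
    norms = np.linalg.norm(Xs, axis=1, keepdims=True)
    extras = np.concatenate([norms ** (2 ** i) for i in range(1, m + 1)], axis=1)
    return np.concatenate([Xs, extras], axis=1)

def preprocess_query(q, m=3):
    """Q(q): normalize the query to unit norm and append m constant 1/2 entries."""
    q = np.asarray(q, dtype=float)
    q = q / np.linalg.norm(q)
    return np.concatenate([q, np.full(m, 0.5)])

def l2_hash(z, a, b, r=2.5):
    """Standard L2-LSH on the transformed vector: a is Gaussian, b ~ Uniform[0, r)."""
    return int(np.floor((a @ z + b) / r))
```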
Thanks,