Open mritunjaymusale opened 4 years ago
Thanks for your suggestion. Let us make this the priority! We'll @ you when it is done.
Thank you @keroro824 !
Sort of related, but I've been building R bindings.
@wrathematics Thanks for contributing 👍
Hi, are there any updates on this?
Is it possible for you to port this to Cython and release it as a PyPI package? That would make it easy for existing DL users (TF and PyTorch users) to use it natively in their code.
I'm also interested in implementing such a thing. But it seems to me the way to do this would be to implement custom layers rather than rely on the built-in ones. Once tested, this could be added to the main codebase rather than shipped as a separate package.
For example, in PyTorch you would first subclass `torch.autograd.Function` to implement the forward and backward operations, performing the hashing there and taking it into account during forward and back propagation. Cython might not even be needed: you might be able to use Numba and get better performance more easily.
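To make the custom-layer idea concrete, here is a minimal NumPy sketch (not SLIDE's actual code, and not a real `torch.autograd.Function`; the names, shapes, and the `active` index set are all hypothetical) of a linear layer where only the neurons selected by hashing participate in both passes:

```python
import numpy as np

def forward(x, W, b, active):
    # Only compute outputs for the "active" rows of W
    # (standing in for the neurons an LSH lookup would return).
    y = np.zeros(W.shape[0])
    y[active] = W[active] @ x + b[active]
    return y

def backward(dy, x, W, active):
    # Gradients flow only through the active neurons,
    # mirroring the approximation discussed above.
    dW = np.zeros_like(W)
    dW[active] = np.outer(dy[active], x)
    db = np.zeros(W.shape[0])
    db[active] = dy[active]
    dx = W[active].T @ dy[active]  # sum over active i of dy_i * W[i, :]
    return dW, db, dx

x = np.array([1.0, 2.0, 3.0])
W = np.arange(12, dtype=float).reshape(4, 3)
b = np.zeros(4)
active = np.array([0, 2])  # pretend the hash tables returned neurons 0 and 2
y = forward(x, W, b, active)
dW, db, dx = backward(np.ones(4), x, W, active)
```

A real PyTorch version would put `forward`/`backward` as the static methods of a `torch.autograd.Function` subclass and stash `x` and `active` via `ctx.save_for_backward`.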
@keroro824 I've actually started doing what I described. I have a question: do you have some justification for only propagating the gradient to the active neurons? It's not obvious to me why this is a good approximation of the true gradient. There is another method the math would suggest. The gradient w.r.t. the input of a linear layer is (repeated indices indicate a sum):

`y_i = W_{ij} x_j + b_i`

`dx_k = dy_i W_{ik}`

So we could use LSH for the backprop as well, but we would need more hash tables than the paper suggests: the multiplications in the backprop are by columns of the weight matrix, while the forward prop multiplies by its rows. Did you try something like this?
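A toy example of the point above (illustrative only; the `active` set is hypothetical, not produced by actual hashing): the exact input gradient `dx_k = dy_i W_{ik}` multiplies by columns of `W`, and restricting the sum to active neurons drops exactly the inactive rows' contributions:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.standard_normal((4, 3))
dy = rng.standard_normal(4)

# Exact gradient w.r.t. the input: sum over ALL output neurons i.
dx_exact = W.T @ dy

# Active-neurons-only approximation (the scheme being questioned above).
active = [0, 2]
dx_approx = W[active].T @ dy[active]

# The approximation error is exactly the inactive neurons' contribution,
# which is why the backward pass would need its own (column-wise) hash tables.
dropped = W[[1, 3]].T @ dy[[1, 3]]
assert np.allclose(dx_exact, dx_approx + dropped)
```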
It would be very interesting to me to implement this.