Closed: sriramsk1999 closed this issue 4 years ago
In addition, what might the final API look like? If I'm not mistaken, an ensmallen optimizer is currently constructed something like:
```cpp
SGD<AdamUpdate> optimizer(5e-2, 1, 0, 1e-5, false, AdamUpdate(1e-8, 0.9, 0.999));
```
where the underlying operation is backpropagation through matrix multiplication.
And if my understanding of the paper is correct, this algorithm replaces the matrix-multiplication step of backpropagation with hashing. In essence, I'd like to know where the necessary additions would be made, to provide a foundation to build on (for myself or for anyone else who would like to take up this issue :) ).
> SGD<AdamUpdate> optimizer( 5e-2, 1, 0, 1e-5, false, AdamUpdate(1e-8, 0.9, 0.999));
> where the underlying operation is backpropagation through matrix multiplication.
Are you talking about SLIDE, or ensmallen?
Hi @zoq, I am talking about ensmallen. Currently it optimizes using backpropagation through matrix multiplication, correct?
No, ensmallen doesn't depend on backpropagation. Take a look at https://arxiv.org/abs/2003.04103; it should clarify the concepts behind ensmallen.
Thanks for the paper @zoq, it's quite interesting and cleared up my questions about the workings of ensmallen.
I had another look at the SLIDE paper as well, and I noticed that I had missed something very important on my first pass. The optimization in traditional deep learning and in SLIDE is the same. What SLIDE brings to the table is a different method of training the network: training only a subset of the parameters instead of all of them. So my conclusion is that this algorithm is actually suited to mlpack as an alternative to normal training (something like model.TrainSLIDE()?). Let me know what you think!
This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1:
[moved from https://github.com/mlpack/mlpack/issues/2290 as it's better suited here.]
I came across this article and found it quite intriguing. I went through the paper and the results seem promising.
It occurs to me that mlpack could make good use of this: OpenMP is already in use, and GPU support is not completely there yet, with bandicoot still under development.
Would this be a worthwhile addition to mlpack? I'd like to know what all of you think about this.
Repo link - https://github.com/keroro824/HashingDeepLearning
Paper link - https://arxiv.org/abs/1903.03129