mlpack / ensmallen

A header-only C++ library for numerical optimization --
http://ensmallen.org

Addition of SLIDE algorithm #179

Closed · sriramsk1999 closed this issue 4 years ago

sriramsk1999 commented 4 years ago

[moved from https://github.com/mlpack/mlpack/issues/2290 as it's better suited here.]

I came across this article and found it quite intriguing. I went through the paper and the results seem promising.

It occurs to me that mlpack could make good use of this: SLIDE is CPU-based and built on OpenMP, which mlpack already uses, and GPU support is not complete yet, with bandicoot still under development.

Would this be a worthwhile addition to mlpack? I'd like to know what all of you think about this.

Repo link: https://github.com/keroro824/HashingDeepLearning
Paper link: https://arxiv.org/abs/1903.03129

sriramsk1999 commented 4 years ago

In addition, what might the final API look like? If I'm not wrong, currently an ensmallen optimizer is something like:

```c++
SGD<AdamUpdate> optimizer(5e-2, 1, 0, 1e-5, false, AdamUpdate(1e-8, 0.9, 0.999));
```

where the underlying operation is backpropagation through matrix multiplication.

And if my understanding of the paper is correct, this algorithm will be replacing the matrix multiplication part of backpropagation with hashing. In essence, what I'd like to know is where the necessary additions would be made, to provide a foundation to build on (for myself or for anyone else who would like to take up this issue :) ).
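For context, here is a minimal sketch of how that optimizer is driven today. `QuadraticFunction` is a hypothetical toy objective, not part of ensmallen; `ens::SGD` expects a "separable" objective exposing `NumFunctions()`, `Shuffle()`, and batched `Evaluate()`/`Gradient()` overloads:

```c++
#include <ensmallen.hpp>

// Hypothetical toy objective, f(x) = ||x||^2, written in the "separable"
// form that ens::SGD expects (a sum of NumFunctions() terms).
class QuadraticFunction
{
 public:
  size_t NumFunctions() const { return 1; }
  void Shuffle() { /* nothing to shuffle for a single term */ }

  // Evaluate the terms [begin, begin + batchSize) at x.
  double Evaluate(const arma::mat& x, const size_t /* begin */,
                  const size_t /* batchSize */)
  {
    return arma::accu(x % x);
  }

  // Gradient of those terms: df/dx = 2x.
  void Gradient(const arma::mat& x, const size_t /* begin */,
                arma::mat& g, const size_t /* batchSize */)
  {
    g = 2 * x;  // closed-form gradient; no backpropagation involved
  }
};

int main()
{
  QuadraticFunction f;
  arma::mat coordinates = arma::randu<arma::mat>(10, 1);

  // The optimizer from the comment above (maxIterations = 0 means no limit).
  ens::SGD<ens::AdamUpdate> optimizer(
      5e-2, 1, 0, 1e-5, false, ens::AdamUpdate(1e-8, 0.9, 0.999));
  optimizer.Optimize(f, coordinates);
}
```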

zoq commented 4 years ago

> `SGD<AdamUpdate> optimizer(5e-2, 1, 0, 1e-5, false, AdamUpdate(1e-8, 0.9, 0.999));` where the underlying operation is backpropagation through matrix multiplication.

Are you talking about SLIDE, or ensmallen?

sriramsk1999 commented 4 years ago

Hi @zoq, I am talking about ensmallen. Currently it optimizes using backpropagation through matrix multiplication, correct?

zoq commented 4 years ago

No, ensmallen doesn't depend on backpropagation; take a look at https://arxiv.org/abs/2003.04103, which should clarify the concepts behind ensmallen.
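To make this concrete, a minimal sketch (not from this thread; the `RosenbrockFunction` below is defined inline for illustration): an ensmallen optimizer only ever sees the objective through `Evaluate()` and `Gradient()` callbacks, so whether the gradient comes from backpropagation, a closed form, or anything else is entirely the function's business.

```c++
#include <ensmallen.hpp>
#include <cmath>

// A classic test objective with a closed-form gradient; no neural network
// and no backpropagation anywhere. The optimizer never knows (or cares)
// how the gradient is computed.
class RosenbrockFunction
{
 public:
  double Evaluate(const arma::mat& x)
  {
    return 100.0 * std::pow(x(1) - x(0) * x(0), 2) + std::pow(1.0 - x(0), 2);
  }

  void Gradient(const arma::mat& x, arma::mat& g)
  {
    g.set_size(2, 1);
    g(0) = -400.0 * x(0) * (x(1) - x(0) * x(0)) - 2.0 * (1.0 - x(0));
    g(1) = 200.0 * (x(1) - x(0) * x(0));
  }
};

int main()
{
  RosenbrockFunction f;
  arma::mat coordinates("-1.2; 1.0");
  ens::L_BFGS lbfgs;
  lbfgs.Optimize(f, coordinates);  // works with any differentiable function
}
```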

sriramsk1999 commented 4 years ago

Thanks for the paper @zoq, it's quite interesting and cleared up my questions about how ensmallen works.

I had another look at the SLIDE paper as well, and I noticed that I had missed something very important in my first pass: the optimization itself is the same in traditional deep learning and in SLIDE.

What SLIDE brings to the table is a different method of training the network: it trains only a small, per-sample subset of the parameters instead of all of them (see the sketch below). So my conclusion is that this algorithm is suited to mlpack as an alternative to normal training (something like model.TrainSLIDE()?).
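For concreteness, a rough sketch of the idea; all names below are hypothetical, none are taken from the SLIDE codebase. Per input, an LSH lookup picks out the small set of neurons likely to have large activations, and the forward and backward passes touch only those neurons' weights:

```c++
#include <armadillo>
#include <algorithm>
#include <unordered_set>

// Hypothetical sketch of SLIDE's core idea.
struct LSHTable
{
  // Return indices of neurons whose weight vectors likely have a large
  // inner product with `input`. Placeholder body: a real implementation
  // hashes the input (SimHash / winner-take-all hashing in the paper)
  // and returns the neuron ids stored in the matching buckets.
  std::unordered_set<arma::uword> Query(const arma::vec& /* input */) const
  {
    return {0, 1, 2};
  }
};

struct SparseLayer
{
  arma::mat weights;  // one column per neuron
  LSHTable table;

  // The forward pass computes only the active neurons, so the cost scales
  // with the number of active neurons rather than with the layer width.
  arma::sp_vec Forward(const arma::vec& input) const
  {
    arma::sp_vec activations(weights.n_cols);
    for (const arma::uword n : table.Query(input))
      activations(n) = std::max(0.0, arma::dot(weights.col(n), input));
    return activations;
  }

  // Backpropagation then updates only those same columns of `weights`,
  // which is why SLIDE is a training scheme rather than a new optimizer.
};
```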

Let me know what you think!

mlpack-bot[bot] commented 4 years ago

This issue has been automatically marked as stale because it has not had any recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions! :+1: