Closed fabianp closed 9 years ago
It will take time for me to review this as I need to re-read the SAGA paper.
@adefazio We are planning to add SAGA to lightning. Your help reviewing this PR would be welcome.
I can offer some assistance if needed. I'm familiar with cython. If you have a working SAG implementation I would not expect any problems moving to SAGA. Please let me know if you have any specific questions.
Thanks @adefazio , this PR implements SAGA based on lightning's SAG implementation (I took inspiration from the cython code at the end of your paper).
I have one question tough. In order to evaluate the prox, I need to finallize
the weight vector, is that right? i.e. I need to apply all missing updates before I can perform the prox update.
CC @zermelozf who is interested by this PR
@fabianp Doing a finalize like you suggest is one way to do it, and I would recommend that for dense problems, where the jit updating is not really necessary anyway. If the user is supplying a custom prox function, you will probably need to do it that way. For sparse problems, you can include the prox update in the jit calculation in simple cases. For example, with a L1 prox, and with no L2 regulariser, you can just apply the multiple missed shrinkage updates in closed form as a single shrinkage just-in-time. It becomes more complex when you have both L1 and L2, probably requiring some clever coding.
@fabianp I'd suggest starting with the naive implementation, write unit tests (for both dense and sparse inputs) then implement the smart JIT trick.
@adefazio Does SAGA work well even without any L2 regularization?
SAGA/SAG can be used without L2 regularisation. The SDCA method requires L2 or a strongly convex regulariser of some sort.
superseded by PR #38
Here is a WIP of the SAGA implementation to get some feedback on the API and implementation.
The implementation is just a flag in the _sag_fit routine, I thought this was the simplest. Then I duplicated the SAGClassifier and SAGRegressor with a SAGA* counterpart.
The main thing TODO is the implementation of the proximal step (that's the advantage over SAG :-), but at least now the tests for SAG pass when a SAGA object is used instead.