scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

[WIP] Initial implementation of SAGA #37

Closed by fabianp 9 years ago

fabianp commented 9 years ago

Here is a WIP of the SAGA implementation to get some feedback on the API and implementation.

The implementation is just a flag in the _sag_fit routine, which I thought was the simplest approach. I then duplicated SAGClassifier and SAGRegressor with SAGA* counterparts.

The main TODO is the implementation of the proximal step (that's the advantage over SAG :-), but at least the existing SAG tests now pass when a SAGA object is used instead.
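For reference, here is a minimal dense sketch of the step the SAGA flag needs to add on top of SAG: the variance-reduced gradient step followed by a proximal step. Names such as saga_epoch, g_mem and prox are purely illustrative, not the actual _sag_fit variables.

```python
import numpy as np

def saga_epoch(X, y, w, g_mem, grad_avg, step_size, prox, rng):
    """One pass of plain dense SAGA with a proximal step.

    g_mem[i] stores the gradient last computed for sample i, grad_avg is
    their running mean, and prox is the proximal operator of the non-smooth
    penalty (identity if there is none).
    """
    n_samples = X.shape[0]
    for _ in range(n_samples):
        i = rng.randint(n_samples)
        # gradient of the i-th loss term (squared loss, for simplicity)
        g_new = (np.dot(X[i], w) - y[i]) * X[i]
        # SAGA direction: new gradient minus stored gradient, plus the average
        w = w - step_size * (g_new - g_mem[i] + grad_avg)
        # proximal step handles the non-smooth part (e.g. L1)
        w = prox(w, step_size)
        # update the gradient memory and the running average
        grad_avg = grad_avg + (g_new - g_mem[i]) / n_samples
        g_mem[i] = g_new
    return w, grad_avg
```

For an L1 penalty of strength alpha, prox would be soft-thresholding, e.g. `lambda v, t: np.sign(v) * np.maximum(np.abs(v) - t * alpha, 0.0)`.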

mblondel commented 9 years ago

It will take time for me to review this as I need to re-read the SAGA paper.

@adefazio We are planning to add SAGA to lightning. Your help reviewing this PR would be welcome.

adefazio commented 9 years ago

I can offer some assistance if needed. I'm familiar with Cython. If you have a working SAG implementation I would not expect any problems moving to SAGA. Please let me know if you have any specific questions.

fabianp commented 9 years ago

Thanks @adefazio, this PR implements SAGA based on lightning's SAG implementation (I took inspiration from the Cython code at the end of your paper).

I have one question though. In order to evaluate the prox, I need to finalize the weight vector, is that right? That is, I need to apply all the missed updates before I can perform the prox step.
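To make the question concrete, here is roughly what such a finalize would compute (illustrative names, not the _sag_fit bookkeeping), assuming the usual SAG lazy-update invariant that grad_avg[j] only changes at iterations where coordinate j is itself brought up to date:

```python
import numpy as np

def finalize_coef(w, grad_avg, last_seen, t, step_size):
    """Catch every coordinate up with the average-gradient steps it skipped.

    last_seen[j] is the iteration at which coordinate j was last updated.
    Between two visits grad_avg[j] does not change (lazy-update invariant),
    so the skipped steps collapse into one scaled update.
    """
    missed = t - last_seen                 # iterations skipped per coordinate
    w = w - step_size * missed * grad_avg  # apply them all at once
    last_seen[:] = t
    return w
```

After that, the prox can be evaluated on the full, up-to-date vector.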

fabianp commented 9 years ago

CC @zermelozf, who is interested in this PR.

adefazio commented 9 years ago

@fabianp Doing a finalize like you suggest is one way to do it, and I would recommend it for dense problems, where the JIT updating is not really necessary anyway. If the user is supplying a custom prox function, you will probably need to do it that way. For sparse problems, you can include the prox update in the JIT calculation in simple cases. For example, with an L1 prox and no L2 regulariser, you can apply the multiple missed shrinkage updates in closed form as a single shrinkage just-in-time. It becomes more complex when you have both L1 and L2, probably requiring some clever coding.
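For reference, the quantity the just-in-time trick has to reproduce for a single coordinate, with an L1 prox and no L2 term, is the literal replay below (illustrative names; the closed form described above replaces this loop with a single shrinkage computed on demand):

```python
import numpy as np

def soft_threshold(x, amount):
    """Proximal operator of amount * |x| (L1 shrinkage)."""
    return np.sign(x) * np.maximum(np.abs(x) - amount, 0.0)

def missed_l1_updates(w_j, grad_avg_j, step_size, l1_strength, n_missed):
    """Literal replay of the n_missed skipped iterations for one coordinate:
    a step along the (constant) average gradient followed by an L1 shrinkage.
    """
    for _ in range(n_missed):
        w_j = soft_threshold(w_j - step_size * grad_avg_j,
                             step_size * l1_strength)
    return w_j
```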

mblondel commented 9 years ago

@fabianp I'd suggest starting with the naive implementation and writing unit tests (for both dense and sparse inputs), then implementing the smart JIT trick.
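A minimal sketch of such a dense/sparse test, assuming the SAGAClassifier counterpart mentioned above keeps SAGClassifier's constructor and the usual fit/score API (the parameter names here are an assumption, not the final API):

```python
import numpy as np
import scipy.sparse as sp
from sklearn.datasets import make_classification

from lightning.classification import SAGAClassifier  # assumed import path

def test_saga_dense_sparse_consistency():
    X, y = make_classification(n_samples=200, n_features=20, random_state=0)
    dense = SAGAClassifier(max_iter=20, random_state=0).fit(X, y)
    sparse = SAGAClassifier(max_iter=20, random_state=0).fit(sp.csr_matrix(X), y)
    # both code paths should converge to (nearly) the same coefficients
    np.testing.assert_array_almost_equal(dense.coef_, sparse.coef_, decimal=4)
    assert dense.score(X, y) > 0.9
```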

@adefazio Does SAGA work well even without any L2 regularization?

adefazio commented 9 years ago

SAGA/SAG can be used without L2 regularisation. The SDCA method requires L2 or a strongly convex regulariser of some sort.

fabianp commented 9 years ago

Superseded by PR #38.