bug-fix: just-in-time update in SVRG; feature: averaging for SVRG

krishnap25 commented 6 years ago

The just in time update for SVRG (formerly line 141 in svrg_fast.pyx) is incorrect when alpha > 0. In particular, the updates made in previous steps are scaled by a factor $(1 - \eta\alpha)$ in each step, which has not been accounted for. My patch fixes this bug by maintaining the dense updates separately and adding a correction by summing the geometric series $\sum_{t'<t} (1-\eta\alpha)^{t'}$ at inner iteration t.

Moreover, the patch also adds support for averaging inner iterates of SVRG, which has the same theoretical guarantees but exhibits better empirical performance. My implementation of averaging uses the sparse update scheme described by Bottou (2012).

I've tested the code by comparing against my trusted (dense) implementation of SVRG. The coefficients learnt my this patch match those of my implementation up to 1e-8.

mblondel commented 6 years ago

Thanks for the patch! It would be nice to add a test case that fails without the patch. What we've done in other modules like SAGA is to add to the tests a simple pure Python implementation (without any optimization trick) and check that this implementation and the optimized Cython implementation indeed return the same result.

krishnap25 commented 6 years ago

The current update is mathematically incorrect. The bug is hard to detect because it does not affect the loss or predictions by a noticeable amount. However, the weights show a small drift from the unoptimized (dense) implementation and this drift grows over time when the same random examples are chosen. I'll add a test for this in a few days.

mblondel commented 6 years ago

small drift from the unoptimized (dense) implementation

If you could add it to test_svrg.py and test the Cython implementation against it, that would be great.

scikit-learn-contrib / lightning

bug-fix: just-in-time update in SVRG; feature: averaging for SVRG #120