scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

[MRG] Sample weights in SAGA #73

Closed fabianp closed 8 years ago

fabianp commented 8 years ago

Second implementation, as discussed in #68

I made some benchmarks to compare this (strategy 2) against the unweighted version (strategy 1):

[images: three benchmark plots]

The first column is a sanity check that both versions behave identically in terms of number of iterations (as they should). The second column represents convergence with respect to time. My interpretation is that the overhead is negligible and that any difference we see is probably dominated by measurement noise (it doesn't make much sense for the weighted version to be faster, as seems to be the case in the first dataset).
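
For readers unfamiliar with what sample weights mean for SAGA, here is a minimal NumPy sketch. It is not the code from this PR and does not correspond to either strategy discussed in #68. It assumes a weighted least-squares objective in which each weight simply scales that sample's gradient; the function name `saga_weighted_least_squares` and its parameters are hypothetical and are not part of lightning's API.

```python
import numpy as np


def saga_weighted_least_squares(X, y, sample_weight, n_epochs=100, seed=0):
    """Minimize (1/n) * sum_i w_i * 0.5 * (x_i @ theta - y_i)**2 with SAGA.

    Hypothetical helper for illustration only (not lightning's implementation).
    """
    rng = np.random.default_rng(seed)
    n_samples, n_features = X.shape
    theta = np.zeros(n_features)
    grad_memory = np.zeros((n_samples, n_features))  # last gradient seen per sample
    grad_average = np.zeros(n_features)              # mean of the rows of grad_memory
    # Conservative step size 1 / (3 L), where L is the largest per-sample
    # Lipschitz constant of the weighted gradients.
    lipschitz = np.max(sample_weight * np.sum(X ** 2, axis=1))
    step = 1.0 / (3.0 * lipschitz)
    for _ in range(n_epochs * n_samples):
        i = rng.integers(n_samples)
        # Assumption: the sample weight scales the per-sample gradient.
        grad_i = sample_weight[i] * (X[i] @ theta - y[i]) * X[i]
        theta -= step * (grad_i - grad_memory[i] + grad_average)
        grad_average += (grad_i - grad_memory[i]) / n_samples
        grad_memory[i] = grad_i
    return theta


# Synthetic data, made up for this sketch.
rng = np.random.default_rng(42)
X = rng.standard_normal((200, 5))
y = X @ rng.standard_normal(5) + 0.1 * rng.standard_normal(200)
weights = rng.integers(1, 4, size=200).astype(float)

theta_weighted = saga_weighted_least_squares(X, y, weights)
# Closed-form weighted least-squares solution, for comparison.
theta_exact = np.linalg.solve(X.T @ (weights[:, None] * X), X.T @ (weights * y))
```

Comparing `theta_weighted` with `theta_exact` checks that the weighted updates converge to the weighted solution. Passing all-ones weights should likewise recover the ordinary least-squares fit, which mirrors the sanity check described above.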

mblondel commented 8 years ago

Thanks a lot for the investigation. I prefer this design, if this is ok with you too.

fabianp commented 8 years ago

Yep, I also prefer it.

fabianp commented 8 years ago

Fixed the tests and added an example, should be good to go.

fabianp commented 8 years ago

Hey @mblondel, green light to merge this?

mblondel commented 8 years ago

Yes! Thanks!