scikit-learn-contrib / lightning

Large-scale linear classification, regression and ranking in Python
https://contrib.scikit-learn.org/lightning/

online learning #78

Open kmike opened 8 years ago

kmike commented 8 years ago

Hey,

What does it take to implement partial_fit in lightning? Is there a reason it is not implemented?

sschnug commented 8 years ago

Non-contributor here:

Which algorithms do you need? Do you have too much data (more than can be handled through sparse matrices)?

I think the following algorithms would need only minimal changes:

Impossible (algorithm-wise; these are batch methods that need the full gradient):

These could maybe work, but I'm unsure about the theory (there might be constraints on partial_fit, e.g. how to call it and with which data):
kmike commented 8 years ago

Thanks for the detailed overview!

I'm in a reinforcement learning setup where the whole dataset is not available up front, and I want a regression model that can be updated with the data seen so far without retraining from scratch. I want to try an optimisation algorithm with an adaptive learning rate or momentum, and lightning has a good AdaGradRegressor implementation.
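For context, this is the per-coordinate adaptive learning rate AdaGrad provides; the sketch below is a plain-Python illustration of the update (not lightning's actual implementation, which is in Cython), showing why a single global step size needs little tuning:

```python
# Pure-Python sketch of the AdaGrad update: each coordinate's step size
# is scaled by its accumulated squared gradients. Toy online regression
# stream, purely for illustration.
import numpy as np

def adagrad_step(w, G, grad, eta=0.1, eps=1e-8):
    """One AdaGrad update; G accumulates squared gradients in place."""
    G += grad ** 2
    w -= eta * grad / (np.sqrt(G) + eps)
    return w, G

rng = np.random.RandomState(0)
true_w = np.array([1.0, -2.0])
w, G = np.zeros(2), np.zeros(2)
for _ in range(2000):
    x = rng.randn(2)          # one new observation at a time
    y = x @ true_w
    grad = (x @ w - y) * x    # squared-loss gradient for this sample
    w, G = adagrad_step(w, G, grad)
```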

sschnug commented 8 years ago

Let's see what the developers think.

Just two random remarks:

kmike commented 8 years ago

Yeah, I'm using vanilla SGD now; it works OK. The problem is that the component should work across many tasks, so it'd be nice to have fewer parameters to tune.

anttttti commented 8 years ago

I was just about to open an issue on this. I'm training models on a really big file, so the data won't fit in memory at once; streaming and parallelization are the only way to use it. Vanilla SGD from scikit-learn needs tuning and doesn't improve with multiple iterations. The FTRL from Kaggler.py works better, but can't be pickled.
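The streaming setup described here can be sketched with scikit-learn's existing partial_fit support; the file here is a small synthetic stand-in for the "really big file", and the column names are made up for illustration:

```python
# Read a file too large for memory in chunks and feed each chunk to
# partial_fit; model weights carry over between chunks.
import os
import tempfile
import numpy as np
import pandas as pd
from sklearn.linear_model import SGDRegressor

# synthetic stand-in for the big file (1000 rows, noiseless linear target)
rng = np.random.RandomState(0)
df = pd.DataFrame(rng.randn(1000, 3), columns=["f0", "f1", "f2"])
df["target"] = 2.0 * df["f0"] - 1.0 * df["f1"]
path = os.path.join(tempfile.mkdtemp(), "stream_demo.csv")
df.to_csv(path, index=False)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
for chunk in pd.read_csv(path, chunksize=100):
    X = chunk[["f0", "f1", "f2"]].to_numpy()
    y = chunk["target"].to_numpy()
    model.partial_fit(X, y)  # one pass per chunk; no retraining from scratch
```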

I had a look at modifying scikit-lightning for this. The outputs2d initialization in fit() should be moved to __init__(), and the Cython part should be modified so that it doesn't reset the model parameters when partial_fit is called. Would it be possible to get these changes?
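The state-preservation pattern being described can be sketched in pure Python (the name outputs2d comes from the comment above; lightning's real internals are in Cython and differ):

```python
# Minimal sketch of an estimator whose partial_fit keeps learned state
# across calls, while fit() restarts from scratch.
import numpy as np

class IncrementalEstimatorSketch:
    def __init__(self):
        self.coef_ = None            # allocated lazily, never reset by partial_fit

    def fit(self, X, y):
        self.coef_ = None            # fit() starts from scratch...
        return self.partial_fit(X, y)

    def partial_fit(self, X, y):
        if self.coef_ is None:       # ...partial_fit() only allocates once
            self.coef_ = np.zeros(X.shape[1])
        # one SGD epoch over this batch (squared loss, fixed step size)
        for xi, yi in zip(X, y):
            grad = (xi @ self.coef_ - yi) * xi
            self.coef_ -= 0.01 * grad
        return self
```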

fabianp commented 8 years ago

Hi,

A patch that implements partial_fit would definitely be a nice addition!

Please submit a patch with the modifications that you propose. I'll allocate time to review them.

anttttti commented 8 years ago

I didn't get a patch written; I hacked the code first to see how easily this could be done. I think I got it working for the AdaGradRegressor case, but the results were not good, so I suspect I missed something. Even without my hack, the results from AdaGrad weren't much better than SGD on my data, and FTRL from Kaggler was vastly better; this is a general result for SGD vs. FTRL on high-dimensional data. In the end I got a partial_fit FTRL working by adding model pickling to Kaggler instead. I could look at contributing to lightning later.

Attached is the hack I wrote, in case someone wants to continue from that. adagrad.py.txt
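The pickling workaround mentioned above follows a standard pattern: checkpoint an online model mid-stream and resume partial_fit later. A sketch, with scikit-learn's SGDRegressor standing in for the FTRL model:

```python
# Checkpoint an online model with pickle and resume training from the
# saved state (what Kaggler's FTRL couldn't do out of the box).
import pickle
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.RandomState(0)
X, y = rng.randn(100, 3), rng.randn(100)

model = SGDRegressor(random_state=0)
model.partial_fit(X[:50], y[:50])      # first half of the stream

blob = pickle.dumps(model)             # checkpoint (to disk in practice)
restored = pickle.loads(blob)          # learned state survives the round trip
restored.partial_fit(X[50:], y[50:])   # resume training where we left off
```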

mblondel commented 8 years ago

partial_fit is already supported in scikit-learn's SGD so I think we should focus on AdaGrad first.

@anttttti If you start a PR, we can help you track down the problem. Also make sure to write a unit test that checks that calling partial_fit multiple times is equivalent to fit.
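The kind of equivalence test being suggested might look like the following, with scikit-learn's SGDClassifier (which already has partial_fit) standing in for the lightning estimator under test; exact agreement relies on shuffle=False and a constant learning rate:

```python
# k full-data passes of partial_fit should match fit() with k iterations.
# (In lightning this would live in a pytest test function.)
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.linear_model import SGDClassifier

X, y = make_blobs(n_samples=100, centers=2, random_state=0)
params = dict(learning_rate="constant", eta0=0.01, shuffle=False,
              tol=None, random_state=0)

# one fit() with 3 epochs...
clf_fit = SGDClassifier(max_iter=3, **params).fit(X, y)

# ...versus 3 full-data partial_fit passes
clf_pf = SGDClassifier(**params)
for _ in range(3):
    clf_pf.partial_fit(X, y, classes=np.unique(y))

assert np.allclose(clf_fit.coef_, clf_pf.coef_)
assert np.allclose(clf_fit.intercept_, clf_pf.intercept_)
```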

anttttti commented 8 years ago

I made a version of FTRL available as part of a package I released: https://github.com/anttttti/Wordbatch/blob/master/wordbatch/models/ftrl.pyx

It supports partial_fit and online learning, weighted features, a link function for classification/regression, and instance-level parallelization with OpenMP prange.

This code probably won't fit the scope of the current sklearn-contrib-lightning, so I've released it independently for now.
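For reference, implementations like this are built on the FTRL-Proximal per-coordinate update from McMahan et al., "Ad Click Prediction: a View from the Trenches". A dense, unoptimized pure-Python sketch of that update (real implementations use sparsity, hashing, and here Cython + OpenMP):

```python
# FTRL-Proximal for logistic loss: weights are recovered lazily from the
# accumulators z (shifted gradient sums) and n (squared-gradient sums).
import math

class FTRLProximal:
    def __init__(self, n_features, alpha=0.1, beta=1.0, l1=1.0, l2=1.0):
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = [0.0] * n_features   # per-coordinate z accumulators
        self.n = [0.0] * n_features   # per-coordinate squared-gradient sums

    def _weights(self):
        w = []
        for zi, ni in zip(self.z, self.n):
            if abs(zi) <= self.l1:
                w.append(0.0)         # L1 keeps this coordinate exactly zero
            else:
                sign = 1.0 if zi >= 0 else -1.0
                w.append(-(zi - sign * self.l1) /
                         ((self.beta + math.sqrt(ni)) / self.alpha + self.l2))
        return w

    def predict_proba(self, x):
        s = sum(wi * xi for wi, xi in zip(self._weights(), x))
        return 1.0 / (1.0 + math.exp(-max(min(s, 35.0), -35.0)))

    def partial_fit_one(self, x, y):  # y in {0, 1}
        w = self._weights()
        p = self.predict_proba(x)
        for i, xi in enumerate(x):
            g = (p - y) * xi          # logistic-loss gradient
            sigma = (math.sqrt(self.n[i] + g * g)
                     - math.sqrt(self.n[i])) / self.alpha
            self.z[i] += g - sigma * w[i]
            self.n[i] += g * g
```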