kmike opened this issue 8 years ago
Non-contributor here:
What algorithms do you need? Do you have too much data to handle even with sparse matrices?
Thanks for the detailed overview!
I'm in a reinforcement learning setup where the whole dataset is not available up front, and I want to use a regression model that learns from the data seen so far without retraining from scratch. I want to try an optimisation algorithm with an adaptive learning rate or momentum, and lightning has a good AdaGradRegressor implementation.
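For context, the per-feature adaptive step that AdaGrad applies can be sketched in a few lines of NumPy. This is a toy single-sample update for squared loss, not lightning's actual AdaGradRegressor internals (which use lazy, regularized proximal updates):

```python
import numpy as np

def adagrad_step(w, g2, x, y, lr=0.5, eps=1e-8):
    """One AdaGrad update for squared loss on a single (x, y) sample.

    g2 accumulates squared gradients per feature, so frequently updated
    features automatically get smaller steps -- the adaptive learning rate.
    """
    grad = (w @ x - y) * x                    # gradient of 0.5 * (w.x - y)^2
    g2 = g2 + grad * grad                     # per-feature accumulator
    w = w - lr * grad / (np.sqrt(g2) + eps)   # per-feature scaled step
    return w, g2
```

Because the only state is `(w, g2)`, samples can be streamed through `adagrad_step` one at a time, which is exactly the state-keeping behaviour a `partial_fit` API would expose.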
Let's see what the developers think.
Just two random remarks:
Yeah, I'm using vanilla SGD now; it works OK. The problem is that the component should work across many tasks, and it would be nice to have fewer parameters to tune.
I was just about to start an issue on this. I'm training models on a really big file, so the data won't fit in memory at once. Streaming and parallelization are the only way to use the data. Vanilla SGD from scikit-learn takes tuning and doesn't improve from multiple iterations. The FTRL from Kaggler.py works better, but can't be pickled.
I had a look at modifying scikit-lightning for this. The outputs_2d_ initialization in fit() should be moved to __init__(), and the Cython part should also be modified so that it doesn't reset the model parameters when partial_fit is called. Would it be possible to get these changes?
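The general pattern being asked for here (a hypothetical sketch, not lightning's actual code) is to allocate learned state lazily on the first call, so that later calls continue training from that state instead of resetting it:

```python
import numpy as np

class OnlineLinear:
    """Illustrative partial_fit pattern: parameters are allocated lazily,
    so repeated calls keep the learned state instead of resetting it."""

    def __init__(self, lr=0.05):
        self.lr = lr
        self.coef_ = None                  # allocated on the first call only

    def partial_fit(self, X, y):
        if self.coef_ is None:             # first call: initialize state
            self.coef_ = np.zeros(X.shape[1])
        for xi, yi in zip(X, y):           # plain SGD on squared loss
            self.coef_ -= self.lr * (xi @ self.coef_ - yi) * xi
        return self

    def predict(self, X):
        return X @ self.coef_
```

Moving the initialization out of the per-call path is the whole trick; the same idea applies whether the update loop lives in Python or in Cython.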
Hi,
A patch that implements partial_fit would definitely be a nice addition!
Please submit a patch with the modifications that you propose. I'll allocate time to review them.
I didn't get a patch written; I hacked the code first to see how easily this could be done. I think I got it working for the AdaGradRegressor case, but the results were not good, so I think I missed something. The results from AdaGrad without my hack weren't much better than SGD on my data, and FTRL from Kaggler was vastly better; this is a general result for SGD vs. FTRL on high-dimensional data. Anyway, I got a partial_fit FTRL working by adding model pickling to Kaggler instead. I could look at contributing to Lightning later.
Attached is the hack I wrote, in case someone wants to continue from that. adagrad.py.txt
partial_fit is already supported in scikit-learn's SGD so I think we should focus on AdaGrad first.
@anttttti If you start a PR, we can help you track down the problem. Also make sure to write a unit test that checks that calling partial_fit multiple times is equivalent to fit.
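Such a test could look roughly like the sketch below. It uses a toy running-mean "estimator", since the equivalence property is what matters here, not the model: exact equivalence holds for estimators built on order-independent sufficient statistics, whereas for SGD-style learners the test would compare a single-epoch fit against chunked partial_fit calls.

```python
import numpy as np

class MeanModel:
    """Toy estimator whose fit and chunked partial_fit agree exactly."""

    def __init__(self):
        self.mean_, self.n_ = None, 0

    def fit(self, X):
        self.mean_, self.n_ = X.mean(axis=0), len(X)
        return self

    def partial_fit(self, X):
        if self.mean_ is None:             # lazy init keeps state across calls
            self.mean_ = np.zeros(X.shape[1])
        self.mean_ = (self.mean_ * self.n_ + X.sum(axis=0)) / (self.n_ + len(X))
        self.n_ += len(X)
        return self

def test_partial_fit_matches_fit():
    X = np.arange(12.0).reshape(6, 2)
    full = MeanModel().fit(X)
    chunked = MeanModel().partial_fit(X[:2]).partial_fit(X[2:5]).partial_fit(X[5:])
    assert np.allclose(full.mean_, chunked.mean_)
```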
I made a version of FTRL available as part of my Wordbatch package: https://github.com/anttttti/Wordbatch/blob/master/wordbatch/models/ftrl.pyx
It supports partial_fit and online learning, weighted features, a link function for classification/regression, and instance-level parallelization with OpenMP prange.
This probably won't fit the scope of the current sklearn-contrib-lightning, so I've released it independently for now.
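For readers unfamiliar with FTRL, the per-coordinate FTRL-Proximal update that implementations like these are based on (McMahan et al.'s formulation) can be sketched in dense NumPy as follows. This is an illustrative reimplementation, not Wordbatch's or Kaggler's actual code, and the parameter names are the paper's, not theirs:

```python
import numpy as np

def ftrl_train(X, y, alpha=0.5, beta=1.0, l1=0.0, l2=0.1, epochs=5):
    """Dense sketch of FTRL-Proximal logistic regression.

    The only state is (z, n), so training can be resumed at any time --
    the property this thread wants from partial_fit.
    """
    z = np.zeros(X.shape[1])   # accumulated adjusted gradients
    n = np.zeros(X.shape[1])   # accumulated squared gradients

    def weights():
        # Closed-form proximal step: L1 sparsifies, sqrt(n) gives a
        # per-feature adaptive learning rate.
        return np.where(np.abs(z) <= l1, 0.0,
                        -(z - np.sign(z) * l1) / ((beta + np.sqrt(n)) / alpha + l2))

    for _ in range(epochs):
        for xi, yi in zip(X, y):
            w = weights()
            p = 1.0 / (1.0 + np.exp(-np.clip(xi @ w, -35.0, 35.0)))
            g = (p - yi) * xi                                  # logistic-loss gradient
            sigma = (np.sqrt(n + g * g) - np.sqrt(n)) / alpha  # per-feature rate change
            z += g - sigma * w
            n += g * g
    return weights()
```

Because weights are recomputed lazily from the `(z, n)` accumulators, pickling those two arrays is enough to checkpoint and resume training, which is why adding pickling support was the easy route mentioned above.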
Hey,
What does it take to implement partial_fit in lightning? Is there a reason it is not implemented?