Closed reiinakano closed 7 years ago
This would be an awesome idea.
One idea that I have would be to do the following:
In this approach, it is definitely important to let the user quit the automation process while it is running.
How does this sound, @reiinakano? Maybe this would be good for an initial implementation?
I haven't actually figured out the best way to let a user quit a process manually. Currently, the only way to do that is to forcibly close the terminal running Xcessiv. One good thing about Xcessiv is that it automatically stores the meta-features of each base learner that has been scored, so calculating the performance of a single ensemble is actually quite fast: the only training required is for the secondary estimator.
Anyway, I don't think it's necessary to maintain a rolling top-k list of ensembles. Instead I'd just store everything that was calculated in the database. You can easily sort with whatever metric you want anyway. This is currently what is done when you do Bayesian optimization for the base learners. The list of base learners just kind of auto-updates while the search is running.
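To illustrate the "store everything, sort later" idea, here is a minimal sketch using Python's built-in `sqlite3`. The table name `ensemble_results`, its columns, and the helpers `record_result` and `top_ensembles` are all hypothetical and are not Xcessiv's actual schema or API.

```python
import json
import sqlite3

# Hypothetical schema: one row per scored ensemble, nothing is ever discarded.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ensemble_results ("
    "id INTEGER PRIMARY KEY, base_learners TEXT NOT NULL, "
    "accuracy REAL, f1 REAL)"
)

ALLOWED_METRICS = {"accuracy", "f1"}


def record_result(conn, base_learners, accuracy, f1):
    """Store one scored ensemble; the learner ids are kept as a JSON list."""
    conn.execute(
        "INSERT INTO ensemble_results (base_learners, accuracy, f1) "
        "VALUES (?, ?, ?)",
        (json.dumps(sorted(base_learners)), accuracy, f1),
    )
    conn.commit()


def top_ensembles(conn, metric, k=5):
    """Return the k best ensembles by the chosen metric, sorted at query time."""
    if metric not in ALLOWED_METRICS:  # whitelist, since it is spliced into SQL
        raise ValueError(f"unknown metric: {metric}")
    rows = conn.execute(
        f"SELECT base_learners, {metric} FROM ensemble_results "
        f"ORDER BY {metric} DESC LIMIT ?",
        (k,),
    ).fetchall()
    return [(json.loads(learners), value) for learners, value in rows]


# Made-up scores, purely for illustration.
record_result(conn, ["rf", "svm"], accuracy=0.91, f1=0.88)
record_result(conn, ["rf", "knn"], accuracy=0.89, f1=0.90)
record_result(conn, ["svm", "knn", "nb"], accuracy=0.85, f1=0.84)

best_by_accuracy = top_ensembles(conn, "accuracy", k=2)
best_by_f1 = top_ensembles(conn, "f1", k=1)
```

Because every result is kept, switching the ranking metric only changes the query, not the search itself.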
I was thinking of doing something along the same lines for stacked ensembles. What I need is a smart algorithm or technique for selecting which base learners should be used and in what combinations. One way people do this is through a kind of greedy approach: iteratively try adding each base learner and keep it only if the target metric improves. Of course, random combinations of base learners might actually be a good approach too, considering that random search is better than grid search for optimizing base learners.
@reiinakano, that's also a good point. So, maybe go with a random approach for now, and add more later? Perhaps more developers will add their ideas to this issue and other issues as time goes on.
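The greedy approach described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `score_ensemble` is a hypothetical callable that returns the target metric for a list of base learner ids (in Xcessiv's case this would presumably train only the secondary estimator on cached meta-features), and `toy_score` is a made-up stand-in so the example runs on its own.

```python
def greedy_forward_selection(candidates, score_ensemble, max_size=None):
    """Add one base learner at a time, keeping it only if the metric improves.

    Returns (selected learners, best score, history of every ensemble scored).
    """
    selected = []
    remaining = list(candidates)
    best_score = float("-inf")
    history = []

    while remaining and (max_size is None or len(selected) < max_size):
        # Score every one-learner extension of the current ensemble.
        trials = [(score_ensemble(selected + [c]), c) for c in remaining]
        history.extend((tuple(selected + [c]), s) for s, c in trials)

        top_score, top_candidate = max(trials, key=lambda t: t[0])
        if top_score <= best_score:
            break  # no candidate raises the target metric, so stop

        selected.append(top_candidate)
        remaining.remove(top_candidate)
        best_score = top_score

    return selected, best_score, history


# Hypothetical scoring function: made-up per-learner gains minus a size penalty.
GAINS = {"rf": 0.80, "svm": 0.06, "knn": 0.02, "nb": 0.01}


def toy_score(ensemble):
    return sum(GAINS[name] for name in ensemble) - 0.03 * len(ensemble)


selected, best_score, history = greedy_forward_selection(GAINS, toy_score)
```

Returning the full `history` rather than only the winner fits the idea of storing every scored ensemble instead of a rolling top-k list.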
I also think that a random approach might be better than grid search for now.
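A random approach could be as simple as drawing distinct random subsets of the available base learners and scoring each one. The sketch below assumes base learners are identified by unique, sortable ids; the function name `sample_random_ensembles` and its parameters are hypothetical.

```python
import math
import random


def sample_random_ensembles(candidates, n_trials, min_size=2, max_size=None, seed=None):
    """Draw up to n_trials distinct random subsets of the candidate base learners."""
    candidates = list(candidates)
    max_size = len(candidates) if max_size is None else min(max_size, len(candidates))
    if min_size < 1 or min_size > max_size:
        raise ValueError("min_size must be between 1 and max_size")

    # Never ask for more distinct subsets than actually exist.
    possible = sum(math.comb(len(candidates), k) for k in range(min_size, max_size + 1))
    n_trials = min(n_trials, possible)

    rng = random.Random(seed)
    seen = set()
    ensembles = []
    while len(ensembles) < n_trials:
        size = rng.randint(min_size, max_size)
        subset = frozenset(rng.sample(candidates, size))
        if subset not in seen:  # skip combinations that were already drawn
            seen.add(subset)
            ensembles.append(sorted(subset))
    return ensembles


learners = ["rf", "svm", "knn", "nb"]
ensembles = sample_random_ensembles(learners, n_trials=5, seed=0)
all_ensembles = sample_random_ensembles(learners, n_trials=100, seed=0)
```

Each sampled combination would then be scored and stored like any other ensemble, so other exploration methods can later be swapped in behind the same interface.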
Agreed. I think it's important to settle on some kind of framework so that different exploration methods can be added very easily in the future. I certainly intend to add methods other than Bayesian optimization for optimizing base learners, e.g. Hyperband.
Thanks for your input! I appreciate it a lot!
Automated ensembling based on greedy forward model selection was added in #43 and is available in v0.5.0.
Having worked with Xcessiv for a while, I feel there's a need for some way to automate the selection of base learners in an ensemble. I'm unaware of existing techniques for this, so if anyone has any suggestions or could point me towards relevant literature, it would be greatly appreciated.