Closed: sgbaird closed this issue 2 years ago
Hi @sgbaird, currently, the package only accepts the hyperparameters to be integers, floats, or categorical; in this case, as you have an array, it's not natively supported.
One workaround I can think of: instead of taking the parameter `weights` as a vector, use the `**kwargs` syntax in your `__init__` method to accept an arbitrary number of extra parameters, each of which represents one weight. Your example would change to:
wae = WeightedAverageEnsemble(w1=0.25, w2=0.75)
This way, you can define the param grid as:
param_grid = {'w1': Continuous(0.01, 0.99, distribution='log-uniform'),
'w2': Continuous(0.01, 0.99, distribution='log-uniform')}
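A minimal sketch of how such an estimator might collect the `w1, w2, ...` keywords back into a weight vector (the class name follows the example above; the `fit`/`predict` logic and the `weights_` property are hypothetical and not from the thread):

```python
import numpy as np

# Hypothetical sketch of the **kwargs idea; the real estimator's
# fit/predict logic from the thread is not reproduced here.
class WeightedAverageEnsemble:
    def __init__(self, **kwargs):
        # Each keyword (w1, w2, ...) becomes its own scalar attribute,
        # so a search can set them independently via set_params-style access.
        for name, value in kwargs.items():
            setattr(self, name, value)
        self._weight_names = sorted(kwargs)

    @property
    def weights_(self):
        # Reassemble the scalar parameters into the weight vector and
        # normalize so the weights add up to one.
        w = np.array([getattr(self, n) for n in self._weight_names], dtype=float)
        return w / w.sum()

wae = WeightedAverageEnsemble(w1=0.25, w2=0.75)
print(wae.weights_)  # [0.25 0.75]
```

Note that a plain `**kwargs`-only `__init__` does not play well with scikit-learn's default `get_params()` introspection, so a real estimator would also need explicit `get_params`/`set_params` handling.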
The main issue with this is that you can't guarantee that all the weights will add up to one, so a normalization step might be required. That makes the optimization problem harder: even if you set w1 to a fixed number, the actual value used after normalization is w1/(w1+w2), which is a function of w2 (or a multivariate function if you have more weights), and vice versa when you normalize w2. So even though it can be optimized, it will probably take longer to converge, since the parameterization is a little misleading to the algorithm.
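A quick numeric illustration of that coupling: even with w1 held fixed, the effective (normalized) first weight w1/(w1+w2) still moves as w2 changes, so the two parameters are not independent from the optimizer's point of view.

```python
# With w1 fixed at 0.5, the normalized first weight still varies with w2.
w1 = 0.5
effective = {w2: w1 / (w1 + w2) for w2 in (0.25, 0.5, 1.0)}
for w2, eff in effective.items():
    print(f"w2={w2}: effective w1 = {eff:.3f}")
# w2=0.25: effective w1 = 0.667
# w2=0.5: effective w1 = 0.500
# w2=1.0: effective w1 = 0.333
```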
I hope it makes sense.
I'm closing this issue, but feel free to raise more questions if needed.
@rodrigo-arenas thank you! Good point about the normalization.
The idea is to take in predictions from an arbitrary number of models, and find optimal weights that maximize the accuracy of the ensembled model.
Here's the estimator that I wrote:
Related: https://machinelearningmastery.com/weighted-average-ensemble-with-python
How would you suggest optimizing `weights`, since it's a vector that can change in size based on the size of the input data?
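One way to handle the variable size under the `**kwargs` workaround suggested earlier is to generate the parameter names programmatically from the number of models. A small sketch (the bounds mirror the grid above; the `(low, high)` tuples stand in for the library's `Continuous(low, high, ...)` objects so the example is self-contained):

```python
# Build one named scalar parameter per model: w1, w2, ..., wN.
# In the real param_grid each tuple would be a Continuous(low, high) instance.
def make_weight_grid(n_models, low=0.01, high=0.99):
    return {f"w{i + 1}": (low, high) for i in range(n_models)}

print(make_weight_grid(3))
# {'w1': (0.01, 0.99), 'w2': (0.01, 0.99), 'w3': (0.01, 0.99)}
```

The same naming scheme can be used to pass the initial weights to `WeightedAverageEnsemble(**{f"w{i + 1}": v for i, v in enumerate(initial_weights)})`.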