rodrigo-arenas / Sklearn-genetic-opt

ML hyperparameter tuning and feature selection using evolutionary algorithms.
https://sklearn-genetic-opt.readthedocs.io
MIT License

Can a vector of weights be specified in `param_grid` within GASearchCV (somehow)? #91

Closed · sgbaird closed this 2 years ago

sgbaird commented 2 years ago

The idea is to take in predictions from an arbitrary number of models, and find optimal weights that maximize the accuracy of the ensembled model.

Here's the estimator that I wrote:

from typing import List, Optional
import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_X_y, check_array
from sklearn.utils.estimator_checks import check_estimator
from sklearn.utils.validation import check_is_fitted
from sklearn.metrics import mean_absolute_error

class WeightedAverageEnsemble(BaseEstimator, RegressorMixin):
    """

    >>> wae = WeightedAverageEnsemble()
    >>> X = np.random.rand(20, 5)
    >>> y = np.random.rand(20, 1)
    >>> wae.fit(X, y)
    >>> wae.predict(X)

    >>> wae = WeightedAverageEnsemble(weights=[0.25, 0.75])
    >>> X = np.random.rand(20, 2)
    >>> y = np.random.rand(20, 1)
    >>> wae.fit(X, y)
    >>> wae.predict(X)

    Parameters
    ----------
    BaseEstimator : _type_
        _description_
    RegressorMixin : _type_
        _description_
    """

    def __init__(self, weights: Optional[List[float]] = None):
        if weights is not None:
            assert np.isclose(sum(weights), 1.0)
        self.weights = weights

    def fit(self, X, y):
        # TODO: deal with sparse inputs (i.e. mask `W` and convert to sparse)
        X, y = check_X_y(X, y, accept_sparse=False)
        self.is_fitted_ = True
        self.n_features_in_ = X.shape[1]
        if self.weights is None:
            self._mod_weights = np.ones(self.n_features_in_) / self.n_features_in_
            # equivalent to:
            # w = np.ones(self.n_features_in_).reshape(1, -1)
            # w = sklearn.preprocessing.normalize(w, norm="l1", axis=1)
        else:
            self._mod_weights = self.weights
        return self

    def predict(self, X):
        # TODO: deal with sparse inputs (i.e. mask `W` and convert to sparse)
        X = check_array(X, accept_sparse=False)
        check_is_fitted(self, "is_fitted_")
        W = np.tile(self._mod_weights, (X.shape[0], 1))
        y = np.einsum("ij, ij->i", W, X)
        # should be equivalent to: y = np.sum(W * X, axis=1)
        # loop with np.dot might also be fast due to BLAS compatibility
        # https://stackoverflow.com/a/26168677/13697228
        # https://stackoverflow.com/a/39657770/13697228
        return y

    def score(self, X, y, **kwargs):
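        # NOTE: scikit-learn's model selection treats higher scores as better,
        # so returning the raw MAE makes larger errors look preferable; negating
        # it (or passing an explicit scorer such as "neg_mean_absolute_error"
        # to the search) is usually safer.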
        y_pred = self.predict(X)
        return mean_absolute_error(y, y_pred, **kwargs)

check_estimator(WeightedAverageEnsemble())

Related: https://machinelearningmastery.com/weighted-average-ensemble-with-python

How would you suggest optimizing the weights, given that it's a vector whose length changes with the number of columns (models) in the input data?

rodrigo-arenas commented 2 years ago

Hi @sgbaird, currently the package only accepts hyperparameters that are integers, floats, or categorical, so an array-valued parameter like this isn't natively supported. One workaround I can think of: instead of taking the weights as a single vector parameter, use **kwargs in your __init__ method to accept an arbitrary number of extra parameters, each representing one weight. Your example would change to:

wae = WeightedAverageEnsemble(w1=0.25, w2=0.75)

This way, you can define the param grid as:

param_grid =  {'w1': Continuous(0.01, 0.99, distribution='log-uniform'),
               'w2': Continuous(0.01, 0.99, distribution='log-uniform')}
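
For concreteness, here is a rough sketch of how that search could be wired up, assuming the constructor has been changed to take explicit w1/w2 keyword arguments (note that BaseEstimator.get_params only discovers parameters named in the __init__ signature, so a literal **kwargs would likely also need get_params/set_params overrides; spelling the weights out is the simpler route). The toy data and the CV/GA settings below are only illustrative placeholders:

import numpy as np
from sklearn.model_selection import KFold
from sklearn_genetic import GASearchCV
from sklearn_genetic.space import Continuous

# Toy data standing in for the base-model predictions (one column per model).
X = np.random.rand(200, 2)
y = np.random.rand(200)

param_grid = {'w1': Continuous(0.01, 0.99, distribution='log-uniform'),
              'w2': Continuous(0.01, 0.99, distribution='log-uniform')}

evolved_estimator = GASearchCV(
    estimator=WeightedAverageEnsemble(),  # the w1/w2-based variant
    cv=KFold(n_splits=3),
    scoring="neg_mean_absolute_error",    # higher is better, per sklearn convention
    param_grid=param_grid,
    population_size=20,
    generations=30,
    n_jobs=-1,
)
evolved_estimator.fit(X, y)
print(evolved_estimator.best_params_)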

The main issue with this is that you can't guarantee that all the weights will add up to one, so a normalization step might be required, and that makes the optimization problem harder. Even if you set w1 to a fixed value, the number actually used after normalization is w1/(w1+w2), which is a function of w2 (or a multivariate function if you have more weights), and vice versa when you normalize w2. So it can still be optimized, but it will probably take longer to converge, since the normalization is a little misleading to the algorithm.
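
As a rough illustration of that normalization step, fit could rescale whatever raw values the optimizer proposes so the weights actually used always sum to one. A minimal sketch, assuming just two weights named w1/w2 for illustration (predict and score stay as in the original snippet):

import numpy as np
from sklearn.base import BaseEstimator, RegressorMixin
from sklearn.utils import check_X_y

class WeightedAverageEnsemble(BaseEstimator, RegressorMixin):
    def __init__(self, w1: float = 0.5, w2: float = 0.5):
        # Plain assignment only: scikit-learn expects no validation or
        # transformation in __init__ so that cloning works correctly.
        self.w1 = w1
        self.w2 = w2

    def fit(self, X, y):
        X, y = check_X_y(X, y, accept_sparse=False)
        self.n_features_in_ = X.shape[1]
        raw = np.asarray([self.w1, self.w2], dtype=float)
        # Normalize so the weights used internally sum to one regardless of
        # the raw values proposed by the optimizer.
        self._mod_weights = raw / raw.sum()
        self.is_fitted_ = True
        return self

    # predict and score unchanged from the original snippet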

I hope it makes sense.

rodrigo-arenas commented 2 years ago

I'm closing this issue, but feel free to raise more questions if needed

sgbaird commented 2 years ago

@rodrigo-arenas thank you! Good point about the normalization.