scikit-learn-contrib / scikit-learn-extra

scikit-learn contrib estimators
https://scikit-learn-extra.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Robust irls for regression #130

Open TimotheeMathieu opened 2 years ago

TimotheeMathieu commented 2 years ago

New optimization scheme for RobustWeightedRegressor: use IRLS (iteratively reweighted least squares). Faster in low dimensions than the SGD counterpart.

More extensive experiments would be needed to compare the IRLS solver to the SGD solver, but on simple examples IRLS seems to perform rather well.

TimotheeMathieu commented 2 years ago

The IRLS algorithm is a bit implicit, I agree. The principle is that we do a least squares fit at each iteration (cf. line 392), and this least squares fit is reweighted using the custom robust weights:

```python
for epoch in range(self.max_iter):
    weights = ...  # robust per-sample weights recomputed from the current fit
    base_estimator.fit(X, y, sample_weight=weights)
```

So this is exactly an IRLS algorithm.
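
For readers following along, here is a minimal self-contained sketch of that loop, assuming a LinearRegression inner fit and a simple Huber-type weighting; the weight formula, the MAD scale estimate, and the stopping rule are illustrative stand-ins, not the exact ones used in the PR.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

def irls_fit(X, y, max_iter=20, c=1.35, tol=1e-6):
    """Illustrative IRLS loop: reweighted least squares with Huber-type weights."""
    base_estimator = LinearRegression()
    weights = np.ones(len(y))  # start unweighted
    for epoch in range(max_iter):
        base_estimator.fit(X, y, sample_weight=weights)
        residuals = y - base_estimator.predict(X)
        scale = np.median(np.abs(residuals)) / 0.6745 + 1e-12  # robust scale (MAD)
        r = np.abs(residuals) / scale
        new_weights = np.where(r <= c, 1.0, c / r)  # Huber-type downweighting of outliers
        if np.max(np.abs(new_weights - weights)) < tol:
            break
        weights = new_weights
    return base_estimator
```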

rth commented 2 years ago

I see your point. I just find calling this option solver a bit confusing; as far as I understand, it mixes two different things, the inner solver and this outer iterative approach? We are iteratively reweighting an estimator with a least squares loss in any case, and in that sense one can argue that it's also IRLS for SGDClassifier, can't we?

Wouldn't it be clearer to accept base_estimator as a parameter with a default of SGD? Also, I really think we should use Ridge instead of LinearRegression there, also to be consistent with SGDClassifier; otherwise we are not optimizing the same loss, and the solver shouldn't impact the loss. This would also remove the need for the sgd_kwargs parameter. In particular, currently, if one wanted to do a grid search on the optimal regularization, this would not be obvious with the solver parameter.

So ideally, RobustWeightedRegressor().get_params() should also display the hyperparameters of the inner estimator, which is a prerequisite for using GridSearchCV. And it should, if it's a meta-estimator, I think (we would need to check how it's done in scikit-learn meta-estimators, I don't remember).
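
For reference, this is how nested hyperparameters already work in scikit-learn meta-estimators; the example below uses TransformedTargetRegressor only to illustrate the get_params convention that a base_estimator parameter here would presumably follow.

```python
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# scikit-learn meta-estimators expose inner hyperparameters with the
# <param>__<inner_param> naming in get_params(), which is what GridSearchCV needs.
meta = TransformedTargetRegressor(regressor=Ridge())
print("regressor__alpha" in meta.get_params())  # True

# So the inner regularization can be tuned directly:
search = GridSearchCV(meta, param_grid={"regressor__alpha": [0.1, 1.0, 10.0]})
```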

TimotheeMathieu commented 2 years ago

> I see your point. I just find calling this option solver a bit confusing; as far as I understand, it mixes two different things, the inner solver and this outer iterative approach? We are iteratively reweighting an estimator with a least squares loss in any case, and in that sense one can argue that it's also IRLS for SGDClassifier, can't we?

Yes, you are right, although we can use alternative losses with SGD. Maybe I can replace "solver" with "iterative_scheme" or something like that; it could take 'SGD' or 'OLS' (or 'Ridge') as input. But the algorithm is a little different for the two approaches: with SGD we do only one iteration in each loop, while for least squares we do the whole optimization at each step. Maybe I can use a base_estimator and then check: if max_iter is among its parameters, I set it to 1; otherwise I do the whole optimization at each (outer) step.
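
A rough sketch of that check, with a hypothetical helper name, could look like the following (the attribute test and the choice of one inner iteration mirror the description above, not existing code):

```python
from sklearn.base import clone

def _prepare_inner_estimator(base_estimator):
    # Hypothetical helper: if the inner estimator is itself iterative
    # (exposes max_iter), run a single inner iteration per outer loop,
    # as with SGD; otherwise it will be fully refit at each outer step.
    est = clone(base_estimator)
    if "max_iter" in est.get_params():
        est.set_params(max_iter=1)
    return est
```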

> Wouldn't it be clearer to accept base_estimator as a parameter with a default of SGD? Also, I really think we should use Ridge instead of LinearRegression there, also to be consistent with SGDClassifier; otherwise we are not optimizing the same loss, and the solver shouldn't impact the loss. This would also remove the need for the sgd_kwargs parameter. In particular, currently, if one wanted to do a grid search on the optimal regularization, this would not be obvious with the solver parameter.

Yes, it is doable, but I really want to keep the SGD one-iteration-per-loop approach because it helps with stability. If you do the whole optimization at each iteration, there can be instability because in the first (outer) iterations the weights are not robust yet. OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?

> So ideally, RobustWeightedRegressor().get_params() should also display the hyperparameters of the inner estimator, which is a prerequisite for using GridSearchCV. And it should, if it's a meta-estimator, I think (we would need to check how it's done in scikit-learn meta-estimators, I don't remember).

I will check.

rth commented 2 years ago

> OK for Ridge, but I don't understand why you say that this would remove the need for sgd_kwargs?

It wouldn't. I mean, if we go with supporting multiple base estimators, we probably should rename it to something that's independent of SGD (e.g. estimator_kwargs instead of ridge_kwargs and sgd_kwargs), or ideally something that works with get_params/set_params and would be compatible with GridSearchCV.
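
To make the discussion concrete, here is a hypothetical usage sketch of the meta-estimator style being proposed; the base_estimator parameter and its nested name below do not exist in the current API, they are just what the rename would enable.

```python
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV
from sklearn_extra.robust import RobustWeightedRegressor

# Hypothetical API under discussion, not the current one: the inner estimator
# is passed directly, so its hyperparameters become nested parameters and no
# *_kwargs dict is needed.
reg = RobustWeightedRegressor(base_estimator=Ridge())

# Regularization would then be tunable the standard way:
search = GridSearchCV(reg, param_grid={"base_estimator__alpha": [0.1, 1.0, 10.0]})
```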