scikit-learn / scikit-learn

scikit-learn: machine learning in Python
https://scikit-learn.org
BSD 3-Clause "New" or "Revised" License

GradientBoostingClassifier doesn't work with least squares loss #1085

Closed: larsmans closed this issue 12 years ago

larsmans commented 12 years ago

Triggered by this SO question: GradientBoostingClassifier's docstring states that loss may be "ls", in which case least squares regression will be performed, but actually trying that raises a ValueError. I'm not sure whether the code or the docs should be changed.

(I also noticed that Huber and quantile loss are not advertised in the regressor's docstring.)

pprett commented 12 years ago

Thanks for pointing this out - I've changed that in #1036 but I should fix this now. I'll remove "ls" from GradientBoostingClassifier.

amueller commented 12 years ago

@pprett this is fixed now, right?

pprett commented 12 years ago

@amueller correct, it has been addressed in #1088

smcinerney commented 11 years ago

In 0.13.1 the error message is still the seriously non-obvious "ValueError: n_classes must be 1 for regression", e.g. for GradientBoostingClassifier(loss='ls') or (loss='huber').

Could you change this to a more user-friendly "Loss function '%s' is not supported for classifier %s"?
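An illustrative sketch (not scikit-learn's actual code) of the kind of explicit loss validation being requested, raising the suggested message instead of the opaque "n_classes must be 1 for regression":

```python
# Hypothetical helper; the supported-loss set mirrors the discussion above.
def check_classifier_loss(loss, estimator_name="GradientBoostingClassifier"):
    supported = {"deviance"}
    if loss not in supported:
        raise ValueError(
            "Loss function '%s' is not supported for classifier %s"
            % (loss, estimator_name)
        )

check_classifier_loss("deviance")  # OK, no error
```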

amueller commented 11 years ago

Well, 0.13.1 is the latest release and won't change any more ;) Can you please test the current development version?

larsmans commented 11 years ago

Still the case in master. I'm on it.

larsmans commented 11 years ago

(Side note: ls is called squared_loss in SGD and mean_squared_error in metrics.)

larsmans commented 11 years ago

I think I'm going to need some help here. GradientBoostingClassifier docstring says:

loss : {'deviance'}, optional (default='deviance')
    loss function to be optimized. 'deviance' refers to
    deviance (= logistic regression) for classification
    with probabilistic outputs.

I don't know what this means. There is a loss option, but deviance is the only supported value? Then why is it there? @glouppe, @arjoly?

ogrisel commented 11 years ago

I think you should ask @pprett too. The narrative docs have a bit more content: http://scikit-learn.org/dev/modules/ensemble.html#gradient-tree-boosting

pprett commented 11 years ago

@larsmans I created a PR #2308 that fixes the issue. GradientBoostingClassifier currently supports only one loss ('deviance') - internally it uses either Binomial or Multinomial Deviance depending on the number of class labels.
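The behavior described above can be seen directly: the classifier's single deviance loss adapts to the number of classes (binomial for two, multinomial for more). This sketch uses the default loss so it runs regardless of how the loss is spelled in a given release:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.RandomState(0)
X = rng.rand(60, 4)

for n_classes in (2, 3):
    y = rng.randint(n_classes, size=60)
    clf = GradientBoostingClassifier(n_estimators=10).fit(X, y)
    # predict_proba yields one column per class in both cases
    print(n_classes, clf.predict_proba(X[:1]).shape)
```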