stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Add Sklearn-like Early Stopping for NGBoostRegressor #263

Closed kmedved closed 3 years ago

kmedved commented 3 years ago

This is a pull request following up on the discussion here: https://github.com/stanfordmlgroup/ngboost/discussions/234, creating a new class called NGBESRegressor (NGB Early Stopping Regressor), which follows the HistGradientBoostingRegressor approach of automatically doing early stopping, rather than requiring the user to specify their own validation data.

This fits a bit better within the rest of the scikit-learn ecosystem, which generally does not allow the user to pass validation data into `.fit()` calls; that restriction creates problems for hyperparameter tuning, for instance.

This is a new class rather than a change to NGBRegressor directly, in response to @tonyduan's suggestion that the user still be able to pass their own validation data if they prefer. Creating a unified API that allowed for both proved difficult, so a separate class was needed. I did not do one for the NGBClassifier or NGBSurvival classes yet, although if there's interest this can be replicated across them.

Thank you to @darrendefreeuw for actually doing the underlying coding here.

ryan-wolbeck commented 3 years ago

I think we'll want to add this into the documentation or as an example in the examples folder as well

kmedved commented 3 years ago

The main problem we ran into with doing a unified API is that right now, NGBRegressor borrows the `.fit()` function from the parent NGBoost class rather than defining its own. It seemed that, to maintain backwards compatibility within a unified class, we'd need to either modify the parent NGBoost class's `.fit()` function to accommodate internal early stopping, or create a custom `.fit()` function for NGBRegressor to accommodate both uses. We were worried about messing that process up, since there's a bunch of stuff in the NGBoost `.fit()` function I don't really understand.

I'm open to ideas here, however, as I agree it would be preferable to have a unified API rather than sprawling classes.

alejandroschuler commented 3 years ago

> The main problem we ran into with doing a unified API is that right now, NGBRegressor borrows the `.fit()` function from the parent NGBoost class rather than defining its own. It seemed that, to maintain backwards compatibility within a unified class, we'd need to either modify the parent NGBoost class's `.fit()` function to accommodate internal early stopping, or create a custom `.fit()` function for NGBRegressor to accommodate both uses. We were worried about messing that process up, since there's a bunch of stuff in the NGBoost `.fit()` function I don't really understand.
>
> I'm open to ideas here, however, as I agree it would be preferable to have a unified API rather than sprawling classes.

Ah, I see what you mean. Yes, I think you'd have to modify the fit method from NGBoost. But a benefit of that is that it would also automatically extend functionality to the other classes if you extended their APIs. How about you give it a shot (don't worry about breaking stuff) and then I'll review it and make sure nothing got broken/fix things that broke?
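One way the unified `.fit()` modification could work is a small front-end that only splits off a validation set when early stopping is requested and the user didn't supply one, then hands off to the existing boosting loop unchanged. This is a hypothetical sketch, not the actual ngboost API: the names `early_stopping_rounds` and `validation_fraction` are illustrative.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def resolve_validation_data(X, Y, X_val=None, Y_val=None,
                            early_stopping_rounds=None,
                            validation_fraction=0.1, random_state=None):
    """If early stopping is requested but no validation set was given,
    carve one out internally (mirroring HistGradientBoostingRegressor);
    otherwise pass the user's data through untouched."""
    if early_stopping_rounds is not None and X_val is None:
        X, X_val, Y, Y_val = train_test_split(
            X, Y, test_size=validation_fraction, random_state=random_state)
    return X, Y, X_val, Y_val

# Demo: internal split happens only when early stopping is on and no
# user-supplied validation data exists.
X = np.arange(100).reshape(50, 2)
Y = np.arange(50)
X_tr, Y_tr, X_val, Y_val = resolve_validation_data(
    X, Y, early_stopping_rounds=5, validation_fraction=0.2, random_state=0)
print(X_tr.shape, X_val.shape)  # (40, 2) (10, 2)
```

The rest of the existing NGBoost fit loop would then monitor `(X_val, Y_val)` whenever it is non-None, which is why the same change would extend automatically to NGBClassifier and NGBSurvival once their APIs expose the new arguments.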

kmedved commented 3 years ago

Closing this pull request; I'm going to open a new one with the updated approach, using the existing NGBoostRegressor API.