stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

The fit method implements online learning by default which is incompatible with sklearn API #314

Closed CompRhys closed 1 year ago

CompRhys commented 1 year ago

In the sklearn API, online learning is done through a separate partial_fit method, while each call to fit is expected to discard any previously learned state. I assumed that ngboost followed this convention and that fit would not carry out online learning. In fact, calling fit again continues training the existing model. This caused data leakage in our CV workflow, where the same model object is refit on different data sets, and as a result we incorrectly used this model in production.
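To illustrate the convention referred to here, the following is a minimal sketch using a toy estimator. RunningMeanRegressor is a hypothetical name used only for illustration and is not part of ngboost or sklearn. It assumes the usual sklearn contract: fit starts from scratch, partial_fit continues from the current state.

```python
class RunningMeanRegressor:
    """Toy estimator (hypothetical) that predicts the mean of the targets seen."""

    def _reset(self):
        self.total_ = 0.0
        self.count_ = 0

    def fit(self, X, y):
        # sklearn convention: fit forgets anything learned by earlier calls.
        self._reset()
        return self.partial_fit(X, y)

    def partial_fit(self, X, y):
        # Online learning: keep the existing state and update it.
        if not hasattr(self, "count_"):
            self._reset()
        self.total_ += sum(y)
        self.count_ += len(y)
        return self

    def predict(self, X):
        return [self.total_ / self.count_] * len(X)


model = RunningMeanRegressor()
model.fit([[0], [1]], [10.0, 20.0])
model.fit([[2], [3]], [0.0, 2.0])            # second fit ignores the first data set
refit_prediction = model.predict([[5]])[0]   # mean of [0.0, 2.0]

model.partial_fit([[0], [1]], [10.0, 20.0])  # explicit opt-in to online learning
online_prediction = model.predict([[5]])[0]  # mean of all four targets seen so far
```

With this split, a CV loop that reuses one model object across folds is safe as long as it only calls fit.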

The solution implemented here turns off the online behaviour by default, but it does not go as far as the sklearn convention of exposing online learning through partial_fit. I left it this way because I assume the online behaviour was intentional. If it was not intentional, I would suggest adding a partial_fit method instead, so that others are not bitten by this issue.
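One way to catch this kind of leakage is to compare a refitted model against a freshly constructed one. The sketch below assumes deterministic toy models. refit_matches_fresh_fit, LeakyMean and ResettingMean are hypothetical names for illustration, not ngboost or sklearn code.

```python
def refit_matches_fresh_fit(make_model, first, second, probe):
    """Return True if fitting on `first` then `second` gives the same
    predictions as fitting a new model on `second` alone."""
    reused = make_model()
    reused.fit(*first)
    reused.fit(*second)
    fresh = make_model()
    fresh.fit(*second)
    return reused.predict(probe) == fresh.predict(probe)


class LeakyMean:
    """Toy model whose fit keeps accumulating across calls."""

    def __init__(self):
        self.total, self.count = 0.0, 0

    def fit(self, X, y):
        self.total += sum(y)
        self.count += len(y)
        return self

    def predict(self, X):
        return [self.total / self.count] * len(X)


class ResettingMean(LeakyMean):
    """Same toy model, but fit discards earlier state first."""

    def fit(self, X, y):
        self.total, self.count = 0.0, 0
        return super().fit(X, y)


first = ([[0], [1]], [10.0, 20.0])
second = ([[2], [3]], [0.0, 2.0])
probe = [[5]]

leaky_ok = refit_matches_fresh_fit(LeakyMean, first, second, probe)
resetting_ok = refit_matches_fresh_fit(ResettingMean, first, second, probe)
```

For a real stochastic model the comparison would need a fixed random seed and a numerical tolerance rather than exact equality.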

CompRhys commented 1 year ago

@alejandroschuler @ryan-wolbeck

CompRhys commented 1 year ago

Closing in favour of moving straight to partial_fit, per email.