In the sklearn API, online learning is done through a dedicated `partial_fit` method, while each call to `fit` is expected to train from scratch. I assumed that ngboost followed this convention and that calling `fit` again would not carry out online learning. That assumption led to data leakage in our CV workflow, where the same model object is refit on different data sets, and as a result we incorrectly used this model in production.
The solution implemented here turns off this behaviour by default, but it does not enforce the sklearn convention of using `partial_fit` for online learning. This is because I assume the online behaviour was intentional. If it was not intentional, I would suggest adding a `partial_fit` method instead, so that others are not bitten by this issue.
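
To make the convention concrete, here is a minimal sketch of the `fit` / `partial_fit` split. It is not ngboost code: `RunningMeanRegressor` and the two folds are hypothetical, written in plain Python only to show why refitting the same object across CV folds is safe when `fit` resets state and leaks when it does not.

```python
class RunningMeanRegressor:
    """Hypothetical toy estimator: predicts the mean of all targets seen so far."""

    def _reset(self):
        self.n_seen_ = 0
        self.total_ = 0.0

    def fit(self, X, y):
        # sklearn convention: fit() discards anything learned by earlier calls.
        self._reset()
        return self.partial_fit(X, y)

    def partial_fit(self, X, y):
        # partial_fit() continues from the existing state (online learning).
        if not hasattr(self, "n_seen_"):
            self._reset()
        self.n_seen_ += len(y)
        self.total_ += float(sum(y))
        return self

    def predict(self, X):
        return [self.total_ / self.n_seen_ for _ in X]


fold_a = ([[0], [1]], [1.0, 3.0])
fold_b = ([[2], [3]], [10.0, 20.0])

model = RunningMeanRegressor()

# Refitting with fit(): the second call only sees fold_b.
model.fit(*fold_a)
model.fit(*fold_b)
refit_prediction = model.predict([[0]])[0]

# Explicit online learning: partial_fit() keeps what fold_a contributed.
model.fit(*fold_a)
model.partial_fit(*fold_b)
online_prediction = model.predict([[0]])[0]
```

With this split, `refit_prediction` depends only on `fold_b`, while `online_prediction` blends both folds. An estimator whose `fit` behaves like `partial_fit` would give the blended result in a CV loop, which is the leakage described above.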