stanfordmlgroup / ngboost

Natural Gradient Boosting for Probabilistic Prediction
Apache License 2.0

Error when tuning NGBSurvival with GridSearchCV #270

Open rhjohnstone opened 3 years ago

rhjohnstone commented 3 years ago

Minimal example:

from ngboost import NGBSurvival
import numpy.random as npr
from sklearn.model_selection import GridSearchCV

# Synthetic survival data: features, times, and event indicators
X = npr.randn(100, 5)
T = npr.rand(100)
E = npr.randint(2, size=100)

param_grid = {"learning_rate": [0.01, 0.1]}

ngb = NGBSurvival(n_estimators=50)

clf = GridSearchCV(ngb, param_grid=param_grid, cv=3)

# GridSearchCV.fit forwards extra keyword arguments to the estimator's fit
clf.fit(X, T=T, E=E)

(I'm not actually sure about the last line, since a standard sklearn estimator just takes (X, y) when fitting, but the current error occurs before that anyway.)

This raises the following error:

RuntimeError: Cannot clone object NGBSurvival(Dist=<class 'ngboost.distns.utils.SurvivalDistnClass.<locals>.SurvivalDistn'>, n_estimators=50, random_state=RandomState(MT19937) at 0x7FA5683B1940), as the constructor either does not set or modifies parameter Dist

It does, however, work if I use NGBRegressor instead of NGBSurvival (and remove E).

Is there a way for me to fix this, or is this a problem with the NGBSurvival class? And if the latter, is it possible to fix?
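
For readers trying to understand the error message: scikit-learn's clone rebuilds an estimator from its own parameters and then checks that each stored parameter is the very same object that was passed to the constructor. The repr in the error suggests that Dist is stored as a class created inside a function (SurvivalDistnClass), which would give a new class object on every construction. The sketch below imitates that check using only the standard library. Every name in it (make_survival_dist, Normal, WrappingEstimator, PlainEstimator, clone_like) is hypothetical and written for illustration; it is not ngboost's or scikit-learn's actual code.

```python
import copy


def make_survival_dist(base):
    """Hypothetical class factory: builds a NEW class object on every call."""
    class SurvivalDist(base):
        pass
    return SurvivalDist


class Normal:
    """Placeholder distribution class (an assumption, not ngboost's)."""


class PlainEstimator:
    """Stores its argument unchanged, as scikit-learn's conventions expect."""
    def __init__(self, dist=Normal):
        self.dist = dist

    def get_params(self):
        return {"dist": self.dist}


class WrappingEstimator(PlainEstimator):
    """Constructor wraps its argument, so the stored value is a different object."""
    def __init__(self, dist=Normal):
        self.dist = make_survival_dist(dist)


def clone_like(estimator):
    """Simplified stand-in for the identity check done while cloning."""
    # deepcopy returns classes unchanged, so class-valued parameters keep their identity
    params = {k: copy.deepcopy(v) for k, v in estimator.get_params().items()}
    new = type(estimator)(**params)
    for name, passed in params.items():
        if new.get_params()[name] is not passed:
            raise RuntimeError(
                f"Cannot clone object, as the constructor either does not set "
                f"or modifies parameter {name}"
            )
    return new


plain_clone = clone_like(PlainEstimator())  # passes the identity check

try:
    clone_like(WrappingEstimator())
    clone_failed = False
except RuntimeError as err:
    clone_failed = True
    print(err)
```

If this reading is right, NGBRegressor works because its Dist is stored as passed, while the survival wrapper changes it during construction.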

alejandroschuler commented 2 years ago

Hmm, this is a problem with the way NGBSurvival is implemented, specifically with how the abstractions for scores with and without survival data are designed. It's not an easy fix, unfortunately. I welcome suggestions, though. Related: https://github.com/stanfordmlgroup/ngboost/discussions/217
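
Until the class itself is changed, one possible workaround is to skip GridSearchCV and run the search by hand, building a fresh model for every parameter combination so that nothing is ever cloned. The sketch below is a minimal, library-agnostic version of that idea. The helper names (expand_grid, kfold_indices, manual_grid_search) and the fit_and_score callback are assumptions made for illustration, not part of ngboost or scikit-learn, and the folds are a simple unshuffled split.

```python
from itertools import product


def expand_grid(param_grid):
    """Yield one dict per combination in a {name: [values]} grid."""
    names = sorted(param_grid)
    for values in product(*(param_grid[n] for n in names)):
        yield dict(zip(names, values))


def kfold_indices(n_samples, n_splits):
    """Return (train, test) index lists; sample i is held out in fold i % n_splits."""
    splits = []
    for k in range(n_splits):
        test = [i for i in range(n_samples) if i % n_splits == k]
        train = [i for i in range(n_samples) if i % n_splits != k]
        splits.append((train, test))
    return splits


def manual_grid_search(param_grid, n_samples, fit_and_score, n_splits=3):
    """Return (best_params, best_mean_score); higher scores are treated as better.

    fit_and_score(params, train_idx, test_idx) is supplied by the caller. It should
    build a new model from params, fit it on the training rows, and return a
    held-out score.
    """
    best_params, best_score = None, float("-inf")
    for params in expand_grid(param_grid):
        scores = [
            fit_and_score(params, train, test)
            for train, test in kfold_indices(n_samples, n_splits)
        ]
        mean_score = sum(scores) / len(scores)
        if mean_score > best_score:
            best_params, best_score = params, mean_score
    return best_params, best_score


# Toy callback so the sketch runs on its own: it prefers learning_rate == 0.1.
def toy_fit_and_score(params, train_idx, test_idx):
    return -abs(params["learning_rate"] - 0.1)


best_params, best_score = manual_grid_search(
    {"learning_rate": [0.01, 0.1]}, n_samples=100, fit_and_score=toy_fit_and_score
)
print(best_params, best_score)
```

For the example in this issue, the callback would presumably construct NGBSurvival(n_estimators=50, **params), fit it on X[train_idx], T[train_idx], E[train_idx], and return whichever held-out metric suits the problem; the choice of metric is left open here.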