Closed · glemaitre closed this 2 days ago
Hi @glemaitre, thanks for your message, agreed.

Yeah, I had noted that cross-validation historization often amounts to a grid search, but I could not find a more compelling example, since it would not make sense to historize several cross-validations coming from different estimators... So I thought the cross-validation historization feature is useful when users write several cross-validations in a draft notebook while iterating, and we end up displaying them in a nicer view that amounts to a grid search (even though good practice would have been to use a grid search from the start). A minimal sketch of that workflow is below.
cc @MarieS-WiMLDS
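To make the draft-notebook pattern concrete, here is a minimal sketch of what I mean, assuming a `Lasso` on the diabetes dataset (the `alpha` values are made up); each `cross_validate` call stands for one notebook cell that the historization feature could collect:

```python
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

X, y = datasets.load_diabetes(return_X_y=True)

# One draft-notebook cell per attempt: re-running cross_validate with a new
# alpha each time effectively re-implements a grid search by hand.
results = {}
for alpha in (0.1, 0.5, 1.0):
    cv_results = cross_validate(Lasso(alpha=alpha), X, y, cv=5)
    results[alpha] = cv_results["test_score"].mean()
```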
I was looking at this example and the following block:
The fact that we do a hyperparameter search here looks really weird to me. I would request people to use a `RandomizedSearchCV` or a `GridSearchCV`, so I'm not sure that the example is relevant anymore.

One issue that you end up with in the current example is that the generic `cross_validate` does not intend to track hyperparameters: it means that, as a user, I'll need to store those in the order of computation as well.
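For comparison, a minimal sketch of the `GridSearchCV` alternative (the `alpha` grid is made up): the searched hyperparameters are recorded alongside the scores in `cv_results_`, so nothing has to be tracked by hand.

```python
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = datasets.load_diabetes(return_X_y=True)

search = GridSearchCV(Lasso(), param_grid={"alpha": [0.1, 0.5, 1.0]}, cv=5)
search.fit(X, y)

# Each candidate's parameters and mean scores are stored together,
# and all candidates are evaluated on the same CV splits.
print(search.cv_results_["params"])
print(search.cv_results_["mean_test_score"])
```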
Another point is about data splitting: since no random state is set, each parameter is evaluated on really different splits. A `*SearchCV` will report results on consistent splits even when the random state is not set (if I'm not mistaken).

A more natural example would be that I have a model whose hyperparameters are already set, I get a fresh batch of data, and I want to check whether I got any drift in the statistical performance. It looks to me like a more appealing and realistic use case than the current example. Edit: for this point, it means that somehow some feature engineering in the preprocessing stage gets invalidated and that the crafted model will start to not work anymore. A rough sketch of that scenario is below.
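A rough sketch of that monitoring scenario, under assumptions of mine: a `Lasso` with a fixed `alpha`, the last rows of the diabetes dataset standing in for the "fresh batch", and an explicit `random_state` on the splitter so successive runs stay comparable.

```python
from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate

X, y = datasets.load_diabetes(return_X_y=True)
# Stand-in for a fresh batch of data arriving later.
X_new, y_new = X[-100:], y[-100:]

# Hyperparameters are already set; nothing is searched here.
model = Lasso(alpha=0.5)

# Explicit splitter so each monitoring run uses comparable splits.
cv = KFold(n_splits=5, shuffle=True, random_state=0)

cv_results = cross_validate(model, X_new, y_new, cv=cv)
# Compare against the historical score to spot drift.
print(cv_results["test_score"].mean())
```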
NB: I would put the `import` statements before the `from` imports. I think this is something `isort` would do.
NB2: `lass = Lasso()` is unused.
NB3: you can load `X` and `y` directly with `X, y = datasets.load_diabetes(return_X_y=True)`. A combined snippet is below.
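Putting the NBs together, the top of the example could look something like this; the `numpy` import is a hypothetical stand-in for whatever plain `import` the example actually has.

```python
# Plain `import` statements first, then `from` imports, as isort orders them.
import numpy as np  # hypothetical plain import standing in for the example's own

from sklearn import datasets
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

# Load X and y directly instead of going through the returned Bunch.
X, y = datasets.load_diabetes(return_X_y=True)
```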