probabl-ai / skore

Skore lets you "Own Your Data Science." It provides a user-friendly interface to track and visualize your modeling results, and perform evaluation of your machine learning models with scikit-learn.
https://probabl-ai.github.io/skore/
MIT License

The cross-validate example with hyperparameter tuning looks like an anti-pattern #821

Closed glemaitre closed 2 days ago

glemaitre commented 3 days ago

I was looking at this example and the following block:

from sklearn import datasets
from sklearn.linear_model import Lasso
import skore

diabetes = datasets.load_diabetes()
X = diabetes.data[:150]
y = diabetes.target[:150]
lasso = Lasso()

for alpha in [0.5, 1, 2]:
    cv_results = skore.cross_validate(
        Lasso(alpha=alpha), X, y, cv=5, project=my_project
    )

The fact that we do a hyperparameter search here looks really weird to me. I would rather ask people to use a RandomizedSearchCV or a GridSearchCV, so I'm not sure the example is relevant anymore.
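For comparison, the same search written with GridSearchCV (a sketch of the suggested alternative; plain scikit-learn, no skore tracking) evaluates every candidate on the same splits and records the hyperparameters itself:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import GridSearchCV

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

# GridSearchCV scores every candidate on the same CV splits and keeps
# the per-parameter results in cv_results_, so nothing is tracked by hand.
search = GridSearchCV(Lasso(), param_grid={"alpha": [0.5, 1, 2]}, cv=5)
search.fit(X, y)
```

Here search.cv_results_["param_alpha"] holds the alpha tried for each row of results, which is exactly the bookkeeping the manual loop leaves to the user.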

One issue you end up with in the current example is that the generic cross_validate is not intended to track hyperparameters: as a user, I'll need to store those myself, in the order of computation, as well.

Another point is about data splitting: since no random state is set, each parameter is evaluated on potentially different splits. The SearchCV will report results on consistent splits even when the random state is not set (if I'm not mistaken).
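One way to pin the splits in the loop-based version (a sketch, using plain scikit-learn's cross_validate rather than skore's) is to pass a single splitter instance with a fixed random_state, so every alpha is scored on exactly the same folds:

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold, cross_validate

X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

# A shared splitter with a fixed random_state guarantees that each alpha
# is evaluated on the same folds, as a SearchCV would do internally.
cv = KFold(n_splits=5, shuffle=True, random_state=0)
for alpha in [0.5, 1, 2]:
    cv_results = cross_validate(Lasso(alpha=alpha), X, y, cv=cv)
```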

A more natural example would be: I have a model whose hyperparameters are already set, I get a fresh batch of data, and I want to check whether the statistical performance has drifted. That looks like a more appealing and realistic use case than the current example. Edit: for this point, it means that somehow some feature engineering in the preprocessing stage gets invalidated and the crafted model will start to underperform.
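That monitoring use case could be sketched like this (assumptions: plain scikit-learn's cross_validate stands in for skore's tracked version, and the slices of the diabetes data play the role of successive fresh batches):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

X, y = load_diabetes(return_X_y=True)

# Model with hyperparameters fixed once, e.g. via an earlier search.
model = Lasso(alpha=0.5)

# Treat each slice as a fresh batch of incoming data; re-running the same
# cross-validation per batch lets us watch for drift in the scores.
batch_means = []
for batch in (slice(0, 150), slice(150, 300), slice(300, 442)):
    cv_results = cross_validate(model, X[batch], y[batch], cv=5)
    batch_means.append(cv_results["test_score"].mean())
```

Historizing these runs over time, rather than over hyperparameter values, matches the drift-monitoring scenario described above.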

NB: I would put the import statement before the from imports; I think this is something isort would do.
NB2: lasso = Lasso() is unused.
NB3: you can load X and y directly with X, y = datasets.load_diabetes(return_X_y=True).
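Putting the three NBs together, the snippet might be cleaned up as follows (a sketch; scikit-learn's own cross_validate stands in for skore.cross_validate so the block is self-contained, and the import-ordering remark would apply to the dropped import skore line):

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.model_selection import cross_validate

# NB3: load X and y directly; NB2: no unused `lasso = Lasso()` line.
X, y = load_diabetes(return_X_y=True)
X, y = X[:150], y[:150]

for alpha in [0.5, 1, 2]:
    cv_results = cross_validate(Lasso(alpha=alpha), X, y, cv=5)
```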

sylvaincom commented 3 days ago

Hi @glemaitre, thanks for your message, agreed

Yeah, I had noted that cross-validation historization often amounts to a grid search, but I could not find a more compelling example, since it would not make sense to historize several cross-validations from different estimators... So I thought the cross-validation historization feature is useful when users write several cross-validations in a draft notebook while iterating, and we end up presenting them in a nicer display that amounts to a grid search (though good practice would have been for the users to do a grid search from the start).

cc @MarieS-WiMLDS