OptunaSearchCV does not allow multiple fit calls if using a predefined study

fraimondo commented 4 months ago

Expected behavior

When CV is used to evaluate a model's performance, it requires fitting the same model several times with different training datasets. Like GridSearchCV, OptunaSearchCV should find the best set of hyperparameters on each fit call, independently from previous fit calls. In a nutshell, in scikit-learn, calling fit should overwrite what has been learned in the previous fit.

If we define a study and use it in the OptunaSearchCV object, each call to fit will still consider previously tested hyperparameters.

Running this code:

# %%
import optuna
from seaborn import load_dataset
from sklearn.svm import SVC
from optuna.distributions import FloatDistribution
from optuna_integration.sklearn import OptunaSearchCV

df = load_dataset("iris")
X = df.columns[:-1].tolist()
y = "species"

param_grid = {
    "C": FloatDistribution(1e-5, 1e5, log=True),
    "gamma": FloatDistribution(1e-5, 1e5, log=True),
}

study = optuna.create_study(
    direction="maximize",
    study_name="optuna-concept",
    load_if_exists=True,
)

model = OptunaSearchCV(SVC(), param_grid, study=study)

model.fit(df[X], df[y])

model.fit(df[X], df[y])

I get this output:

[I 2024-05-13 16:02:14,831] A new study created in memory with name: optuna-concept
<ipython-input-28-d1592faa5809>:23: ExperimentalWarning: OptunaSearchCV is experimental (supported from v0.17.0). The interface can change in the future.
  model = OptunaSearchCV(SVC(), param_grid, study=study)
[I 2024-05-13 16:02:14,848] Trial 0 finished with value: 0.9800000000000001 and parameters: {'C': 10351.297111965368, 'gamma': 1.39218971565227e-05}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,859] Trial 1 finished with value: 0.9800000000000001 and parameters: {'C': 826.9937488516736, 'gamma': 0.00026811385313724336}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,872] Trial 2 finished with value: 0.9533333333333334 and parameters: {'C': 0.0835974875787474, 'gamma': 0.3376313762794765}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,885] Trial 3 finished with value: 0.5666666666666667 and parameters: {'C': 0.3719214417438912, 'gamma': 57.725990799744494}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,898] Trial 4 finished with value: 0.9133333333333334 and parameters: {'C': 0.40370305464015327, 'gamma': 4.602726106992738e-05}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,911] Trial 5 finished with value: 0.9133333333333334 and parameters: {'C': 28.759442955478697, 'gamma': 9.363212720161866e-05}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,924] Trial 6 finished with value: 0.9133333333333334 and parameters: {'C': 0.3781237927668684, 'gamma': 0.00021739880763679305}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,937] Trial 7 finished with value: 0.9133333333333334 and parameters: {'C': 1.112773446553936e-05, 'gamma': 0.0005445766246943653}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,951] Trial 8 finished with value: 0.9133333333333334 and parameters: {'C': 0.0012455406065103847, 'gamma': 0.008568147302042823}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,964] Trial 9 finished with value: 0.4 and parameters: {'C': 0.8468932999463333, 'gamma': 290.3961809928125}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,983] Trial 10 finished with value: 0.3866666666666667 and parameters: {'C': 5298.946366765226, 'gamma': 60063.90222710521}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:14,999] Trial 11 finished with value: 0.9800000000000001 and parameters: {'C': 93266.46580171691, 'gamma': 1.0715460717911853e-05}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,014] Trial 12 finished with value: 0.9733333333333334 and parameters: {'C': 312.5328055829849, 'gamma': 0.017094462245679686}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,028] Trial 13 finished with value: 0.9800000000000001 and parameters: {'C': 120.30973736078418, 'gamma': 0.007537251254100954}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,044] Trial 14 finished with value: 0.9400000000000001 and parameters: {'C': 89802.95921412794, 'gamma': 0.338960773312636}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,059] Trial 15 finished with value: 0.9666666666666666 and parameters: {'C': 2864.608345614222, 'gamma': 0.002451222090379233}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,077] Trial 16 finished with value: 0.9133333333333334 and parameters: {'C': 13.797511376022928, 'gamma': 1.1807948047499311e-05}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,095] Trial 17 finished with value: 0.8066666666666666 and parameters: {'C': 3197.336269580921, 'gamma': 31.000214484709442}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,110] Trial 18 finished with value: 0.96 and parameters: {'C': 593.7485312233997, 'gamma': 0.11847410370189265}. Best is trial 0 with value: 0.9800000000000001.
[I 2024-05-13 16:02:15,126] Trial 19 finished with value: 0.9400000000000001 and parameters: {'C': 14.737156301600974, 'gamma': 0.0009099409228744786}. Best is trial 0 with value: 0.9800000000000001.

We can see that after the first 10 trials, when the fit method is called again, the trial numbering continues from 10 and trial 0 from the first fit is still reported as the best.

However, this is not the case when the study parameter of OptunaSearchCV is left as None:

[I 2024-05-13 16:05:09,848] A new study created in memory with name: optuna-concept
<ipython-input-35-e43fc9e488b2>:23: ExperimentalWarning: OptunaSearchCV is experimental (supported from v0.17.0). The interface can change in the future.
  model = OptunaSearchCV(SVC(), param_grid)
[I 2024-05-13 16:05:09,849] A new study created in memory with name: no-name-b1d68c73-b5a2-4c4f-9228-df711f7ace1a
[I 2024-05-13 16:05:09,864] Trial 0 finished with value: 0.9133333333333334 and parameters: {'C': 0.4439486355840779, 'gamma': 7.00409627441481e-05}. Best is trial 0 with value: 0.9133333333333334.
[I 2024-05-13 16:05:09,877] Trial 1 finished with value: 0.9133333333333334 and parameters: {'C': 0.5303455913113039, 'gamma': 0.0001703896869247438}. Best is trial 0 with value: 0.9133333333333334.
[I 2024-05-13 16:05:09,892] Trial 2 finished with value: 0.4 and parameters: {'C': 2.0132370607434393e-05, 'gamma': 404.0509717323509}. Best is trial 0 with value: 0.9133333333333334.
[I 2024-05-13 16:05:09,907] Trial 3 finished with value: 0.9400000000000001 and parameters: {'C': 0.28597605134121973, 'gamma': 6.019038233001054}. Best is trial 3 with value: 0.9400000000000001.
[I 2024-05-13 16:05:09,920] Trial 4 finished with value: 0.9666666666666666 and parameters: {'C': 55168.73363242244, 'gamma': 0.0014765442445975774}. Best is trial 4 with value: 0.9666666666666666.
[I 2024-05-13 16:05:09,931] Trial 5 finished with value: 0.9666666666666666 and parameters: {'C': 10812.193340743012, 'gamma': 0.0031446980766459175}. Best is trial 4 with value: 0.9666666666666666.
[I 2024-05-13 16:05:09,943] Trial 6 finished with value: 0.9533333333333334 and parameters: {'C': 537.8379154883958, 'gamma': 6.199248965067256}. Best is trial 4 with value: 0.9666666666666666.
[I 2024-05-13 16:05:09,955] Trial 7 finished with value: 0.96 and parameters: {'C': 438.47066778932987, 'gamma': 0.06183660430369448}. Best is trial 4 with value: 0.9666666666666666.
[I 2024-05-13 16:05:09,969] Trial 8 finished with value: 0.7533333333333334 and parameters: {'C': 0.0012043389646025568, 'gamma': 11.766592943333281}. Best is trial 4 with value: 0.9666666666666666.
[I 2024-05-13 16:05:09,983] Trial 9 finished with value: 0.7533333333333334 and parameters: {'C': 0.005564309450177441, 'gamma': 11.976763945503981}. Best is trial 4 with value: 0.9666666666666666.
[I 2024-05-13 16:05:09,985] A new study created in memory with name: no-name-94db3ef0-0c84-44df-9f8a-08f21dedb2a6
[I 2024-05-13 16:05:09,999] Trial 0 finished with value: 0.3866666666666667 and parameters: {'C': 1008.3860333082104, 'gamma': 46755.59214534676}. Best is trial 0 with value: 0.3866666666666667.
[I 2024-05-13 16:05:10,012] Trial 1 finished with value: 0.62 and parameters: {'C': 5.787859846450096e-05, 'gamma': 12901.906537289571}. Best is trial 1 with value: 0.62.
[I 2024-05-13 16:05:10,025] Trial 2 finished with value: 0.9133333333333334 and parameters: {'C': 7.735518147192777e-05, 'gamma': 0.01786697588872832}. Best is trial 2 with value: 0.9133333333333334.
[I 2024-05-13 16:05:10,035] Trial 3 finished with value: 0.96 and parameters: {'C': 16834.94728694924, 'gamma': 0.0015249518912179214}. Best is trial 3 with value: 0.96.
[I 2024-05-13 16:05:10,047] Trial 4 finished with value: 0.5599999999999999 and parameters: {'C': 0.6581758394630691, 'gamma': 19858.830825536214}. Best is trial 3 with value: 0.96.
[I 2024-05-13 16:05:10,060] Trial 5 finished with value: 0.54 and parameters: {'C': 0.37449196875323615, 'gamma': 35339.25851562329}. Best is trial 3 with value: 0.96.
[I 2024-05-13 16:05:10,075] Trial 6 finished with value: 0.7266666666666667 and parameters: {'C': 1280.6078858904702, 'gamma': 56.95113251624784}. Best is trial 3 with value: 0.96.
[I 2024-05-13 16:05:10,087] Trial 7 finished with value: 0.9333333333333333 and parameters: {'C': 0.00021519041264088718, 'gamma': 0.23772404640237857}. Best is trial 3 with value: 0.96.
[I 2024-05-13 16:05:10,101] Trial 8 finished with value: 0.3866666666666667 and parameters: {'C': 261.883070644005, 'gamma': 63693.16623748126}. Best is trial 3 with value: 0.96.
[I 2024-05-13 16:05:10,114] Trial 9 finished with value: 0.4 and parameters: {'C': 0.06239537430988799, 'gamma': 1013.7180513752583}. Best is trial 3 with value: 0.96.

Environment

Optuna version: 3.6.0
Optuna Integration version: 3.6.0
Python version: 3.11.6
OS: macOS-14.4.1-arm64-arm-64bit
Scikit-learn: 1.4.1.post1

Error messages, stack traces, or logs

Pasted in the description

Steps to reproduce

1. Install optuna, scikit-learn and seaborn

2. Run:

```python
import optuna
from seaborn import load_dataset
from sklearn.svm import SVC
from optuna.distributions import FloatDistribution
from optuna_integration.sklearn import OptunaSearchCV

df = load_dataset("iris")
X = df.columns[:-1].tolist()
y = "species"

param_grid = {
    "C": FloatDistribution(1e-5, 1e5, log=True),
    "gamma": FloatDistribution(1e-5, 1e5, log=True),
}

study = optuna.create_study(
    direction="maximize",
    study_name="optuna-concept",
    load_if_exists=True,
)

model = OptunaSearchCV(SVC(), param_grid, study=study)

model.fit(df[X], df[y])
```

3. Then run:

```python
import optuna
from seaborn import load_dataset
from sklearn.svm import SVC
from optuna.distributions import FloatDistribution
from optuna_integration.sklearn import OptunaSearchCV

df = load_dataset("iris")
X = df.columns[:-1].tolist()
y = "species"

param_grid = {
    "C": FloatDistribution(1e-5, 1e5, log=True),
    "gamma": FloatDistribution(1e-5, 1e5, log=True),
}

study = optuna.create_study(
    direction="maximize",
    study_name="optuna-concept",
    load_if_exists=True,
)

model = OptunaSearchCV(SVC(), param_grid)

model.fit(df[X], df[y])

model.fit(df[X], df[y])
```

Additional context (optional)

No response

nzw0301 commented 4 months ago

Hmm, so you think this class should behave the same as sklearn's GridSearchCV. As you described, we have a simple workaround to do so. I'm not sure this class should have exactly the same behaviour as sklearn's, because it depends on the study, and the current behaviour is consistent with Optuna's optimize method.

fraimondo commented 4 months ago

I understand that Optuna's learning process allows for incremental data input. However, this completely changes the semantics of scikit-learn's fit method, to the point that it is not suitable (and even wrong) in the context of scikit-learn's model evaluation procedures.

As an example, think of a call to scikit-learn's cross_validate function where the cv parameter is a 2-fold CV scheme and the estimator is an OptunaSearchCV object. Ideally, we should obtain two performance estimates, each from a model trained on 50% of the data and tested on the other 50%. With the current implementation of OptunaSearchCV, by the second call to fit the study has learnt from 100% of the data, including the test sample. This is a test-to-train data leakage.
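To make the leakage argument concrete, here is a toy model in plain Python. It does not use Optuna or scikit-learn: `ToyStudy`, `toy_fit` and the scoring rule are hypothetical stand-ins, assumed only for illustration. It shows that when two fit calls share one trial history, the second fit's "best trial" is chosen from trials that were scored on rows belonging to its own test fold.

```python
import random


class ToyStudy:
    """Hypothetical stand-in for a study: an append-only trial history."""

    def __init__(self):
        self.trials = []  # each entry: (param, score, rows used for scoring)

    def rows_seen(self):
        """Return every data row that influenced any trial stored in this study."""
        seen = set()
        for _, _, rows in self.trials:
            seen |= rows
        return seen


def toy_fit(study, train_rows, n_trials=10, seed=0):
    """Append n_trials random trials scored on train_rows; return the best param."""
    rng = random.Random(seed)
    target = sum(train_rows) / len(train_rows)
    for _ in range(n_trials):
        param = rng.uniform(0, 10)
        score = -abs(param - target)  # toy objective that depends only on train_rows
        study.trials.append((param, score, frozenset(train_rows)))
    # As in the reported behaviour, the best trial is picked from the whole history.
    return max(study.trials, key=lambda t: t[1])[0]


fold_a = list(range(0, 5))   # fold 1 trains on these rows and tests on fold_b
fold_b = list(range(5, 10))  # fold 2 trains on these rows and tests on fold_a

# Shared study: the second fit inherits trials scored on fold_a, its own test set.
shared = ToyStudy()
toy_fit(shared, fold_a, seed=1)
toy_fit(shared, fold_b, seed=2)
leaked_rows = shared.rows_seen() & set(fold_a)

# Fresh study per fit: the second fit only ever sees its own training rows.
fresh = ToyStudy()
toy_fit(fresh, fold_b, seed=2)
clean_rows = fresh.rows_seen() & set(fold_a)

print(sorted(leaked_rows))  # [0, 1, 2, 3, 4]
print(sorted(clean_rows))   # []
```

With the shared history, all five test rows of the second fold have already influenced the trials it selects from; with a fresh history, none have.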

The solution is quite simple. On every call to fit, a new study should be created, copying the sampler/pruner/config of the study passed to the constructor. As it stands right now, the only proper way to use this class in the context of scikit-learn is to leave the study parameter at None, which does not allow specifying the sampler/pruner/n_trials/etc.

nzw0301 commented 4 months ago

Thank you for the clarification.

Alternatively, passing a new study to OptunaSearchCV would be a solution too, since we can then specify the sampler, pruner, etc., even though it is not compatible with sklearn's semantics.

nzw0301 commented 4 months ago

My concern about your suggestion is the storage. I suppose the approach works only with the default in-memory storage, because a study instance carries its storage information. So another rule or argument is necessary to create a new study when calling the fit method.

fraimondo commented 4 months ago

> Thank you for the clarification.
>
> Alternatively, passing a new study to OptunaSearchCV would be a solution too, since we can then specify the sampler, pruner, etc., even though it is not compatible with sklearn's semantics.

This has exactly the issue I described before. Passing a study to the OptunaSearchCV object makes it behave incorrectly when integrated with scikit-learn, and that integration is, I think, exactly the point of having an OptunaSearchCV class.

Everything can be solved easily, including the storage issue you mentioned before. Basically, instead of using the study specified in the constructor of OptunaSearchCV, use the same parameters but change the study_name by adding a suffix that identifies the current fit call. This can be done by changing the current code:

https://github.com/optuna/optuna-integration/blob/15e6b0ec6d9a0d7f572ad387be8478c56257bef7/optuna_integration/sklearn/sklearn.py#L886-L893

To this:

        else:
            prefix_name = self.study.study_name
            i_fit = 0
            for t_study in self.study._storage.get_all_studies():
                if re.fullmatch(f"{prefix_name}_fit[0-9]+", t_study.study_name) is not None:
                    i_fit += 1

            self.study_ = study_module.create_study(
                direction="maximize",
                sampler=self.study.sampler,
                pruner=self.study.pruner,
                study_name=f"{prefix_name}_fit{i_fit}",
                storage=self.study._storage,
                load_if_exists=False,
            )

This creates one entry in the storage each time the fit method is called. It also makes it possible to inspect each fit in the Optuna dashboard and check whether the CVs are reaching a plateau (and thus optimizing well) or whether the study needs to be parametrised better.
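The naming scheme in the snippet above can be sketched in isolation with only the standard library. The helper name `next_fit_study_name` and the example study names are hypothetical. The one deliberate difference from the snippet is the `re.escape` call, added here on the assumption that a study name might contain regex metacharacters.

```python
import re


def next_fit_study_name(prefix_name, existing_names):
    """Return the study name to use for the next fit call.

    Counts existing names of the form "<prefix>_fit<N>" and uses that
    count as the index of the new study, mirroring the proposed change.
    """
    # re.escape guards against prefixes containing characters such as "." or "+".
    pattern = re.compile(rf"{re.escape(prefix_name)}_fit[0-9]+")
    i_fit = sum(1 for name in existing_names if pattern.fullmatch(name))
    return f"{prefix_name}_fit{i_fit}"


names = ["optuna-concept", "optuna-concept_fit0", "optuna-concept_fit1", "other_fit0"]
print(next_fit_study_name("optuna-concept", names))  # optuna-concept_fit2
```

Because the index is a count of matching names, deleting an earlier per-fit study from the storage could make the next name collide with an existing one; taking the maximum existing index plus one would be a possible alternative.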
