optuna / optuna-integration

Extended functionalities for Optuna in combination with third-party libraries.
https://optuna-integration.readthedocs.io/en/latest/index.html
MIT License

Add option to specify "catch" in OptunaSearchCV #162

Closed: muhlbach closed this issue 2 days ago

muhlbach commented 1 month ago

Expected behavior

I'm hitting this error: FloatingPointError: underflow encountered in _ndtri_exp_single (vectorized) when fitting an OptunaSearchCV instance with an XGBRegressor whose regularization parameters are sampled very close to zero (I think that's the issue). I would like to pass a catch argument (the exception types to tolerate) to the instance, but the way the scikit-learn interface is currently designed makes this impossible:

        if self.study is None:
            seed = random_state.randint(0, np.iinfo("int32").max)
            sampler = samplers.TPESampler(seed=seed)

            self.study_ = study_module.create_study(direction="maximize", sampler=sampler)

        else:
            self.study_ = self.study

        objective = _Objective(
            self.estimator,
            self.param_distributions,
            X_res,
            y_res,
            cv,
            self.enable_pruning,
            self.error_score,
            fit_params_res,
            groups_res,
            self.max_iter,
            self.return_train_score,
            self.scorer_,
        )

        _logger.info(
            "Searching the best hyperparameters using {} "
            "samples...".format(_num_samples(self.sample_indices_))
        )

        self.study_.optimize(
            objective,
            n_jobs=self.n_jobs,
            n_trials=self.n_trials,
            timeout=self.timeout,
            callbacks=self.callbacks,
            # <-- could add "catch=self.catch" here
        )
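For context, Optuna's Study.optimize accepts a catch argument: when a trial raises an exception of one of the listed types, the study records that trial as failed and keeps going instead of stopping. The tracebacks later in this thread show the error is raised inside the objective call (func(trial) in _run_trial), which is where that mechanism applies. The loop below is a stdlib-only toy model of those semantics, not Optuna's implementation; run_trials and flaky_objective are hypothetical names used only for illustration.

```python
from typing import Callable, Iterable, List, Tuple, Type, Union


# Toy model of the "catch" semantics of Study.optimize; NOT Optuna's code.
def run_trials(
    objective: Callable[[int], float],
    n_trials: int,
    catch: Iterable[Type[Exception]] = (),
) -> List[Tuple[str, Union[float, str]]]:
    """Run `objective` once per trial number and record each outcome.

    Exceptions whose type is listed in `catch` mark the trial as "FAIL" and
    the loop continues; any other exception propagates and stops the run.
    """
    catch_types = tuple(catch)
    results: List[Tuple[str, Union[float, str]]] = []
    for number in range(n_trials):
        try:
            value = objective(number)
        except catch_types as err:  # an empty tuple matches no exception
            results.append(("FAIL", repr(err)))
            continue
        results.append(("COMPLETE", value))
    return results


def flaky_objective(number: int) -> float:
    # Hypothetical objective: every third trial hits a numerical error,
    # standing in for the FloatingPointError reported in this issue.
    if number % 3 == 2:
        raise FloatingPointError("underflow encountered in exp")
    return float(number)


print(run_trials(flaky_objective, n_trials=4, catch=(FloatingPointError,)))
```

Passing catch=(FloatingPointError,) lets all four trials run, with trial 2 recorded as FAIL; with the default empty tuple the same error propagates out of run_trials and ends the run.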

Environment

Error messages, stack traces, or logs

11:29:22 [W 2024-09-06 11:29:16,762] Trial 10 failed with parameters: {} because of the following error: FloatingPointError('underflow encountered in _ndtri_exp_single (vectorized)').

Steps to reproduce

I cannot share a reproduction because the data is confidential, but the gist of it is this:

import optuna
from optuna.distributions import FloatDistribution
from sklearn.datasets import make_regression
from xgboost import XGBRegressor

# Log-uniform search spaces whose lower bounds are very close to zero
params = dict(reg_alpha=FloatDistribution(low=1e-10, high=1, log=True),
              reg_lambda=FloatDistribution(low=1e-10, high=1, log=True))
model = optuna.integration.OptunaSearchCV(estimator=XGBRegressor(), param_distributions=params)
# Very noisy synthetic data, standing in for the confidential dataset
X, y = make_regression(n_samples=100, n_features=10, noise=10000000)
model.fit(X, y)

The code above fails when I run it on my own (confidential) data instead of this synthetic dataset.

Additional context (optional)

No response

muhlbach commented 1 month ago

I have discovered that it is not the estimator that fails when calling .fit(); it is the sampler. I see two different patterns:

Pattern 1:

Traceback (most recent call last):
  File "E:\conda\envs\quant\Lib\site-packages\optuna\study\_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna_integration\sklearn\sklearn.py", line 214, in __call__
    params = self._get_params(trial)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna_integration\sklearn\sklearn.py", line 325, in _get_params
    name: trial._suggest(name, distribution)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\trial\_trial.py", line 629, in _suggest
    param_value = self.study.sampler.sample_independent(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\sampler.py", line 447, in sample_independent
    return self._sample(study, trial, {param_name: param_distribution})[param_name]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\sampler.py", line 487, in _sample
    samples_below = mpe_below.sample(self._rng.rng, self._n_ei_candidates)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\parzen_estimator.py", line 81, in sample
    sampled = self._mixture_distribution.sample(rng, size)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\probability_distributions.py", line 65, in sample
    samples = _truncnorm.rvs(
              ^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\_truncnorm.py", line 215, in rvs
    return ppf(percentiles, a, b) * scale + loc
           ^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\_truncnorm.py", line 194, in ppf
    out[case_left] = ppf_left(q_left, a[case_left], b[case_left])
                     ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\_truncnorm.py", line 182, in ppf_left
    return _ndtri_exp(log_Phi_x)
           ^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\_truncnorm.py", line 170, in _ndtri_exp
    return np.frompyfunc(_ndtri_exp_single, 1, 1)(y).astype(float)

Pattern 2:

Traceback (most recent call last):
  File "E:\conda\envs\quant\Lib\site-packages\optuna\study\_optimize.py", line 196, in _run_trial
    value_or_values = func(trial)
                      ^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna_integration\sklearn\sklearn.py", line 214, in __call__
    params = self._get_params(trial)
             ^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna_integration\sklearn\sklearn.py", line 325, in _get_params
    name: trial._suggest(name, distribution)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\trial\_trial.py", line 629, in _suggest
    param_value = self.study.sampler.sample_independent(
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\sampler.py", line 447, in sample_independent
    return self._sample(study, trial, {param_name: param_distribution})[param_name]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\sampler.py", line 488, in _sample
    acq_func_vals = self._compute_acquisition_func(samples_below, mpe_below, mpe_above)
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\sampler.py", line 528, in _compute_acquisition_func
    log_likelihoods_above = mpe_above.log_pdf(samples)
                            ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\parzen_estimator.py", line 86, in log_pdf
    return self._mixture_distribution.log_pdf(transformed_samples)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "E:\conda\envs\quant\Lib\site-packages\optuna\samplers\_tpe\probability_distributions.py", line 121, in log_pdf
    return np.log(np.exp(weighted_log_pdf - max_[:, None]).sum(axis=1)) + max_
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
FloatingPointError: underflow encountered in exp
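Both patterns end in NumPy floating-point errors. By default NumPy ignores underflow (np.geterr()["under"] is "ignore"), so an underflow only surfaces as FloatingPointError when the error mode has been switched to "raise", for example globally via np.seterr(all="raise") or locally via np.errstate. Whether the reporter's environment enables this is an assumption worth checking, not something stated in the thread. The sketch below shows the same exp underflow behaving differently under the two modes; exp_with_underflow_mode is a hypothetical helper.

```python
import numpy as np


def exp_with_underflow_mode(x: float, mode: str) -> float:
    """Evaluate exp(x) with NumPy's underflow handling temporarily set to `mode`."""
    with np.errstate(under=mode):
        return float(np.exp(np.float64(x)))


# Current global setting; NumPy's default for underflow is "ignore".
print(np.geterr()["under"])

# exp(-1000) is far below the smallest float64, so it silently underflows to 0.0.
print(exp_with_underflow_mode(-1000.0, "ignore"))

# With underflow set to "raise", the same computation raises FloatingPointError.
try:
    exp_with_underflow_mode(-1000.0, "raise")
except FloatingPointError as err:
    print(f"FloatingPointError: {err}")
```

If a stricter mode is needed for your own numerics, scoping it with np.errstate rather than a global np.seterr call keeps library internals, such as the TPE sampler's density computations, on NumPy's default handling.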

nzw0301 commented 2 days ago

The issue is resolved by #163.