titu1994 / pyshac

A Python library for the Sequential Halving and Classification algorithm
http://titu1994.github.io/pyshac/
MIT License

Out of range value for min_split_loss, value='0' #8

Closed (KOLANICH closed this issue 6 years ago)

KOLANICH commented 6 years ago
    best=self.invokeScoring(blackBoxIteration, pb, context)
  File "<censored>\nick\projects\uniopt\UniOpt\backends\pyshac.py", line 107, in invokeScoring
    shac.fit(pyshacScore, skip_cv_checks=self.skipCV, early_stop=self.earlyStop, relax_checks=self.relaxChecks)
  File "<censored>\Anaconda3\lib\site-packages\pyshac\core\engine.py", line 1201, in fit
    callbacks=callbacks)
  File "<censored>\Anaconda3\lib\site-packages\pyshac\core\engine.py", line 279, in fit
    model = self._train_classifier(x, y, num_splits=num_splits)
  File "<censored>\Anaconda3\lib\site-packages\pyshac\core\engine.py", line 862, in _train_classifier
    n_jobs=self.num_workers)
  File "<censored>\Anaconda3\lib\site-packages\pyshac\utils\xgb_utils.py", line 46, in train_single_model
    scores = cross_val_score(model, encoded_samples, labels, cv=kfold, n_jobs=1)
  File "<censored>\Anaconda3\lib\site-packages\sklearn\model_selection\_validation.py", line 402, in cross_val_score
    error_score=error_score)
.....
  File "<censored>\Anaconda3\lib\site-packages\xgboost\core.py", line 165, in _check_call
    raise XGBoostError(_LIB.XGBGetLastError())
xgboost.core.XGBoostError: b"Out of range value for min_split_loss, value='0'"

I have searched for the name in the repo and traced the hyperparameter setup (by patching the relevant function in xgboost); the hyperparameter is really never set explicitly. This is the parameter dict the model ends up with:

{'base_score': 0.5, 'booster': 'gbtree', 'colsample_bylevel': 1, 'colsample_bytree': 1, 'gamma': 0, 'learning_rate': 0.1, 'max_delta_step': 0, 'max_depth': 3, 'min_child_weight': 1, 'missing': None, 'n_estimators': 200, 'nthread': 10, 'objective': 'binary:logistic', 'reg_alpha': 0, 'reg_lambda': 1, 'scale_pos_weight': 1, 'seed': 0, 'silent': 1, 'subsample': 1}
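
For reference, a minimal sketch of how such a trace can be done. The exact function patched above is not stated; this assumes the patch target is the sklearn wrapper's get_xgb_params(), which XGBClassifier inherits and calls during fit():

```python
# Illustrative only: monkey-patch xgboost's sklearn wrapper so that every
# model prints the parameter dict it hands to the native booster.
import xgboost

_orig_get_xgb_params = xgboost.XGBClassifier.get_xgb_params

def traced_get_xgb_params(self):
    params = _orig_get_xgb_params(self)
    print('xgboost params:', params)  # e.g. the dict shown above, with gamma=0
    return params

xgboost.XGBClassifier.get_xgb_params = traced_get_xgb_params
```
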
titu1994 commented 6 years ago

What version of XGBoost are you using? This is a default parameter used by the XGBoost model that shac builds. You can try setting skip_cv_checks=True and see if the issue goes away, or upgrade XGBoost to the latest version.
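
For anyone hitting the same error, a hedged sketch of this workaround. Only the skip_cv_checks keyword is taken from the traceback above; the engine and hyperparameter constructors follow pyshac's documented usage and may differ between versions:

```python
import pyshac

# toy search space and objective, purely illustrative
param_x = pyshac.UniformContinuousHyperParameter('x', -5.0, 5.0)

def evaluate(worker_id, parameters):
    x = parameters['x']
    return (x - 2.0) ** 2

shac = pyshac.SHAC([param_x], total_budget=100, num_batches=10, objective='min')

# skip_cv_checks=True bypasses the cross_val_score call that raises the error
shac.fit(evaluate, skip_cv_checks=True)
```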

KOLANICH commented 6 years ago

What version of XGBoost are you using ?

The one built recently from Git.

titu1994 commented 6 years ago

Hmm. I am using XGBoost v0.8, but I don't think that should be an issue.

Could you try using skip_cv_checks? I am not encountering this issue.

Could you describe the search space?

KOLANICH commented 6 years ago

It may be a bug in the XGBoost sklearn wrapper (they have reworked parameter parsing from strings using C++ stdlib functions, and it seems the new parser doesn't like that the float value has no decimal point and is in fact an int). My code was not affected, since I don't use the sklearn wrappers. They really need type annotations, mypy checking and maybe more test coverage. I am going to verify that tomorrow.
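
A small illustration of the distinction being drawn here (illustrative only, not the UniOpt code): the sklearn wrapper always sends its own defaults, including gamma (an alias of min_split_loss), to the native library, while the low-level API only sends the parameters given explicitly.

```python
import numpy as np
import xgboost as xgb

X = np.random.rand(32, 4)
y = np.random.randint(0, 2, size=32)

# sklearn wrapper: its defaults (gamma=0 among them, as in the dict above)
# are always part of what reaches the native parameter parser
clf = xgb.XGBClassifier(n_estimators=5)
clf.fit(X, y)

# low-level API: only the parameters listed here are sent
dtrain = xgb.DMatrix(X, label=y)
booster = xgb.train({'objective': 'binary:logistic'}, dtrain, num_boost_round=5)
```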

titu1994 commented 6 years ago

Oh that is interesting. If possible, could you update your findings here as well?

KOLANICH commented 6 years ago

I have done some experiments.

  1. xgb_params["gamma"] = float(xgb_params["gamma"]) doesn't help
  2. using , as the decimal separator also doesn't help
  3. it seems that something is broken in their new string-to-float parsing logic; I don't know why my own code was not affected by it (see the repro sketch below)
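
A hedged repro sketch for the parsing suspicion above: pass gamma (an alias of min_split_loss) as the literal string '0' through the low-level API. On an affected dmlc-core build this was reported to fail with the XGBoostError quoted at the top of the issue; on a fixed build it trains normally.

```python
import numpy as np
import xgboost as xgb

rng = np.random.RandomState(0)
dtrain = xgb.DMatrix(rng.rand(32, 4), label=rng.randint(0, 2, size=32))

# 'gamma' reaches the C++ parameter parser as the string '0'
params = {'objective': 'binary:logistic', 'gamma': '0'}
booster = xgb.train(params, dtrain, num_boost_round=2)
print('trained without error')
```
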
KOLANICH commented 6 years ago

Fixed by https://github.com/dmlc/dmlc-core/pull/481

titu1994 commented 6 years ago

Oh, I'm glad the bug is fixed. Thanks for the update!