Scalable, Portable and Distributed Gradient Boosting (GBDT, GBRT or GBM) Library, for Python, R, Java, Scala, C++ and more. Runs on single machine, Hadoop, Spark, Flink and DataFlow
H2OXGBoostEstimator with cv cannot report validation score correctly #38
Since I upgraded my h2o Python package to 3.18.0.8, H2OXGBoostEstimator no longer reports the score on the validation set correctly. The details are as follows:
Steps to reproduce
# python version: 3.6.4 |Anaconda custom (x86_64)| (default, Dec 21 2017, 15:39:08)
# [GCC 4.2.1 Compatible Clang 4.0.1 (tags/RELEASE_401/final)]
import h2o
h2o.init()
# load iris data
data = h2o.load_dataset('iris')
# train the xgboost model with 10-fold cross-validation
from h2o.estimators import H2OXGBoostEstimator
model = H2OXGBoostEstimator(nfolds=10, keep_cross_validation_predictions=True)
model.train(y = 'Species', training_frame = data)
# get model for each fold, and print the score
cv_models = model.cross_validation_models()
for i in range(10):
    model_cv = cv_models[i]
    print(model_cv.r2(train=True, valid=True, xval=True))
In the output of the code above, the scores on the training set and the validation set are exactly the same. This does not happen with other estimators. If I am doing something wrong, please correct me. Thank you!
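To make the comparison explicit rather than reading it off the printed values, a small helper along these lines could flag the folds whose train and validation scores coincide. This is only a sketch: `folds_with_identical_scores` is a hypothetical name, not part of h2o, and it assumes each per-fold result is a dict with `"train"` and `"valid"` keys, which is the shape I expect from `r2(train=True, valid=True, xval=True)`. The numbers in `example` are made up for illustration and are not output from H2O.

```python
import math


def folds_with_identical_scores(fold_scores, tol=1e-12):
    """Return the indices of folds whose 'train' and 'valid' scores coincide.

    fold_scores is a list of dicts such as {"train": 0.98, "valid": 0.98}.
    That shape is an assumption about what r2(train=True, valid=True,
    xval=True) returns for each cross-validation model.
    """
    suspicious = []
    for i, scores in enumerate(fold_scores):
        train = scores.get("train")
        valid = scores.get("valid")
        if train is None or valid is None:
            continue  # metric missing for this fold, nothing to compare
        if math.isclose(train, valid, rel_tol=0.0, abs_tol=tol):
            suspicious.append(i)
    return suspicious


# Made-up scores for illustration only (not real H2O output).
example = [
    {"train": 0.97, "valid": 0.97},  # identical: flagged
    {"train": 0.98, "valid": 0.91},  # different: fine
    {"train": 0.96, "valid": None},  # missing validation score: skipped
]
print(folds_with_identical_scores(example))  # prints [0]
```

With the reproduction script, the input list could be built as `[m.r2(train=True, valid=True, xval=True) for m in cv_models]`, assuming the dict shape described above. If every fold index comes back, the validation score is most likely just a copy of the training score.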