h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

Calibration does not work as expected when using mojo. #7705

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

The example described at https://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/calibration_frame.html works as expected when predicting with the in-memory model. However, if you save the model as a MOJO, import it, and predict with the imported model, the calibrated probability columns (cal_p0, cal_p1) are missing from the output.

{code:python}
import h2o
from h2o.estimators.gbm import H2OGradientBoostingEstimator
h2o.init()

# Import the ecology dataset
ecology = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/ecology_model.csv")

# Convert response column to a factor
ecology['Angaus'] = ecology['Angaus'].asfactor()

# Set the predictors and the response column name
response = 'Angaus'
predictors = ecology.columns[3:13]

# Split into train and calibration sets
train, calib = ecology.split_frame(seed = 12354)

# Introduce a weight column (artificial non-constant) ONLY to the train set (NOT the calibration one)
w = h2o.create_frame(binary_fraction=1, binary_ones_fraction=0.5, missing_fraction=0, rows=744, cols=1)
w.set_names(["weight"])
train = train.cbind(w)

# Train an H2O GBM Model with Calibration
ecology_gbm = H2OGradientBoostingEstimator(ntrees = 10, max_depth = 5, min_rows = 10, learn_rate = 0.1, distribution = "multinomial", calibrate_model = True, calibration_frame = calib)
ecology_gbm.train(x = predictors, y = "Angaus", training_frame = train, weights_column = "weight")

predicted = ecology_gbm.predict(train)

# View the calibrated predictions appended to the original predictions

predicted
#   predict  p0        p1         cal_p0     cal_p1
# ---------  --------  ---------  ---------  ---------
#         1  0.319428  0.680572   0.185613   0.814387
#         0  0         0          0.0274573  0.972543
#         0  0.90577   0.0942296  0.913323   0.0866773
#         0  0.783394  0.216606   0.825601   0.174399
#         0  0.899183  0.100817   0.909852   0.0901482
#         0  0         0          0.0274573  0.972543
#         0  0.909846  0.090154   0.915409   0.0845909
#         1  0.456384  0.543616   0.358169   0.641831
#         0  0         0          0.0274573  0.972543
#         0  0.918923  0.0810765  0.919893   0.0801069
#
# [744 rows x 5 columns]

# If we now save the model as a MOJO and repeat the same operation:
my_save_mojo = ecology_gbm.save_mojo("", force=True)
mojo_model = h2o.import_mojo(my_save_mojo)
mojo_predicted = mojo_model.predict(train)

# The calibrated predictions are not appended to the original predictions

mojo_predicted
#   predict  p0        p1
# ---------  --------  ---------
#         1  0.319428  0.680572
#         0  0         0
#         0  0.90577   0.0942296
#         0  0.783394  0.216606
#         0  0.899183  0.100817
#         0  0         0
#         0  0.909846  0.090154
#         1  0.456384  0.543616
#         0  0         0
#         0  0.918923  0.0810765
#
# [744 rows x 3 columns]
{code}
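
To make the difference easy to check programmatically, here is a minimal sketch of a column check. `missing_calibrated_columns` is a hypothetical helper, not part of the h2o API, and it assumes the class-probability columns are named `p0`, `p1`, ... with calibrated counterparts `cal_p0`, `cal_p1`, ..., as in the outputs of this report. The column lists below are copied from those two outputs, so the snippet runs without an H2O cluster.

```python
# Hypothetical helper (not part of the h2o API): given the column names of a
# prediction frame, list the calibrated-probability columns that are missing.
def missing_calibrated_columns(columns):
    """Return the expected cal_* columns that are absent from `columns`."""
    # Assumption: class-probability columns are named p0, p1, ... and each
    # should have a calibrated twin named cal_p0, cal_p1, ...
    prob_cols = [c for c in columns if c.startswith("p") and c[1:].isdigit()]
    return [f"cal_{c}" for c in prob_cols if f"cal_{c}" not in columns]


# Column names copied from the two prediction outputs in this report.
in_memory_cols = ["predict", "p0", "p1", "cal_p0", "cal_p1"]
mojo_cols = ["predict", "p0", "p1"]

print(missing_calibrated_columns(in_memory_cols))  # []
print(missing_calibrated_columns(mojo_cols))       # ['cal_p0', 'cal_p1']
```

With live frames, the same check can be run by passing the `columns` list of each prediction frame, for example the in-memory prediction frame and the frame returned by the imported MOJO model.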

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-7940
Assignee: New H2O Bugs
Reporter: Carlos Munoz
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A

wangjy38 commented 1 year ago

I ran into the same issue. Are there any updates?