h2oai / h2o-3

H2O is an Open Source, Distributed, Fast & Scalable Machine Learning Platform: Deep Learning, Gradient Boosting (GBM) & XGBoost, Random Forest, Generalized Linear Modeling (GLM with Elastic Net), K-Means, PCA, Generalized Additive Models (GAM), RuleFit, Support Vector Machine (SVM), Stacked Ensembles, Automatic Machine Learning (AutoML), etc.
http://h2o.ai
Apache License 2.0

.Accuracy() method not implemented for multinomial #10122

Open exalate-issue-sync[bot] opened 1 year ago

exalate-issue-sync[bot] commented 1 year ago

Need to add `.accuracy()` for multinomial models.

The following snippets don't work:
{code}
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator

h2o.init()
iris_df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris.csv")
predictors = iris_df.columns[0:4]
response_col = "C5"
train, valid, test = iris_df.split_frame([.7, .15], seed=1234)

glm_model = H2OGeneralizedLinearEstimator(family="multinomial")
glm_model.train(x=predictors, y=response_col, training_frame=train, validation_frame=valid)
glm_model.accuracy()  # does not work for a multinomial model

# Another example, with random forest:
from h2o.estimators.random_forest import H2ORandomForestEstimator

cars = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv"
cars = h2o.import_file(cars)
predictors = ["displacement", "power", "weight", "acceleration", "year"]
response_col = "cylinders"
cars[response_col] = cars[response_col].asfactor()  # multiclass response
train, valid, test = cars.split_frame([.7, .15], seed=1234)

rf_model = H2ORandomForestEstimator()
rf_model.train(x=predictors, y=response_col, training_frame=train, validation_frame=valid)
rf_model.accuracy()  # does not work for a multinomial model
{code}

exalate-issue-sync[bot] commented 1 year ago

Navdeep commented: I think that's fine. Multi-class error is what you're after for a multinomial problem.
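For a multinomial model, accuracy and multi-class error carry the same information: accuracy is one minus the fraction of misclassified rows. The minimal sketch below computes both from a confusion matrix given as a plain list of lists. The layout (rows = actual class, columns = predicted class), the function name `accuracy_from_confusion`, and the counts are assumptions for illustration, not H2O's API or data.

```python
from typing import List, Tuple


def accuracy_from_confusion(cm: List[List[int]]) -> Tuple[float, float]:
    """Return (accuracy, multiclass_error) for a square confusion matrix.

    Assumed layout: rows are actual classes, columns are predicted classes.
    """
    total = sum(sum(row) for row in cm)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    # Correct predictions lie on the diagonal.
    correct = sum(cm[i][i] for i in range(len(cm)))
    accuracy = correct / total
    # Multi-class error is the complement: the share of off-diagonal counts.
    return accuracy, 1.0 - accuracy


# Hypothetical 3-class counts (e.g. the three iris species).
cm = [
    [10, 0, 0],
    [0, 8, 2],
    [0, 1, 9],
]
acc, err = accuracy_from_confusion(cm)
print(f"accuracy={acc:.3f} error={err:.3f}")
```

Because the two numbers always sum to one, exposing either one is enough to recover the other; the parity argument in the next comment is about API convenience rather than missing information.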

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: The R API has an accuracy method, so for parity, we should add one to Python.

exalate-issue-sync[bot] commented 1 year ago

Erin LeDell commented: The way that h2o.accuracy works in R is that it returns a bunch of accuracies, corresponding to various thresholds:

{code}
h2o.accuracy(perf)
  threshold  accuracy
1 0.9824807 0.6000000
2 0.9676546 0.6026316
3 0.9671792 0.6052632
4 0.9649031 0.6078947
5 0.9621187 0.6105263

      threshold  accuracy
375 0.015166897 0.4157895
376 0.014322204 0.4131579
377 0.013740748 0.4105263
378 0.013256314 0.4078947
379 0.012183072 0.4052632
380 0.008968162 0.4026316
{code}
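The per-threshold table printed by `h2o.accuracy(perf)` can be reproduced conceptually: for each candidate threshold t, predict the positive class when the score is at least t, then measure the fraction of correct predictions. This is a minimal sketch in plain Python with made-up scores; using every distinct score as a candidate threshold is an assumption, and H2O's own threshold grid may differ.

```python
from typing import List, Tuple


def accuracy_by_threshold(scores: List[float], labels: List[int]) -> List[Tuple[float, float]]:
    """Return (threshold, accuracy) pairs, highest threshold first.

    Candidate thresholds are the distinct scores (an assumption for this sketch).
    A row is predicted positive (1) when its score >= threshold.
    """
    if len(scores) != len(labels) or not scores:
        raise ValueError("scores and labels must be non-empty and the same length")
    table = []
    for t in sorted(set(scores), reverse=True):
        # Count rows where the thresholded prediction matches the true label.
        correct = sum((s >= t) == (y == 1) for s, y in zip(scores, labels))
        table.append((t, correct / len(labels)))
    return table


# Hypothetical predicted probabilities and true binary labels.
scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]
for t, acc in accuracy_by_threshold(scores, labels):
    print(f"{t:.2f}  {acc:.2f}")
```

As in the R output, accuracy is not monotonic in the threshold, which is why a single "default" value needs a rule for picking the threshold.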

Since we find a default threshold (the one that maximizes F1) for each model, I would expect that you could just extract the accuracy at that threshold using h2o.accuracy.
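Extracting one accuracy value at the default threshold amounts to two steps: find the threshold that maximizes F1, then report accuracy at that threshold. A minimal sketch, assuming the candidate thresholds are the distinct scores and a row is positive when its score is at least the threshold; F1 ties go to the first (highest) threshold here, which may not match H2O's tie-breaking. The function names and data are hypothetical.

```python
from typing import List, Tuple


def f1_and_accuracy(scores: List[float], labels: List[int], t: float) -> Tuple[float, float]:
    """F1 and accuracy when rows with score >= t are predicted positive."""
    tp = fp = fn = tn = 0
    for s, y in zip(scores, labels):
        pred = s >= t
        if pred and y == 1:
            tp += 1
        elif pred:
            fp += 1
        elif y == 1:
            fn += 1
        else:
            tn += 1
    denom = 2 * tp + fp + fn
    f1 = 2 * tp / denom if denom else 0.0
    return f1, (tp + tn) / len(labels)


def accuracy_at_max_f1(scores: List[float], labels: List[int]) -> Tuple[float, float]:
    """Return (threshold, accuracy) at the F1-maximizing threshold."""
    best_t, best_f1, best_acc = scores[0], -1.0, 0.0
    for t in sorted(set(scores), reverse=True):
        f1, acc = f1_and_accuracy(scores, labels, t)
        if f1 > best_f1:  # strict '>' keeps the highest threshold on ties
            best_t, best_f1, best_acc = t, f1, acc
    return best_t, best_acc


scores = [0.95, 0.80, 0.60, 0.40, 0.20]
labels = [1, 1, 0, 1, 0]
t, acc = accuracy_at_max_f1(scores, labels)
print(f"max-F1 threshold={t:.2f} accuracy={acc:.2f}")
```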

In R, some of our metrics functions, like h2o.auc, work on either a model object or a performance object, so maybe we should follow that convention for h2o.accuracy as well, where h2o.accuracy(model) returns the accuracy of the model at the stored threshold.
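The "works on either a model or a performance object" convention can be sketched as a function that unwraps a model to its stored performance metrics and then reads the accuracy at the stored threshold. `Perf`, `Model`, and their fields are hypothetical stand-ins chosen for illustration, not h2o-py or R classes.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass
class Perf:
    # Hypothetical: accuracy keyed by threshold, plus the stored default threshold.
    accuracy_by_threshold: Dict[float, float]
    default_threshold: float


@dataclass
class Model:
    # Hypothetical: a model carrying the performance object computed on training data.
    training_perf: Perf


def accuracy(obj: Union[Model, Perf]) -> float:
    """Accuracy at the stored threshold, for either a model or a performance object."""
    # A model delegates to its stored performance object, as h2o.auc does in R.
    perf = obj.training_perf if isinstance(obj, Model) else obj
    return perf.accuracy_by_threshold[perf.default_threshold]


perf = Perf({0.9: 0.60, 0.5: 0.80, 0.1: 0.40}, default_threshold=0.5)
model = Model(training_perf=perf)
print(accuracy(model), accuracy(perf))
```

Delegating from the model to its performance object keeps a single code path, so both call styles return the same number.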

Current functionality:
{code}
h2o.auc(model)
[1] 0.9801618
h2o.auc(perf)
[1] 0.9801618
h2o.accuracy(model)
Error in h2o.metric(object, thresholds, "accuracy") :
  No accuracy for H2OBinomialModel
{code}

Therefore, I propose that we do the following:

  1. In R, if h2o.accuracy is applied to a binomial or multinomial model object, we should return the accuracy for the stored threshold.
  2. We should replicate the functionality in Python, adding an "accuracy" method to both the model performance object and the model object.
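A sketch of how the proposed method might behave for both model types, assuming binomial predictions use the stored threshold and multinomial predictions take the highest-probability class per row. A multinomial model has no single probability threshold, so the argmax rule is an assumption here, as are the function name `proposed_accuracy` and its inputs.

```python
from typing import List, Optional, Sequence


def proposed_accuracy(probs: List[Sequence[float]], actual: List[int],
                      threshold: Optional[float] = None) -> float:
    """Hypothetical accuracy() for binomial or multinomial predictions.

    probs[i] holds the class probabilities for row i; actual[i] is the true class index.
    Binomial (2 classes): predict class 1 when probs[i][1] >= threshold.
    Multinomial (>2 classes): predict the highest-probability class (argmax).
    """
    if len(probs) != len(actual) or not probs:
        raise ValueError("probs and actual must be non-empty and the same length")
    n_classes = len(probs[0])
    correct = 0
    for p, y in zip(probs, actual):
        if n_classes == 2:
            if threshold is None:
                raise ValueError("binomial accuracy needs the stored threshold")
            pred = 1 if p[1] >= threshold else 0
        else:
            pred = max(range(n_classes), key=lambda k: p[k])
        correct += pred == y
    return correct / len(actual)


# Hypothetical binomial rows at a stored threshold of 0.5.
bin_probs = [[0.3, 0.7], [0.6, 0.4], [0.45, 0.55]]
print(proposed_accuracy(bin_probs, [1, 0, 0], threshold=0.5))

# Hypothetical 3-class rows; no threshold is needed.
multi_probs = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.2, 0.5, 0.3], [0.1, 0.8, 0.1]]
print(proposed_accuracy(multi_probs, [0, 2, 2, 1]))
```

Under the argmax rule the multinomial result equals one minus the multi-class error, which lines up with Navdeep's earlier comment while still giving Python the same `accuracy` entry point as R.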

h2o-ops commented 1 year ago

JIRA Issue Migration Info

Jira Issue: PUBDEV-3202
Assignee: New H2O Bugs
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A