Open exalate-issue-sync[bot] opened 1 year ago
Navdeep commented: I think that's fine. Multi-class error is what you're after for a multinomial problem.
Erin LeDell commented: The R API has an accuracy method, so for parity, we should add one to Python.
Erin LeDell commented: The way that h2o.accuracy works in R is that it returns a bunch of accuracies, corresponding to various thresholds:
{code}
h2o.accuracy(perf)
  threshold  accuracy
1 0.9824807 0.6000000
2 0.9676546 0.6026316
3 0.9671792 0.6052632
4 0.9649031 0.6078947
5 0.9621187 0.6105263

      threshold  accuracy
375 0.015166897 0.4157895
376 0.014322204 0.4131579
377 0.013740748 0.4105263
378 0.013256314 0.4078947
379 0.012183072 0.4052632
380 0.008968162 0.4026316
{code}
Since we find a default threshold (that maximizes F1) for each model, I would expect that you could just extract the accuracy for that threshold using h2o.accuracy. In R, some of our metrics functions like h2o.auc will work on either a model object or a performance object, so maybe we should follow that standard for h2o.accuracy as well, where h2o.accuracy(model) returns the accuracy of the model using the stored threshold.
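To make the idea of "accuracy at the max-F1 threshold" concrete, here is a minimal pure-Python sketch. It does not use the H2O API; the function names, the toy data, and the choice of the distinct predicted probabilities as candidate thresholds are all assumptions for illustration, and H2O's own threshold selection may differ.

```python
def confusion_counts(probs, labels, threshold):
    """Count TP, FP, TN, FN when predicting positive for prob >= threshold."""
    tp = fp = tn = fn = 0
    for p, y in zip(probs, labels):
        pred = 1 if p >= threshold else 0
        if pred == 1 and y == 1:
            tp += 1
        elif pred == 1 and y == 0:
            fp += 1
        elif pred == 0 and y == 0:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def accuracy_at(probs, labels, threshold):
    """Fraction of rows classified correctly at the given threshold."""
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    return (tp + tn) / len(labels)


def f1_at(probs, labels, threshold):
    """F1 score at the given threshold (0.0 if there are no positives at all)."""
    tp, fp, tn, fn = confusion_counts(probs, labels, threshold)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def accuracy_at_max_f1(probs, labels):
    """Pick the threshold that maximizes F1, then report accuracy there."""
    # Assumption: candidate thresholds are the distinct predicted probabilities.
    thresholds = sorted(set(probs), reverse=True)
    best = max(thresholds, key=lambda t: f1_at(probs, labels, t))
    return best, accuracy_at(probs, labels, best)


# Hypothetical toy data: predicted probability of the positive class and true labels.
probs = [0.95, 0.80, 0.60, 0.40, 0.20, 0.05]
labels = [1, 1, 0, 1, 0, 0]
best_threshold, acc = accuracy_at_max_f1(probs, labels)
print(best_threshold, acc)
```

A model-level accuracy method along the lines proposed here would return only the second value, using whatever threshold is stored with the model.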
Current functionality:
{code}
h2o.auc(model)
[1] 0.9801618
h2o.auc(perf)
[1] 0.9801618
h2o.accuracy(model)
Error in h2o.metric(object, thresholds, "accuracy") : No accuracy for H2OBinomialModel
{code}
Therefore, I propose that we do the following: when h2o.accuracy is applied to a binomial or multinomial model object, we should return the accuracy for the stored threshold.

JIRA Issue Migration Info
Jira Issue: PUBDEV-3202
Assignee: New H2O Bugs
Reporter: Lauren DiPerna
State: Open
Fix Version: N/A
Attachments: N/A
Development PRs: N/A
We need to add .accuracy() for multinomial models.
The following snippets don't work:
{code}
import h2o
from h2o.estimators.glm import H2OGeneralizedLinearEstimator
h2o.init()

iris_df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/iris/iris.csv")
predictors = iris_df.columns[0:4]
response_col = "C5"
train, valid, test = iris_df.split_frame([.7, .15], seed=1234)
glm_model = H2OGeneralizedLinearEstimator(family="multinomial")
glm_model.train(predictors, response_col, training_frame=train, validation_frame=valid)
glm_model.accuracy()

# another example with random forest
from h2o.estimators.random_forest import H2ORandomForestEstimator
cars = "https://s3.amazonaws.com/h2o-public-test-data/smalldata/junit/cars_20mpg.csv"
cars = h2o.import_file(cars)
predictors = ["displacement", "power", "weight", "acceleration", "year"]
response_col = "cylinders"
cars[response_col] = cars[response_col].asfactor()
train, valid, test = cars.split_frame([.7, .15], seed=1234)
rf_model = H2ORandomForestEstimator()
rf_model.train(x=predictors, y=response_col, training_frame=train, validation_frame=valid)
rf_model.accuracy()
{code}
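For reference, multinomial accuracy is just the fraction of rows whose predicted class matches the actual class, i.e. one minus the overall multi-class error. The sketch below is plain Python with a hypothetical helper name and made-up labels; it is not an H2O method, and how the label lists are pulled out of an H2O frame is left out.

```python
def multinomial_accuracy(actual, predicted):
    """Fraction of rows where the predicted class equals the actual class."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    if not actual:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return correct / len(actual)


# Hypothetical labels, loosely modeled on the iris classes.
actual = ["setosa", "versicolor", "virginica", "virginica", "setosa"]
predicted = ["setosa", "versicolor", "versicolor", "virginica", "setosa"]

acc = multinomial_accuracy(actual, predicted)
error = 1.0 - acc  # overall multi-class error is the complement of accuracy
print(acc, error)
```

No threshold is involved here: for a multinomial model the predicted class is a single label per row, so one accuracy value per dataset is enough.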