It seems like max_hit_ratio_k doesn't actually get used in the backend. We should fix this, so that if a user sets this argument it changes what is shown by the hit_ratio_table() method (py/r) and the K-Top Hit Ratio table that gets displayed by default when you do model.show().
More details on how this parameter should work can be found here: http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science/algo-params/max_hit_ratio_k.html
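For reference, here is a minimal plain-Python sketch of what a top-k hit ratio table computes, to make the expected behavior concrete. This is not H2O's backend code: the function name `hit_ratio_table`, its signature, and the toy inputs are illustrative assumptions. The point is only that the number of rows in the table should follow the requested maximum k.

```python
from typing import List, Sequence


def hit_ratio_table(
    probs: Sequence[Sequence[float]],
    actual: Sequence[int],
    max_k: int,
) -> List[float]:
    """Return [hit_ratio@1, ..., hit_ratio@max_k] (hypothetical helper).

    hit_ratio@k is the fraction of rows whose actual class is among the
    k classes with the highest predicted probability.
    """
    n_classes = len(probs[0])
    max_k = min(max_k, n_classes)  # k cannot exceed the number of classes
    hits = [0] * max_k
    for row, label in zip(probs, actual):
        # Class indices ordered from most to least probable.
        ranked = sorted(range(n_classes), key=lambda c: row[c], reverse=True)
        rank = ranked.index(label)  # 0-based position of the true class
        # A hit at rank r counts for every k > r.
        for k in range(rank, max_k):
            hits[k] += 1
    return [h / len(actual) for h in hits]


# Toy data: 4 rows, 3 classes (made up for illustration).
probs = [
    [0.7, 0.2, 0.1],
    [0.1, 0.6, 0.3],
    [0.2, 0.5, 0.3],
    [0.2, 0.3, 0.5],
]
actual = [0, 2, 0, 1]

table_k2 = hit_ratio_table(probs, actual, max_k=2)
table_k3 = hit_ratio_table(probs, actual, max_k=3)
print(table_k2)
print(table_k3)
```

With the fix in place, setting max_hit_ratio_k = 3 on a 7-class problem such as covtype would be expected to produce a 3-row table instead of the full 7-row one.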
Code to test the issue is below. Note that setting max_hit_ratio_k = 3 and leaving the max_hit_ratio_k parameter unspecified produce the same output.
{code}
library(h2o)
h2o.init()

# import the covtype dataset:
# this dataset is used to classify the correct forest cover type
# original dataset can be found at https://archive.ics.uci.edu/ml/datasets/Covertype
covtype <- h2o.importFile("https://s3.amazonaws.com/h2o-public-test-data/smalldata/covtype/covtype.20k.data")

# convert response column to a factor
covtype[,55] <- as.factor(covtype[,55])

# set the predictor names and the response column name
predictors <- colnames(covtype[1:54])
response <- 'C55'

# split into train and validation sets
covtype.splits <- h2o.splitFrame(data = covtype, ratios = .8, seed = 1234)
train <- covtype.splits[[1]]
valid <- covtype.splits[[2]]

# try using the max_hit_ratio_k parameter:
# max_hit_ratio_k does not affect the actual model fit, and is for information
# and inner-H2O calculations
cov_gbm <- h2o.gbm(x = predictors, y = response, training_frame = train, validation_frame = valid, max_hit_ratio_k = 3, seed = 1234)

# print out model results to see the max_hit_ratio_k table
cov_gbm

# note that the table display won't change when you set max_hit_ratio_k less than 7
h2o.hit_ratio_table(cov_gbm, train = TRUE)
{code}