Zuzana Olajcová commented: Hi [~accountid:5d1185d4f46aa30c271c7cc6] , I’ve prepared the content to update docs. Can you please review it from your POV and add it to the User Guide? Thanks!
JIRA Issue Migration Info
Jira Issue: PUBDEV-7763 Assignee: hannah.tillman Reporter: Erin LeDell State: Resolved Fix Version: 3.32.0.1 Attachments: N/A Development PRs: Available
Linked PRs from JIRA
https://github.com/h2oai/h2o-3/pull/4928 https://github.com/h2oai/h2o-3/pull/4943
We should add a page on RuleFit to the Supervised algorithms section in the User Guide: [http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html#supervised|http://docs.h2o.ai/h2o/latest-stable/h2o-docs/data-science.html#supervised]
The new content:
h2. Introduction
The RuleFit algorithm combines tree ensembles and linear models to take advantage of both methods: the accuracy of a tree ensemble and the interpretability of a linear model.
The general algorithm fits a tree ensemble to the data, builds a rule ensemble by traversing each tree, evaluates the rules on the data to build a rule feature set, and fits a sparse linear model (LASSO) to the rule feature set joined with the original feature set.
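The two middle steps (building rules by traversing a tree, then evaluating them on the data) can be sketched in plain Python. This is only an illustration under stated assumptions, not H2O's implementation: the toy tree, its thresholds, the sample rows, and the helper names {{extract_rules}}, {{rule_fires}}, and {{rule_features}} are all hypothetical, and the tree-fitting and LASSO steps are left out. It assumes the common convention of one rule per non-root node, where a rule is the conjunction of the split conditions on the path to that node.

```python
# Toy tree written by hand for illustration (an assumption, not H2O output).
# Internal node: (feature, threshold, left_subtree, right_subtree).
# The left branch is taken when row[feature] < threshold. Leaves are None.
TREE = ("age", 16.0,
        ("fare", 20.0, None, None),
        ("pclass", 2.0, None, None))


def extract_rules(node, conditions=()):
    """Collect one rule per non-root node: the conditions on the path to it."""
    if node is None:
        return []
    feature, threshold, left, right = node
    rules = []
    for op, child in (("<", left), (">=", right)):
        path = conditions + ((feature, op, threshold),)
        rules.append(path)                       # rule for the child node itself
        rules.extend(extract_rules(child, path)) # rules for its descendants
    return rules


def rule_fires(rule, row):
    """Return True when the row satisfies every condition in the rule."""
    return all(
        row[feature] < threshold if op == "<" else row[feature] >= threshold
        for feature, op, threshold in rule
    )


def rule_features(rules, rows):
    """Build the 0/1 rule feature matrix: one row per sample, one column per rule."""
    return [[int(rule_fires(rule, row)) for rule in rules] for row in rows]


# Two made-up samples (hypothetical values).
rows = [
    {"age": 10.0, "fare": 15.0, "pclass": 3.0},
    {"age": 40.0, "fare": 80.0, "pclass": 1.0},
]

rules = extract_rules(TREE)
features = rule_features(rules, rows)

for rule in rules:
    print(" & ".join(f"{f} {op} {t}" for f, op, t in rule))
print(features)
```

In the full algorithm, the columns of this 0/1 matrix would be joined with the original features before the sparse linear model is fitted, so that each rule receives its own coefficient.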
h2. Defining a RuleFit Model (beta API)
h2. Interpreting a RuleFit Model
The output for the RuleFit model includes:
h2. Examples
In R:
{noformat}library(h2o)
h2o.init()

# Import the Titanic dataset
f <- "https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv"
titanic <- h2o.importFile(f)

# Set the response and predictors, and convert the categorical columns to factors
response <- "survived"
predictors <- c("age", "sibsp", "parch", "fare", "sex", "pclass")
titanic[, response] <- as.factor(titanic[, response])
titanic[, "pclass"] <- as.factor(titanic[, "pclass"])

# Build and train the RuleFit model, then print the rule importance table
rf_h2o <- h2o.rulefit(y = response, x = predictors, training_frame = titanic,
                      max_rule_length = 10, max_num_rules = 100, seed = 1234)
print(rf_h2o@model$rule_importance){noformat}
In Python:
{noformat}import h2o
from h2o.estimators.rulefit import H2ORuleFitEstimator
h2o.init()

# Import the Titanic dataset, reading pclass and survived as categorical columns
df = h2o.import_file("https://s3.amazonaws.com/h2o-public-test-data/smalldata/gbm_test/titanic.csv",
                     col_types={'pclass': "enum", 'survived': "enum"})
x = ["age", "sibsp", "parch", "fare", "sex", "pclass"]

# Build and train the RuleFit model, then print the rule importance table
rf_h2o = H2ORuleFitEstimator(max_rule_length=10, max_num_rules=100, seed=1234,
                             model_type="rules_and_linear")
rf_h2o.train(training_frame=df, x=x, y="survived")
print(rf_h2o._model_json['output']['rule_importance']){noformat}
h2. References
Friedman, J. H., & Popescu, B. E. (2008). Predictive learning via rule ensembles. The Annals of Applied Statistics, 2(3), 916-954.