tidymodels / agua

Create and evaluate models using 'tidymodels' and 'h2o'
https://agua.tidymodels.org

[New Functionality]: Add explainability/interpretability functions from h2o. #31

Closed coforfe closed 2 years ago

coforfe commented 2 years ago

Hi,

Thanks for bringing h2o capabilities to tidymodels!

h2o already includes various functions that help with model interpretation/explainability for binary classification and regression models.

These functions can also be applied to an h2o.automl() object.

All the available h2o functionality is documented here.

Thanks! Carlos.

qiushiyan commented 2 years ago

Thank you for pointing to these functions, @coforfe! We are waiting for the next CRAN release of h2o to launch a more stable version, after which I will consider whether we should include wrappers for these functions. In the meantime, you can use them by extracting the underlying h2o fit object with extract_fit_engine(). Additionally, due to a limitation in how agua fits models, you have to rename the target column in the test set to ".outcome". See the examples below:

library(agua)
#> Loading required package: parsnip
#> Registered S3 method overwritten by 'agua':
#>   method        from     
#>   tidy.workflow workflows
h2o_start()

multinom_mod <- multinom_reg() %>% 
  set_engine("h2o") %>% 
  fit(Species ~ ., data = iris)
# rename target column to ".outcome"
iris_wf <- h2o::as.h2o(iris %>% dplyr::rename(.outcome = Species))
multinom_mod %>% 
  extract_fit_engine() %>% 
  h2o::h2o.explain(iris_wf)
#> 
#> 
#> Confusion Matrix
#> ================
#> 
#> > Confusion matrix shows a predicted class vs an actual class.
#> 
#> 
#> 
#> GLM_model_R_1661994490294_99
#> ----------------------------
#> 
#> |  | setosa | versicolor | virginica | Error | Rate
#> |:---:|:---:|:---:|:---:|:---:|:---:|
#> | **setosa** |50 | 0 | 0 | 0 | 0 / 50 | 
#> | **versicolor** |0 | 48 | 2 | 0.04 | 2 / 50 | 
#> | **virginica** |0 | 1 | 49 | 0.02 | 1 / 50 | 
#> | **Totals** |50 | 49 | 51 | 0.02 | 3 / 150 | 
#> 
#> 
#> Variable Importance
#> ===================
#> 
#> > The variable importance plot shows the relative importance of the most important variables in the model.

#> 
#> 
#> Partial Dependence Plots
#> ========================
#> 
#> > Partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured in change in the mean response. PDP assumes independence between the feature for which is the PDP computed and the rest.


# automl example 
auto_mod <- auto_ml() %>% 
  set_engine("h2o", max_runtime_secs = 20) %>% 
  set_mode("regression") %>% 
  fit(mpg ~ ., data = mtcars)

mtcars_wf <- h2o::as.h2o(mtcars %>% dplyr::rename(.outcome = mpg))

auto_mod %>% 
  extract_fit_engine() %>% 
  h2o::h2o.shap_summary_plot(mtcars_wf)

Created on 2022-08-31 with reprex v2.0.2
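
As a side note, the renaming step used in both examples could be wrapped in a small helper. This is only a minimal sketch in base R: `rename_outcome()` is a hypothetical name, not a function from agua or h2o, and it assumes the data is an ordinary data frame with the outcome stored in a single named column.

```r
# Hypothetical helper (not part of agua or h2o): rename the outcome column
# to ".outcome", the name expected on new data for models fitted via agua.
rename_outcome <- function(data, outcome) {
  stopifnot(is.data.frame(data), outcome %in% names(data))
  names(data)[names(data) == outcome] <- ".outcome"
  data
}

iris_renamed <- rename_outcome(iris, "Species")
names(iris_renamed)
#> [1] "Sepal.Length" "Sepal.Width"  "Petal.Length" "Petal.Width"  ".outcome"
```

The result can then be passed to h2o::as.h2o() exactly as in the examples above, before calling h2o::h2o.explain() or h2o::h2o.shap_summary_plot().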

coforfe commented 2 years ago

Thanks a lot, Qiushi Yan! I will use the hints you provided.

It is curious, because CRAN is two releases behind what is already available from H2O (3.36.1.2 vs 3.36.1.4). Then again, the newer ones are only minor releases for H2O.

Thanks again! Carlos.

github-actions[bot] commented 1 year ago

This issue has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.