[New Functionality]: Add explainability/interpretability functions from h2o. #31

Closed coforfe closed 2 years ago

coforfe commented 2 years ago


Thanks for bringing h2o capabilities to tidymodels!

h2o already includes various functions that help with model interpretation/explainability for binary classification and regression models.

These functions can also be applied to an h2o.automl() object.

All the available h2o functionality is documented here.

Thanks! Carlos.

qiushiyan commented 2 years ago

Thank you for pointing to these functions @coforfe! We are waiting for the next CRAN release of h2o before launching a more stable version, after which I will consider whether we should include wrappers for these functions. In the meantime, you can use them by extracting the underlying h2o fit object with extract_fit_engine(). Additionally, due to a limitation in how agua fits models, you have to rename the target column in the test set to ".outcome". See the examples below.

library(agua)
#> Loading required package: parsnip
#> Registered S3 method overwritten by 'agua':
#>   method        from     
#>   tidy.workflow workflows
# start a local H2O cluster (startup output not shown)
h2o::h2o.init()

multinom_mod <- multinom_reg() %>% 
  set_engine("h2o") %>% 
  fit(Species ~ ., data = iris)
# rename target column to ".outcome"
iris_wf <- h2o::as.h2o(iris %>% dplyr::rename(.outcome = Species))
#>   |======================================================================| 100%
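multinom_mod %>% 
  extract_fit_engine() %>% 
  # the final call is missing from the post; h2o::h2o.explain() is assumed
  # here because the report sections below match its output
  h2o::h2o.explain(iris_wf)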
#> Confusion Matrix
#> ================
#> > Confusion matrix shows a predicted class vs an actual class.
#> GLM_model_R_1661994490294_99
#> ----------------------------
#> |  | setosa | versicolor | virginica | Error | Rate
#> |:---:|:---:|:---:|:---:|:---:|:---:|
#> | **setosa** |50 | 0 | 0 | 0 | 0 / 50 | 
#> | **versicolor** |0 | 48 | 2 | 0.04 | 2 / 50 | 
#> | **virginica** |0 | 1 | 49 | 0.02 | 1 / 50 | 
#> | **Totals** |50 | 49 | 51 | 0.02 | 3 / 150 | 
#> Variable Importance
#> ===================
#> > The variable importance plot shows the relative importance of the most important variables in the model.

#> Partial Dependence Plots
#> ========================
#> > Partial dependence plot (PDP) gives a graphical depiction of the marginal effect of a variable on the response. The effect of a variable is measured in change in the mean response. PDP assumes independence between the feature for which is the PDP computed and the rest.

# automl example 
auto_mod <- auto_ml() %>% 
  set_engine("h2o", max_runtime_secs = 20) %>% 
  set_mode("regression") %>% 
  fit(mpg ~ ., data = mtcars)

mtcars_wf <- h2o::as.h2o(mtcars %>% dplyr::rename(.outcome = mpg))
#>   |======================================================================| 100%

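auto_mod %>% 
  extract_fit_engine() %>% 
  # the final call is missing from the post; h2o::h2o.explain() is assumed,
  # mirroring the multinomial example
  h2o::h2o.explain(mtcars_wf)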

Created on 2022-08-31 with reprex v2.0.2
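
The renaming step described above could be wrapped in a small helper so that it is not repeated for every data set. The following is only a sketch: `rename_outcome()` is a hypothetical name and is not part of agua or h2o. The commented-out part assumes a running H2O cluster and the fitted `auto_mod` object, and uses `h2o::h2o.explain_row()` as one example of a row-level explanation function.

```r
# Hypothetical helper (not part of agua or h2o): rename the outcome column
# to ".outcome" so the data matches what the agua-fitted h2o model expects.
rename_outcome <- function(data, outcome) {
  stopifnot(is.data.frame(data), outcome %in% names(data))
  names(data)[names(data) == outcome] <- ".outcome"
  data
}

prepared <- rename_outcome(mtcars, "mpg")
names(prepared)

# Assuming a running H2O cluster and the fitted `auto_mod` object, the
# prepared data could then be passed to an explainability function:
# prepared_h2o <- h2o::as.h2o(prepared)
# auto_mod %>%
#   extract_fit_engine() %>%
#   h2o::h2o.explain_row(prepared_h2o, row_index = 1)
```

The helper only touches column names, so it can be checked without starting H2O. The conversion with `h2o::as.h2o()` is kept separate because it requires a live cluster.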

coforfe commented 2 years ago

Thanks a lot, Qiushi Yan! I will use the hints you provided.

It is curious, because CRAN is two releases behind what is already available from H2O. That said, the newer ones are only minor releases for H2O.

Thanks again! Carlos.

