koalaverse / vip

Variable Importance Plots (VIPs)
https://koalaverse.github.io/vip/

vint with categorical variables #83

Open · vnijs opened this issue 4 years ago

vnijs commented 4 years ago

I have really enjoyed using your pdp and vip packages, and I read the documentation at https://koalaverse.github.io/vip/articles/vip-interaction.html with great interest. I applied vint to an example I use in my class that has a (strong) interaction between two categorical variables, but the function seemed unable to identify the interaction. The iml::Interaction function, although much slower, was able to uncover the effect. Perhaps I'm missing something about vint? See the first example below.

Also, I'm wondering if it would make sense to compare the standard deviation of the output from pdp::partial for a model that can capture interactions (e.g., nnet) against a logistic regression without interaction terms. See the second example and results below.

If you would like access to the data in the examples, please see the dropbox link below.

> library(vip)
> library(iml)
> 
> facebook <- readr::read_rds("data/facebook.rds")
> mod <- glm(click ~ age + ad + gender + ad:gender, data = facebook, family=binomial)
> summary(mod)

Call:
glm(formula = click ~ age + ad + gender + ad:gender, family = binomial, 
    data = facebook)

Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-2.9315   0.2115   0.2888   0.3662   0.7883  

Coefficients:
                  Estimate Std. Error z value Pr(>|z|)    
(Intercept)       5.399131   0.251074  21.504  < 2e-16 ***
age              -0.048799   0.005368  -9.091  < 2e-16 ***
adB              -0.879553   0.145786  -6.033 1.61e-09 ***
genderfemale     -1.022528   0.146756  -6.968 3.23e-12 ***
adB:genderfemale  2.054939   0.205099  10.019  < 2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 4005.6  on 9999  degrees of freedom
Residual deviance: 3811.1  on 9995  degrees of freedom
AIC: 3821.1

Number of Fisher Scoring iterations: 6

> 
> mod_nn <- nnet::nnet(click ~ age + ad + gender, size = 2, decay = 0.3, data = facebook)
# weights:  11
...
converged
> vint(object = mod_nn, feature_names = c("age", "gender", "ad"))
# A tibble: 3 x 2
  Variables  Interaction
  <fct>            <dbl>
1 age*ad          0.0472
2 age*gender      0.0366
3 gender*ad       0.0337
> 
> predictor <- iml::Predictor$new(mod_nn, data = facebook[, -1], y = facebook$click)
> iml::Interaction$new(predictor)

  .feature .interaction
1      age    0.2045499
2   gender    0.7911664
3       ad    1.0697385
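
The per-pair strengths are also available by passing a feature to iml's Interaction class; a quick sketch (output not shown):

# Two-way interaction strengths between "ad" and each remaining feature
iml::Interaction$new(predictor, feature = "ad")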

Compare the PDP standard deviation against a base model without interaction terms:

> mod_no <- glm(click ~ age + ad + gender, data = facebook, family=binomial)
> p1 <- pdp::partial(mod_no, train = facebook, pred.var = c("gender", "ad"), prob = TRUE)
> p2 <- pdp::partial(mod, train = facebook, pred.var = c("gender", "ad"), prob = TRUE)
> sd(p2$yhat)/sd(p1$yhat) ## strong increase for interaction effect
[1] 6.3486
> 
> p1 <- pdp::partial(mod_no, train = facebook, pred.var = c("gender", "age"), prob = TRUE)
> p2 <- pdp::partial(mod, train = facebook, pred.var = c("gender", "age"), prob = TRUE)
> sd(p2$yhat)/sd(p1$yhat) ## minimal change where no interaction exists
[1] 0.9740408
> 
> p1 <- pdp::partial(mod_no, train = facebook, pred.var = c("gender", "ad"), prob = TRUE)
> p2 <- pdp::partial(mod_nn, train = facebook, pred.var = c("gender", "ad"), prob = TRUE)
> sd(p2$yhat)/sd(p1$yhat) ## strong increase for interaction effect captured by nnet model
[1] 5.640238

https://www.dropbox.com/s/9p6iirrps3ex57g/facebook.rds?dl=0

bgreenwell commented 4 years ago

Hi @vnijs, thank you for posting your issue with a reproducible example! I actually don't see any evidence of a practically meaningful interaction effect (at least as far as the fitted neural network is concerned); for example, take a look at the PDPs below. While you are explicitly fitting an interaction effect between ad and gender in your GzLM, it is likely significant due to the large sample size (N = 10,000 with only 4 features). So it makes sense to me that vip::vint() gives similar values for all three potential interactions. Again, this is based only on the quality of the fitted NN and does not imply that no practically meaningful interaction effect exists for these data. BTW, running vint() on your fitted GzLM does in fact identify the specified interaction effect (see the sketch after the PDP code below)!

library(vip)
library(iml)
library(pdp)
library(ggplot2)
library(gridExtra)  # for grid.arrange()

facebook <- readr::read_rds("Downloads/facebook.rds")
mod <- glm(click ~ age + ad + gender + ad:gender, data = facebook, family=binomial)
summary(mod)

mod_nn <- nnet::nnet(click ~ age + ad + gender, size = 2, decay = 0.3, data = facebook)

#
# Partial dependence plots (PDPs)
#

# PDPs for main effects
pd1 <- partial(mod_nn, pred.var = "ad", prob = TRUE)
pd2 <- partial(mod_nn, pred.var = "age", prob = TRUE)
pd3 <- partial(mod_nn, pred.var = "gender", prob = TRUE)

# PDPs for two-way interaction effects
pd4 <- partial(mod_nn, pred.var = c("ad", "gender"), prob = TRUE)
pd5 <- partial(mod_nn, pred.var = c("ad", "age"), prob = TRUE)
pd6 <- partial(mod_nn, pred.var = c("age", "gender"), prob = TRUE)

# Display plots in a grid
grid.arrange(
  autoplot(pd1) + ylim(0.85, 1),
  autoplot(pd2) + ylim(0.85, 1),
  autoplot(pd3) + ylim(0.85, 1),
  autoplot(pd4) + ylim(0.85, 1),
  autoplot(pd5) + ylim(0.85, 1),
  autoplot(pd6) + ylim(0.85, 1),
  nrow = 2
)

[Image: grid of PDPs for the three main effects and three two-way interaction effects]
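
For reference, that check looks roughly like this (a sketch; the prob = TRUE argument is assumed to be forwarded by vint() on to pdp::partial()):

# Sketch: run vint() on the GzLM that explicitly includes ad:gender;
# prob = TRUE (assumed to pass through to pdp::partial()) keeps the
# PDPs on the probability scale
vint(object = mod, feature_names = c("age", "gender", "ad"), prob = TRUE)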

Also, I'm not sure it is wise to compare the SD from different types of models. However, it may be reasonable within the same class of models (though I haven't given this much thought). For example, you could compare a GBM (gradient boosted decision trees) with an interaction depth of 1 (i.e., no interactions allowed) to a GBM with a depth greater than 1! A rough sketch follows.
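
A minimal sketch of that comparison, assuming the gbm package and a "yes"/"no" click column (parameter values are illustrative, not tuned):

library(gbm)

# gbm's bernoulli loss wants a 0/1 response
facebook$click01 <- as.integer(facebook$click == "yes")

set.seed(1234)
mod_gbm1 <- gbm(click01 ~ age + ad + gender, data = facebook,
                distribution = "bernoulli", n.trees = 500,
                interaction.depth = 1)  # stumps: main effects only
set.seed(1234)
mod_gbm2 <- gbm(click01 ~ age + ad + gender, data = facebook,
                distribution = "bernoulli", n.trees = 500,
                interaction.depth = 2)  # allows two-way interactions

# Ratio of PDP standard deviations for the gender/ad pair; gbm's predict()
# needs n.trees, which partial() passes through. By default yhat is on the
# link scale here, but the ratio still reflects relative spread.
p1 <- pdp::partial(mod_gbm1, train = facebook, pred.var = c("gender", "ad"), n.trees = 500)
p2 <- pdp::partial(mod_gbm2, train = facebook, pred.var = c("gender", "ad"), n.trees = 500)
sd(p2$yhat) / sd(p1$yhat)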

Does this help with your problem?

vnijs commented 4 years ago

Thanks for the detailed reply @bgreenwell! nnet can be a bit tricky to tune at times. If you run the code below, you should see the same pattern in the effects for gender:ad as in the logistic regression. These may seem like small effects to you, but with the type of data I commonly have access to (highly unbalanced), this is considered a pretty strong effect :)

Differences in variable importance in the Garson plot (see below) often seem like a pretty good indicator that a neural net is picking up something you wouldn't see in a model without interactions. I'm going to play around with the ratio-of-PDP-sd idea on some different datasets to see if it makes (some) sense; a small helper for this is sketched at the end of this comment.

library(nnet)

# Baseline NN: a single hidden unit with heavy decay, so it has little
# capacity to model an ad:gender interaction
set.seed(1234)
mod_nn_no <- nnet(click == "yes" ~ age + ad + gender, size = 1, decay = 2, rang = .1, maxit = 1000, entropy = TRUE, data = facebook)
vip(mod_nn_no, method = "model", type = "garson")

# Larger NN: three hidden units with mild decay, enough capacity to pick
# up the interaction
set.seed(1234)
mod_nn <- nnet(click == "yes" ~ age + ad + gender, size = 3, decay = 0.3, rang = .1, maxit = 1000, entropy = TRUE, data = facebook)
autoplot(partial(mod_nn, pred.var = c("ad", "gender"), prob = TRUE))
vint(object = mod_nn, feature_names = c("age", "gender", "ad"))
vip(mod_nn, method = "model", type = "garson")

[Image: Garson variable importance plot for mod_nn_no]

[Image: Garson variable importance plot for mod_nn]

Interaction plot:

[Image: PDP of click probability by ad and gender for mod_nn]
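
To make the ratio-of-PDP-sd experiments easier to repeat, a hypothetical helper (not part of vip or pdp) could wrap the computation:

# Hypothetical helper: ratio of PDP standard deviations for the same
# feature pair under a flexible model and a baseline model; extra
# arguments are passed on to pdp::partial()
pdp_sd_ratio <- function(mod_flex, mod_base, pred.var, train, ...) {
  p_flex <- pdp::partial(mod_flex, train = train, pred.var = pred.var, ...)
  p_base <- pdp::partial(mod_base, train = train, pred.var = pred.var, ...)
  sd(p_flex$yhat) / sd(p_base$yhat)
}

# e.g., pdp_sd_ratio(mod_nn, mod_nn_no, c("ad", "gender"), facebook, prob = TRUE)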