vnijs opened this issue 4 years ago
Hi @vnijs, thank you for posting your issue with a reproducible example! I actually don't see any evidence of a practically meaningful interaction effect (at least as far as the fitted neural network is concerned); for example, take a look at the PDPs below. While you are explicitly fitting an interaction effect between `ad` and `gender` in your fitted GzLM, it is likely significant due to the large sample size (N = 10,000 with only 4 features). So it makes sense to me that the output from `vip::vint()` gives similar values for all three potential interactions. Again, this is based only on the fitted NN and does not imply that no practically meaningful interaction effect exists for these data. BTW, running `vint()` on your fitted GzLM does in fact identify the specified interaction effects!
library(vip)
library(iml)
library(pdp)
library(ggplot2)
facebook <- readr::read_rds("Downloads/facebook.rds")
mod <- glm(click ~ age + ad + gender + ad:gender, data = facebook, family=binomial)
summary(mod)
mod_nn <- nnet::nnet(click ~ age + ad + gender, size = 2, decay = 0.3, data = facebook)
#
# Partial dependence plots (PDPs)
#
# PDPs for main effects
pd1 <- partial(mod_nn, pred.var = "ad", prob = TRUE)
pd2 <- partial(mod_nn, pred.var = "age", prob = TRUE)
pd3 <- partial(mod_nn, pred.var = "gender", prob = TRUE)
# PDPs for two-way interaction effects
pd4 <- partial(mod_nn, pred.var = c("ad", "gender"), prob = TRUE)
pd5 <- partial(mod_nn, pred.var = c("ad", "age"), prob = TRUE)
pd6 <- partial(mod_nn, pred.var = c("age", "gender"), prob = TRUE)
# Display plots in a grid
gridExtra::grid.arrange(
autoplot(pd1) + ylim(0.85, 1),
autoplot(pd2) + ylim(0.85, 1),
autoplot(pd3) + ylim(0.85, 1),
autoplot(pd4) + ylim(0.85, 1),
autoplot(pd5) + ylim(0.85, 1),
autoplot(pd6) + ylim(0.85, 1),
nrow = 2
)
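As a concrete follow-up to the point about the GzLM, the `vint()` call would look like the sketch below (it assumes `mod` from the code above; `vint()` derives a Friedman-H-style statistic from the model's partial dependence):

```r
# Pairwise interaction strengths for the fitted GzLM; the ad:gender pair
# should stand out since that interaction is specified explicitly in `mod`
vint(object = mod, feature_names = c("age", "gender", "ad"))
```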
Also, I'm not sure it is wise to compare the SD of PDPs from different types of models. However, it may be reasonable within the same class of models (though I haven't given this much thought); for example, comparing a GBM (gradient boosted decision trees) with a depth of 1 (i.e., no interactions allowed) to a GBM with a depth greater than 1!
Does this help with your problem?
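The GBM comparison suggested above could be sketched like this. Everything here is illustrative: the data are simulated (not the `facebook` data), and the variable names and tuning values are arbitrary.

```r
library(gbm)
library(pdp)

# Simulated data with a true x1:x2 interaction (illustrative only)
set.seed(101)
n <- 2000
d <- data.frame(x1 = rnorm(n), x2 = rnorm(n), x3 = rnorm(n))
d$y <- rbinom(n, 1, plogis(d$x1 + d$x2 + 2 * d$x1 * d$x2))

# Depth 1: additive model, so no interactions are possible
fit1 <- gbm(y ~ ., data = d, distribution = "bernoulli",
            n.trees = 500, interaction.depth = 1, shrinkage = 0.05)

# Depth 2: two-way interactions allowed
fit2 <- gbm(y ~ ., data = d, distribution = "bernoulli",
            n.trees = 500, interaction.depth = 2, shrinkage = 0.05)

# Compare the spread of the joint (x1, x2) PDP across the two fits;
# extra spread at depth 2 points to interaction structure
pd1 <- partial(fit1, pred.var = c("x1", "x2"), n.trees = 500, prob = TRUE)
pd2 <- partial(fit2, pred.var = c("x1", "x2"), n.trees = 500, prob = TRUE)
c(depth1 = sd(pd1$yhat), depth2 = sd(pd2$yhat))
```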
Also, I'm not sure how well `vint()` applies to a mix of numeric and categorical features, but in practice it seems to work reasonably well.

Thanks for the detailed reply @bgreenwell! `nnet` can be a bit tricky to tune at times. If you run the code below you should see the same pattern of effects for `gender:ad` as in the logistic regression. These may seem like small effects to you, but with the type of data I commonly have access to (highly unbalanced), this is considered a pretty strong effect :)
Differences in variable importance in the Garson plot (see below) often seem to be a pretty good indicator that a neural net is picking up something you wouldn't see in a model without interactions. I'm going to play around with the ratio of PDP SDs for some different datasets to see if this makes (some) sense.
library(nnet)
library(vip)      # vint() and vip()
library(pdp)      # partial()
library(ggplot2)  # autoplot()

# Network too small/over-regularized to capture the interaction
set.seed(1234)
mod_nn_no <- nnet(click == "yes" ~ age + ad + gender, size = 1, decay = 2,
                  rang = 0.1, maxit = 1000, entropy = TRUE, data = facebook)
vip(mod_nn_no, method = "model", type = "garson")

# Larger network that can pick up the ad:gender interaction
set.seed(1234)
mod_nn <- nnet(click == "yes" ~ age + ad + gender, size = 3, decay = 0.3,
               rang = 0.1, maxit = 1000, entropy = TRUE, data = facebook)
autoplot(partial(mod_nn, pred.var = c("ad", "gender"), prob = TRUE))
vint(object = mod_nn, feature_names = c("age", "gender", "ad"))
vip(mod_nn, method = "model", type = "garson")
Interaction plot:
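One way to operationalize the ratio-of-PDP-SD idea mentioned above (a sketch, assuming the `facebook` data and the size-3 `mod_nn` from the code above; `mod_main` is an illustrative name for a main-effects-only logistic regression, not a model from the thread):

```r
# Main-effects-only logistic regression (no interaction terms)
mod_main <- glm(click ~ age + ad + gender, data = facebook, family = binomial)

# Joint PDP over (ad, gender) for each model, on the probability scale
pd_nn <- pdp::partial(mod_nn, pred.var = c("ad", "gender"), prob = TRUE)
pd_glm <- pdp::partial(
  mod_main, pred.var = c("ad", "gender"),
  pred.fun = function(object, newdata) {
    mean(predict(object, newdata, type = "response"))
  }
)

# Ratio of PDP spreads; values well above 1 suggest the flexible model
# finds joint structure the additive model cannot represent
sd(pd_nn$yhat) / sd(pd_glm$yhat)
```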
I have really enjoyed using your `pdp` and `vip` packages, and I read the documentation at https://koalaverse.github.io/vip/articles/vip-interaction.html with great interest. I applied `vint()` to an example I use for my class with a (strong) interaction between two categorical variables. However, the function seemed unable to identify the interaction. The `iml::Interaction` function, although much slower, was able to uncover the effect. Perhaps I'm missing something about `vint()`? See the first example below.

Also, I'm wondering whether it would make sense to compare the `sd` of the output from `pdp::partial` for a model that can capture interactions (e.g., `nnet`) against a logistic regression without interaction terms. See the second example + results below.

If you would like access to the data used in the examples, please see the Dropbox link below.
Compare sd vs a base model without interactions:
https://www.dropbox.com/s/9p6iirrps3ex57g/facebook.rds?dl=0
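For completeness, the `iml::Interaction` approach mentioned in the issue can be sketched as follows (it assumes the `facebook` data and the `mod_nn` fit from the examples above; the custom `predict.function` is just one way to get probability-scale predictions out of `nnet`):

```r
library(iml)

# Wrap the fitted network in an iml Predictor object
pred <- Predictor$new(
  mod_nn,
  data = subset(facebook, select = -click),
  y = facebook$click,
  predict.function = function(object, newdata) {
    predict(object, newdata, type = "raw")
  }
)

# Overall H-statistic per feature (larger = stronger interaction), then
# the two-way interactions involving `ad` specifically
ia <- Interaction$new(pred)
plot(ia)
ia_ad <- Interaction$new(pred, feature = "ad")
plot(ia_ad)
```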