lrberge / fixest

Fixed-effects estimations
https://lrberge.github.io/fixest/
377 stars 59 forks source link

Variance Inflation Factors #410

Open simonschoe opened 1 year ago

simonschoe commented 1 year ago

Is there a solution to compute variance inflation factors (VIFs) from fixest objects? To the best of my knowledge, car::vif() only accepts "base R" model objects, such as lm or glm.

kylebutts commented 1 year ago

@simonschoe, pretty easy to modify the code from https://github.com/cran/car/blob/d67a2f5ed4013c8766e8d3e827dcaf6aaaf7f0fa/R/vif.R

library(fixest)
library(car)
#> Loading required package: carData

vif_feols <- function(mod) {
  v <- vcov(mod)

  # Ad-hoc way of getting "assign" attr from feols
  vars = gsub("::(.*)$", "", colnames(model.matrix(mod)))
  assign <- as.numeric(factor(vars, levels = unique(vars))) - 1

  if (names(coefficients(mod)[1]) == "(Intercept)") {
    v <- v[-1, -1]
    assign <- assign[-1]
  } else { 
    warning("No intercept: vifs may not be sensible.")
  }
  terms <- labels(terms(mod))
  n.terms <- length(terms)
  if (n.terms < 2) stop("model contains fewer than 2 terms")
  R <- cov2cor(v)
  detR <- det(R)
  result <- matrix(0, n.terms, 3)
  rownames(result) <- terms
  colnames(result) <- c("GVIF", "Df", "GVIF^(1/(2*Df))")
  for (term in 1:n.terms) {
    subs <- which(assign == term)
    result[term, 1] <- det(as.matrix(R[subs, subs])) *
        det(as.matrix(R[-subs, -subs])) / detR
    result[term, 2] <- length(subs)
  }
  if (all(result[, 2] == 1)) {
    result <- result[, 1]
  } else {
    result[, 3] <- result[, 1]^(1/(2 * result[, 2]))
  }
  result
}

model <- lm(mpg ~ disp + hp + wt + factor(cyl), data = mtcars)
model_feols <- feols(mpg ~ disp + hp + wt + i(cyl), data = mtcars)

car::vif(model)
#>                  GVIF Df GVIF^(1/(2*Df))
#> disp        12.900887  1        3.591780
#> hp           3.531254  1        1.879163
#> wt           5.368271  1        2.316953
#> factor(cyl)  8.993015  2        1.731715
vif_feols(model_feols)
#>             GVIF Df GVIF^(1/(2*Df))
#> disp   12.900887  1        3.591780
#> hp      3.531254  1        1.879163
#> wt      5.368271  1        2.316953
#> i(cyl)  8.993015  2        1.731715

Created on 2023-05-05 with reprex v2.0.2

simonschoe commented 1 year ago

@kylebutts indeed thanks for the hint. However, as soon as I include fixed effects (which absorb the intercept), car::vif() returns the following warning and prevents a meaningful interpretation of the VIFs: Warning: No intercept: vifs may not be sensible.

This can be reproduced via:

feols(mpg ~ disp + hp + wt + i(cyl) | carb, data = mtcars) |> 
  vif_feols()

As a workaround, I've implemented a custom solution that first demeans all variables and computes VIFs from the demeaned data like so (where mod is a feols object):

get_max_vif <- \(mod) {
  # fit OLS on demeaned data
  ols <- lm(y ~ ., data = bind_cols(mod$X_demeaned, y = mod$y_demeaned))
  # calculate VIF scores
  car::vif(ols, type = "terms") |> max()
}

Is this a sensible approach?

kylebutts commented 1 year ago

I'm not sure! I haven't worked much with VIF, so I don't know how to think about it with demeaned data. Sorry I can't be of more help in that regard