rbchan / unmarked

R package for hierarchical models in ecological research
https://rbchan.github.io/unmarked/

Add function to calculate VIFs #141

Closed kenkellner closed 4 years ago

kenkellner commented 4 years ago

There have been a few questions on the mailing list about calculating variance inflation factors (VIFs) for unmarked models. It's fairly easy to do this manually, but I figured we could do better. This adds a function to calculate VIFs for the parameters in one level of an unmarked model (e.g. 'det' or 'state'). I called the function vif() to emulate car::vif(), which I think is the most widely used function for this in R, but I could see changing it to VIF().

The function subsets the variance-covariance matrix to a specific model level, converts it to a correlation matrix, and uses that to calculate VIFs the same way car::vif() does (see here). This gives slightly different VIF estimates than the alternative approach of fitting a linear model with one covariate as the response and the others as predictors, e.g.:

library(unmarked)
set.seed(123)
data(frogs)
pferUMF <- unmarkedFrameOccu(pfer.bin)
siteCovs(pferUMF) <- data.frame(sitevar1 = rnorm(numSites(pferUMF)))

# Add some correlated covariates
obsvar2 <- rnorm(numSites(pferUMF) * obsNum(pferUMF))
obsvar3 <- rnorm(numSites(pferUMF) * obsNum(pferUMF), mean = obsvar2, sd = 0.5)
obsCovs(pferUMF) <- data.frame(
  obsvar1 = rnorm(numSites(pferUMF) * obsNum(pferUMF)),
  obsvar2 = obsvar2,
  obsvar3 = obsvar3
)

fm <- occu(~ obsvar1 + obsvar2 + obsvar3 ~ 1, pferUMF)

# Get VIFs for the detection model
vif(fm, type = 'det')

##  obsvar1  obsvar2  obsvar3 
## 1.002240 4.494552 4.490039

# Compare to the alternative way of calculating the VIF for obsvar2
vt <- lm(obsvar2 ~ obsvar1 + obsvar3, data = obsCovs(pferUMF))
1 / (1 - summary(vt)$r.squared)

## [1] 4.444054

I assume this is just because of different optimization approaches. Regardless, getting an exact VIF value doesn't seem crucial to me, since it's a rough diagnostic. Calculating VIFs the second way (with lm) would be an order of magnitude more complicated and would probably require separate methods for each unmarked fitting function, so I didn't really want to go that route.
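
For anyone who wants to see the covariance-based calculation spelled out, here is a minimal base-R sketch. It is not the code added in this PR: vif_from_vcov() is a hypothetical helper, the data are simulated, and an ordinary lm() fit stands in for an unmarked model so the example runs without the package. For an unmarked fit, the input would presumably be something like vcov(fm, type = 'det') with the intercept row and column dropped, but treat that as an assumption.

```r
# Hypothetical helper (not the function from this PR): compute VIFs from a
# coefficient variance-covariance matrix with the intercept already removed.
vif_from_vcov <- function(v) {
  R <- cov2cor(v)         # covariance matrix -> correlation matrix
  out <- diag(solve(R))   # VIFs are the diagonal of the inverse correlation matrix
  names(out) <- colnames(v)
  out
}

# Simulated covariates (illustrative only); x3 is correlated with x2
set.seed(1)
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
x3 <- rnorm(n, mean = x2, sd = 0.5)
y  <- rnorm(n, mean = 1 + x1 + x2 + x3)

fit  <- lm(y ~ x1 + x2 + x3)
v    <- vcov(fit)[-1, -1]   # drop the intercept row and column
vifs <- vif_from_vcov(v)

# Regression-based VIF for x2, computed as in the comparison above
vif_lm <- 1 / (1 - summary(lm(x2 ~ x1 + x3))$r.squared)

print(vifs)
print(vif_lm)
```

For a linear model the two calculations agree up to numerical precision, so vifs["x2"] matches vif_lm here. For a likelihood-based fit such as occu(), the coefficient covariance matrix also depends on the fitted probabilities (and on which observations are missing), so exact agreement with the lm-based value is not expected. That may contribute to the small difference shown above.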