runehaubo / lmerTestR

Repository for the R-package lmerTest

Satterthwaite degrees of freedom change drastically depending on the DV #8

Closed: sho-87 closed this issue 6 years ago

sho-87 commented 6 years ago

I posted the following on CrossValidated but it wasn't getting much traction, so I figured I'd post here as well. Not sure if there's a simple explanation for this...

I have a couple of multilevel (MLM) models fitted with lme4:

y1 ~ x1 + x2 + x3 + x4 + (1+x4|id)
y2 ~ x1 + x2 + x3 + x4 + (1+x4|id)

Notice that the only difference between them is the DV.

When I use lmerTest to get p-values, I notice that the degrees of freedom for some of the predictors change quite drastically between the two models. For example, in model 1 the x4 df might be 38.50, while in model 2 the df for the same predictor might be 260.50.

Is that expected behavior?

Given that my predictor variables are identical in both cases (i.e. this can't be a case of one model having more missing data than the other), why is there such a difference in the degrees of freedom when only the DV is changed?

Is there something about the Satterthwaite approximation that takes into account the DV, and hence degrees of freedom are expected to be so different?

Below I compare Satterthwaite and Kenward-Roger (just for demonstration purposes; I'd prefer to use the regular summary(model), as it gives me more information, like the random-effect and beta estimates).

I'm not sure why there are minor fluctuations in df across the board between the two models (for both Satterthwaite and Kenward-Roger), but more importantly, notice how the x4 df is roughly 10x larger in model 2 when using Satterthwaite.
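For reference, this is how I request each approximation; lmerTest exposes both through the ddf argument (a sketch, where model stands for either fitted model below):

```r
library(lmerTest)  # lmerTest::lmer must be used so df/p-values are available

anova(model, ddf = "Satterthwaite")    # Satterthwaite is the default
anova(model, ddf = "Kenward-Roger")    # requires the pbkrtest package
summary(model, ddf = "Kenward-Roger")  # KR-based t-tests, keeping the full summary output
```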

model 1:

model <- lmer(step_mean ~ x1 + x2 + x3 + x4 + (1+x4|id), data=df, REML=TRUE)

summary(model)

Fixed effects:
                     Estimate Std. Error        df t value Pr(>|t|)    
(Intercept)          -0.06003    0.12845  35.07528  -0.467 0.643161    
x1                    0.49117    0.12548  35.12842   3.914 0.000398 ***
x2                   -0.01394    0.01225 259.84143  -1.138 0.256368    
x3                    0.01414    0.28512  34.47940   0.050 0.960745    
x4                   -0.04091    0.01086  25.53492  -3.767 0.000874 ***

anova(model, ddf='Kenward-Roger')

Analysis of Variance Table of type III  with  Kenward-Roger 
approximation for degrees of freedom
                     Sum Sq Mean Sq NumDF   DenDF F.value    Pr(>F)    
x1                  0.47355 0.47355     1  35.119 14.5328 0.0005336 ***
x2                  0.04145 0.04145     1 259.955  1.2721 0.2604083    
x3                  0.00007 0.00007     1  34.435  0.0023 0.9621706    
x4                  0.43630 0.43630     1  27.832 13.3899 0.0010463 ** 

model 2:

model <- lmer(stride ~ x1 + x2 + x3 + x4 + (1+x4|id), data=df, REML=TRUE)

summary(model)

Fixed effects:
                     Estimate Std. Error        df t value Pr(>|t|)
(Intercept)          -0.05924    0.09010  35.35792  -0.657    0.515
x1                    0.08257    0.08865  34.98204   0.931    0.358
x2                   -0.03555    0.05087 295.62573  -0.699    0.485
x3                    0.08774    0.20271  35.43835   0.433    0.668
x4                    0.02290    0.04407 260.86367   0.520    0.604

anova(model, ddf='Kenward-Roger')

Analysis of Variance Table of type III  with  Kenward-Roger 
approximation for degrees of freedom
                     Sum Sq Mean Sq NumDF   DenDF F.value Pr(>F)
x1                  0.52223 0.52223     1  34.736 0.82341 0.3704
x2                  0.29974 0.29974     1 294.418 0.47260 0.4923
x3                  0.11041 0.11041     1  35.005 0.17409 0.6791
x4                  0.15516 0.15516     1  24.510 0.24464 0.6253
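To illustrate what I mean, here is a minimal simulated example (invented data and variable names, not my actual dataset). As I understand it, the Satterthwaite df are computed from the fitted variance components, which depend on the DV: when the DV implies a substantial random-slope variance, the df for that slope stay near the number of subjects, and when the slope variance is estimated near zero, the df climb toward the residual df.

```r
library(lmerTest)  # loads lme4 and adds Satterthwaite df to summary()/anova()

set.seed(1)
n_id <- 36; n_obs <- 8
sim <- data.frame(
  id = factor(rep(seq_len(n_id), each = n_obs)),
  x4 = rnorm(n_id * n_obs)
)
u <- rnorm(n_id, sd = 1)    # per-subject random intercepts
b <- rnorm(n_id, sd = 0.5)  # per-subject random slopes for x4

# DV 1: genuine random-slope variance in x4
sim$y1 <- u[sim$id] + (1 + b[sim$id]) * sim$x4 + rnorm(nrow(sim))
# DV 2: same predictors, but no random-slope variance
sim$y2 <- u[sim$id] + 1.0 * sim$x4 + rnorm(nrow(sim))

m1 <- lmer(y1 ~ x4 + (1 + x4 | id), data = sim)
m2 <- lmer(y2 ~ x4 + (1 + x4 | id), data = sim)  # may warn about a singular fit

coef(summary(m1))["x4", "df"]  # small: on the order of the number of subjects
coef(summary(m2))["x4", "df"]  # large: approaches the residual df
```

So the same fixed-effect design can legitimately produce very different denominator df for x4, which matches the pattern in my two models above.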
runehaubo commented 6 years ago

Please see https://bit.ly/2I0GvFC and continue the follow-up there. I'm closing the issue here.

Rune