Closed hnguyen19 closed 10 months ago
The covariate is actually taken into account, but then the results are suppressed because those terms have zero d.f.:
> joint_tests(mod, show0df = TRUE)
model term df1 df2 F.ratio p.value note
site 5 373231 0.000 1.0000
C 1 373231 0.000 1.0000
R 2 373231 0.000 1.0000
covar 0 NA NA NA d e
C:R 2 373231 0.000 1.0000
C:covar 0 NA NA NA d e
R:covar 0 NA NA NA d e
C:R:covar 0 NA NA NA d e
(confounded) 0 373231 NaN NaN
d: df1 reduced due to linear dependence
e: df1 reduced due to non-estimability
This in turn happens because, by default, joint_tests() reduces the covariate using the meanint() function:
> meanint(dummy$covar)
[1] -0.5176167 1.4823833
> as.tran$linkfun(meanint(dummy$covar))
[1] NaN NaN
Warning messages:
1: In sqrt(mu/alpha) : NaNs produced
2: In asin(sqrt(mu/alpha)) : NaNs produced
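(For reference, meanint() amounts to a one-unit interval on either side of the covariate's mean; this is my paraphrase of the emmeans helper, not part of the thread:)
mean(dummy$covar) + c(-1, 1)   # reproduces the meanint() endpoints above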
That is, the transformation is applied to values outside its valid range. If you use a function that yields values within the interval (0,1), you get less puzzling results:
> joint_tests(mod, cov.reduce = function(x) mean(x) + c(-0.1, 0.1))
model term df1 df2 F.ratio p.value
site 5 373231 0.000 1.0000
C 1 373231 0.000 1.0000
R 2 373231 0.000 1.0000
covar 1 373231 0.000 1.0000
C:R 2 373231 0.000 1.0000
C:covar 1 373231 0.000 1.0000
R:covar 2 373231 0.000 1.0000
C:R:covar 2 373231 0.000 1.0000
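If you need this often, a reusable reducer can clamp the interval to the covariate's valid domain. This is a sketch of my own, not an emmeans feature:
# hypothetical helper: symmetric interval about the mean, clipped to (0, 1)
in_range <- function(x, h = 0.1, eps = 1e-6)
  pmin(pmax(mean(x) + c(-h, h), eps), 1 - eps)
# usage: joint_tests(mod, cov.reduce = in_range)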
BTW, the example is kind of flaky because you use response and covariate values as dimensions in the grid you created. I think you intended to do:
set.seed(1234) # so random numbers are reproducible
dummy <- expand.grid(
site = c("a", "b", "c", "d", "e", "g"),
C = 1:4,
R = c("i", "j", "k")
)
dummy$covar <- runif(72, min = 0, max = 1)
dummy$resp <- runif(72, min = 0, max = 2)
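(The fit itself is not shown in this excerpt; given the as.tran$linkfun call earlier, my guess, purely an assumption and not the poster's code, is something along these lines:)
as.tran <- make.tran("asin.sqrt")   # assumed; matches the as.tran$linkfun call above
mod <- lm(sqrt(resp) ~ site + C * R * as.tran$linkfun(covar), data = dummy)   # assumed fit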
Now if we re-fit the model, we see P values other than 1:
> joint_tests(mod, cov.reduce = \(x) mean(x) + c(-0.1, 0.1))
model term df1 df2 F.ratio p.value
site 5 55 1.038 0.4049
C 1 55 0.728 0.3972
R 2 55 4.993 0.0102
covar 1 55 0.000 0.9948
C:R 2 55 0.623 0.5400
C:covar 1 55 0.661 0.4197
R:covar 2 55 0.513 0.6013
C:R:covar 2 55 0.202 0.8173
Thank you for the explanation and for correcting my dummy data set. I thought asin.sqrt accepted the closed interval [0, 1], since mod ran error-free (I did see NAs when some values on the original scale were larger than 1). I am confused about why 0 df occurred: transforming covar and saving it as a column before fitting the model did not return any 0-df terms. I'm using your revised dummy data set in the next step.
dummy$covar.t <- asin(sqrt(dummy$covar))
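(The refit is not shown in the thread; presumably something like the following, which is my reconstruction rather than the poster's code:)
mod.t <- lm(sqrt(resp) ~ site + C * R * covar.t, data = dummy)   # assumed refit
joint_tests(mod.t)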
model term df1 df2 F.ratio p.value
site 5 55 1.038 0.4049
C 1 55 0.692 0.4090
R 2 55 5.055 0.0097
covar.t 1 55 0.000 0.9948
C:R 2 55 0.623 0.5402
C:covar.t 1 55 0.661 0.4197
R:covar.t 2 55 0.513 0.6013
C:R:covar.t 2 55 0.202 0.8173
In case covar must be transformed before fitting, would you recommend cov.reduce = \(x) mean(x) + c(-0.1, 0.1) as the default way to get the full ANCOVA table? When exploring different options for cov.reduce or cov.keep, I saw very different pairs of F and p values for the same factor using the same model.
If you transform first and save it as a predictor, then you are talking about a different model, with covar.t as a predictor instead of covar. The joint tests are based, respectively, on a symmetric interval on the covar scale or a symmetric interval on the covar.t scale, and since the transformation is nonlinear, a symmetric interval on one scale is not symmetric on the other.
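A quick numerical illustration (mine, not from the thread): a symmetric interval of plus or minus 0.1 around covar = 0.2 becomes asymmetric on the asin(sqrt()) scale:
asin(sqrt(0.2 + c(-0.1, 0.1))) - asin(sqrt(0.2))
# [1] -0.1419  0.1160   (the endpoints are not equidistant from zero)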
I do not have a recommendation as to which is right or which is preferable. And as is discussed on the help page, covariates create complications, and there are infinitely many possible results for joint_tests() depending on what intervals you choose for covar (or covar.t). I suggest that, precisely because this is a point of confusion, it is really difficult to explain what it is you are testing with joint_tests(), and so you are probably better off not using it at all.
What may be more meaningful to you might be results like this:
> joint_tests(mod, at = list(covar = c(0.25, 0.5, 0.75)), by = "covar")
covar = 0.25:
model term df1 df2 F.ratio p.value
site 5 55 1.038 0.4049
C 1 55 0.041 0.8394
R 2 55 4.804 0.0119
C:R 2 55 0.562 0.5732
covar = 0.50:
model term df1 df2 F.ratio p.value
site 5 55 1.038 0.4049
C 1 55 0.934 0.3380
R 2 55 4.514 0.0153
C:R 2 55 0.616 0.5439
covar = 0.75:
model term df1 df2 F.ratio p.value
site 5 55 1.038 0.4049
C 1 55 1.311 0.2572
R 2 55 1.566 0.2181
C:R 2 55 0.365 0.6961
which gives you three different anova tables for just the factors, depending on which covariate value is used. Those tests are based on easy-to-understand hypotheses, formulated for those specific covariate values.
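If estimates rather than tests are wanted at those same covariate values, an analogous emmeans() call (my sketch, on the same model; note that numeric C is held at its mean) would be:
emmeans(mod, ~ R | covar, at = list(covar = c(0.25, 0.5, 0.75)))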
Thank you for the details. I was looking for the impact of covar on resp and wanted to see how the F-value associated with covar compared to those of the categorical variables, and then decide whether to use covar in predicting resp.
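One direct way to gauge the covariate's contribution (my suggestion, not something proposed in the thread) is emtrends(), which estimates the slope of the response with respect to covar:
emtrends(mod, ~ R, var = "covar")   # covar slope at each level of R (numeric C held at its mean)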
For cases where covar is transformed with log(), which is nonlinear, 0 df does not occur. Is it inappropriate to call joint_tests() without specifying the levels of covar?
mod5 <- lm(sqrt(resp) ~ site + C * R * log(covar + 0.01), data = dummy)
joint_tests(mod5)
model term df1 df2 F.ratio p.value
site 5 55 1.152 0.3445
C 1 55 1.579 0.2142
R 2 55 1.797 0.1755
covar 1 55 0.040 0.8416
C:R 2 55 0.671 0.5155
C:covar 1 55 1.033 0.3139
R:covar 2 55 0.060 0.9415
C:R:covar 2 55 0.665 0.5186
In this case, though, covar should not be used; but in my real data set, covar had the biggest impact on resp.
You are asking if joint_tests() should be used for model selection, and the answer is decidedly no. That function, and indeed the entire emmeans package, is designed to perform post hoc analyses of an existing model. None of it is designed for model selection. And I would never use type-3 tests for model selection anyway. You should use type-1 (anova()) or type-2 (car::Anova()) tests, or other methods designed for selecting models.
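For example, on the log-covariate model above (a sketch; output not shown):
anova(mod5)                   # type-1 (sequential) tests
car::Anova(mod5, type = 2)    # type-2 tests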
This series of questions seems to be following a meandering path. We (or at least I) have established that this is not a bug, and now we're not even talking about the same transformation.
It looks like this thread is completed, so I am closing it.
Thanks for the corrections. I appreciate your help.
Describe the bug
joint_tests() omits a covariate transformed with asin(sqrt()) via make.tran().
To reproduce
set.seed(83)
library(emmeans)
Switching resp and covar and then calling joint_tests(mod2) (below) gave the full ANCOVA table.
Expected behavior
A full ANCOVA table similar to the table obtained from joint_tests(mod2).