rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/

Get trends for different degrees of polynomial #133

Closed singmann closed 5 years ago

singmann commented 5 years ago

When fitting a model with poly() term, one can still use emtrends to get the trends of the variable. However, it seems this only returns the linear trend (i.e., polynomial of degree 1). I would like to get the trends for the higher degree polynomial terms. Example follows:

library("emmeans")
afex::set_sum_contrasts()

mtcars$amf <- factor(mtcars$am)

m1 <- lm(mpg ~ amf*poly(hp, 2), mtcars)
summary(m1)

emtrends(m1, "amf", var = "hp")
#  amf hp.trend     SE df lower.CL upper.CL
#  0    -0.0644 0.0130 26  -0.0912  -0.0377
#  1    -0.0846 0.0139 26  -0.1132  -0.0561
# 
# Confidence level used: 0.95 

This looks to me like the linear trend of hp. However, I would like to get the quadratic trends as well.

PS: Sorry for submitting an unfinished issue at first.

rvlenth commented 5 years ago

True -- emtrends computes just linear trends. It's possible to get a test of the quadratic trend like this:

> emm = emmeans(m1, ~ amf * hp, at = list(hp = -1:1))
> contrast(emm, "poly", by = "amf")
amf = 0:
 contrast   estimate       SE df t.ratio p.value
 linear    -0.261818 0.146995 26 -1.781  0.0866 
 quadratic  0.000449 0.000454 26  0.990  0.3315 

amf = 1:
 contrast   estimate       SE df t.ratio p.value
 linear    -0.357789 0.097427 26 -3.672  0.0011 
 quadratic  0.000637 0.000253 26  2.513  0.0185 

Hope that helps

Russ

singmann commented 5 years ago

Hmm, I am not sure I fully understand the scaling or units of this output. It does not match the output above, which seems to be on the original hp scale. This can be seen by implementing the model by hand, without poly():

library("tidyverse")
mtcars <- mtcars %>% 
  mutate(hp_c = hp - mean(hp)) %>% 
  mutate(hp_c2 = hp_c^2)

m2 <- lm(mpg ~ amf*(hp_c + hp_c2), mtcars)
emtrends(m2, "amf", var = "hp_c")
#  amf hp_c.trend     SE df lower.CL upper.CL
#  0      -0.0651 0.0133 26  -0.0924  -0.0377
#  1      -0.0855 0.0142 26  -0.1146  -0.0564
# 
# Confidence level used: 0.95

This matches exactly the output from emtrends on m1 if an appropriately small value of delta.var is chosen (also suggesting that the default value of delta.var might be a tad too big in this case):

emtrends(m1, "amf", var = "hp", delta.var = 0.001)
#  amf hp.trend     SE df lower.CL upper.CL
#  0    -0.0651 0.0133 26  -0.0924  -0.0377
#  1    -0.0855 0.0142 26  -0.1146  -0.0564
# 
# Confidence level used: 0.95 

Likewise, the quadratic trends also do not match this new output.

emtrends(m2, "amf", var = "hp_c2")
#  amf hp_c2.trend       SE df  lower.CL upper.CL
#  0      0.000224 0.000227 26 -0.000242 0.000691
#  1      0.000318 0.000127 26  0.000058 0.000579
# 
# Confidence level used: 0.95

rvlenth commented 5 years ago

Note that I said, "It's possible to get a test of ..." -- I never claimed it would estimate them on the right scale. What you have are estimates of contrasts of the means, which you can see via:

> emmeans:::poly.emmc(1:3)
  linear quadratic
1     -1         1
2      0        -2
3      1         1

If you scale the same contrasts differently, you get the results you just showed:

contrast(emm, list(lin = c(-1,0,1)/2, quad = c(1,-2,1)/2), by = "amf")
amf = 0:
 contrast  estimate       SE df t.ratio p.value
 lin      -0.065045 0.013288 26 -4.895  <.0001 
 quad      0.000224 0.000227 26  0.990  0.3315 

amf = 1:
 contrast  estimate       SE df t.ratio p.value
 lin      -0.085512 0.014158 26 -6.040  <.0001 
 quad      0.000318 0.000127 26  2.513  0.0185

I'll consider what might be done to expand emtrends beyond just linear trends, but it doesn't seem like the most compelling unmet need at this time.
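(Editorial aside: the rescaling works because these contrasts are finite differences. Assuming means taken at hp - 1, hp, and hp + 1, the coefficients c(-1,0,1)/2 form a central difference -- exactly the slope for a quadratic -- and c(1,-2,1)/2 is half the second difference, i.e. the coefficient of the squared term. A base-R sketch with a made-up quadratic, not the mtcars fit:)

```r
# Toy quadratic with made-up coefficients; x^2 coefficient is 0.5.
f <- function(x) 3 + 2 * x + 0.5 * x^2
x <- 100
means <- f(x + (-1:1))                  # "EMMs" at x - 1, x, x + 1
lin  <- sum(c(-1, 0, 1) / 2 * means)    # central difference = f'(100) = 102
quad <- sum(c( 1, -2, 1) / 2 * means)   # half the 2nd difference = 0.5
c(lin = lin, quad = quad)
```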

singmann commented 5 years ago

Thanks for the clarification and the new code. This is exactly what I was looking for. So this issue could be closed.

But for what it is worth, there have been several times in the last year where I would have needed this feature. So either I am performing somewhat unusual analyses or other people who would like that feature are just not vocal about it. Anyway, easier access to this functionality would be much appreciated.

rvlenth commented 5 years ago

Henrik,

I'll leave it open, as this is something to work toward. May I ask...

  1. How high a degree is needed? Do you ever look at cubic, quartic, quintic trends?
  2. Is it better to combine the different degrees in one grid (e.g., in your example, have the linear and quadratic trends combined there, delineated by a degree factor), or to obtain the different degrees in separate calls?

singmann commented 5 years ago

Hi Russ,

Thanks for considering that further.

  1. I occasionally look at 3rd-degree polynomials. For a current project (with tons of data) I even looked at 4th and 5th degree, but not really more. But from my view of the (psychology) literature, I would expect the main interest is in degrees 1 to 3. Most people know that with higher degrees things are quite unpredictable.
  2. I think it makes some sense to have separate calls for separate degrees. For example, one common question is: for which factor levels are the linear effects significant, and how large are they? And then, for which levels are the quadratic effects significant? So having emtrends(m1, "amf", var = "hp", degree = 1) and emtrends(m1, "amf", var = "hp", degree = 2) seems like a great idea. The only issue I can see with this separation is if I want simultaneous p-value adjustment across both degrees.

DominiqueMakowski commented 5 years ago

Following this thread with interest, as indeed this feature could be very useful.

The only issue I can see with this separation is if I want simultaneous p-value adjustment across both degrees.

Adding my two cents here; maybe the degree arg could accept a vector: emtrends(m1, "amf", var = "hp", degree = c(1, 2))

rvlenth commented 5 years ago

FWIW, I realized there's a straightforward way to get the scaling right for the quadratic trend: Just do a difference quotient on the slopes at two slightly different values of the covariate:

> emt <- emtrends(m1, ~ hp | amf, "hp", at = list(hp = c(99.999, 100)))
> contrast(emt, list(quad.trend = c(-1,1)*500))
amf = 0:
 contrast   estimate       SE df t.ratio p.value
 quad.trend 0.000224 0.000227 26 0.990   0.3315 

amf = 1:
 contrast   estimate       SE df t.ratio p.value
 quad.trend 0.000318 0.000127 26 2.513   0.0185
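
(Editorial aside: the scaling works out because, for a quadratic, the slope changes linearly in the covariate, so the slope difference over a step of 0.001 is 0.001 times the second derivative; multiplying by 500 = 1/(2 * 0.001) converts that to the coefficient of the squared term. A base-R check on a made-up quadratic, not the mtcars fit:)

```r
# Toy quadratic (made-up coefficients); its x^2 coefficient is 0.5.
f <- function(x) 3 + 2 * x + 0.5 * x^2
slope <- function(x, h = 0.001) (f(x + h) - f(x - h)) / (2 * h)
# Same recipe as the contrast above: slope difference times 500 = 1/(2*0.001)
quad.trend <- (slope(100) - slope(99.999)) * 500
quad.trend   # approximately 0.5, the x^2 coefficient
```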

rvlenth commented 5 years ago

OK, I have something that seems to work. More testing needed before I push it to github; you'll get a notice when I do.

> emtrends(m1, "amf", "hp", max.order = 2)
 amf  hp order   hp.trend       SE df  lower.CL  upper.CL
 0   147 first  -0.064987 0.013262 26 -9.22e-02 -0.037727
 1   147 first  -0.085430 0.014134 26 -1.14e-01 -0.056377
 0   147 second  0.000224 0.000227 26 -2.42e-04  0.000691
 1   147 second  0.000318 0.000127 26  5.79e-05  0.000579

Confidence level used: 0.95 
rvlenth commented 5 years ago

I think I have this working correctly, and added a patch to make it work for multivariate models as well. So am closing this issue. But feel free to comment if anything comes up.

(PS -- I might change "order" to "degree"; my thinking for "order" was the order of the derivative being computed; but "degree" makes sense in terms of linear, quadratic, cubic trends.)

CeresBarros commented 2 years ago

I know this is an old thread, but I'll give it a go at reporting what seems to be a bug.

The above code works fine using poly within an lm framework, but fails in a gls framework:

library(nlme)
mtcars$amf <- factor(mtcars$am)

m1 <- lm(mpg ~ amf*poly(hp, 2), mtcars)
m2 <- gls(mpg ~ amf*poly(hp, 2),
          weights = varIdent(form = ~ 1|amf), data = mtcars)

> emtrends(m1, "amf", "hp", max.order = 2)
 amf hp.trend     SE df lower.CL upper.CL
 0    -0.0650 0.0133 26  -0.0922  -0.0377
 1    -0.0854 0.0141 26  -0.1145  -0.0564

> emtrends(m2, "amf", "hp", max.order = 2)
Error in poly(hp, 2) : 'degree' must be less than number of unique points

Any chance someone could help here?

Thanks

rvlenth commented 2 years ago

Thanks - I'll look at this when I can. Lots of other stuff going on right now... Russ

rvlenth commented 2 years ago

The difficulty is this. With m1,

> terms(m1)
mpg ~ amf * poly(hp, 2)
attr(,"variables")
list(mpg, amf, poly(hp, 2))
attr(,"factors")
            amf poly(hp, 2) amf:poly(hp, 2)
mpg           0           0               0
amf           1           0               1
poly(hp, 2)   0           1               1
attr(,"term.labels")
[1] "amf"             "poly(hp, 2)"     "amf:poly(hp, 2)"
attr(,"order")
[1] 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: R_GlobalEnv>
attr(,"predvars")
list(mpg, amf, poly(hp, 2, coefs = list(alpha = c(146.6875, 198.071514938048
), norm2 = c(1, 32, 145726.875, 977168457.086594))))
attr(,"dataClasses")
        mpg         amf poly(hp, 2) 
  "numeric"    "factor" "nmatrix.2" 

while with m2,

> terms(m2)
mpg ~ amf * poly(hp, 2)
attr(,"variables")
list(mpg, amf, poly(hp, 2))
attr(,"factors")
            amf poly(hp, 2) amf:poly(hp, 2)
mpg           0           0               0
amf           1           0               1
poly(hp, 2)   0           1               1
attr(,"term.labels")
[1] "amf"             "poly(hp, 2)"     "amf:poly(hp, 2)"
attr(,"order")
[1] 1 1 2
attr(,"intercept")
[1] 1
attr(,"response")
[1] 1
attr(,".Environment")
<environment: 0x000001b2108b2f50>

Note that the terms component of m1 has an attr(,"predvars") attribute that gives the information from the Gram-Schmidt stuff needed to recreate the polynomial, and that is lacking in the terms for m2. Without that, we can't do much; in particular, the model.matrix function doesn't even know that a polynomial was fitted previously, which is why you get the error message.
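(Editorial aside: the predvars mechanism can be seen in any base-R lm fit; this sketch uses the built-in cars data rather than the models above:)

```r
# attr(,"predvars") carries the centering/normalization constants that let
# poly() be reconstructed on new data (added by stats' makepredictcall method).
fit <- lm(dist ~ poly(speed, 2), data = cars)   # cars ships with R
pv <- attr(terms(fit), "predvars")
pv[[3]]  # the poly() call, now with coefs = list(alpha = ..., norm2 = ...)
```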

The hopelessness of this is accentuated by the fact that nlme itself can't handle this model:

> newdata = ref_grid(m1)@grid   # just a quick way to create a new dataset
> newdata
  amf       hp .wgt.
1   0 146.6875    19
2   1 146.6875    13

> predict(m1, new = newdata)
       1        2 
17.37105 20.49409 

> predict(m2, new = newdata)
Error in poly(hp, 2) : 'degree' must be less than number of unique points

So I'm afraid that's pretty much the end of the road, unless I put in some sort of messy hack that looks for poly() terms and gets the needed special sauce from lm() or somewhere.

rvlenth commented 2 years ago

PS - this is really a bug in nlme. I can report it, but my experience is that nobody ever responds to bug reports for that package.

CeresBarros commented 2 years ago

Thank you so much for looking into this! I hadn't noticed the bug with nlme, as I was extracting fitted values only. This is worrying! I will definitely report it, even if it falls on deaf ears...

I ended up trying something like this, but am unsure whether this is statistically correct.

> mtcars$hp_poly1 <- poly(mtcars$hp, 2)[, 1]
> mtcars$hp_poly2 <- poly(mtcars$hp, 2)[, 2]
> m3 <- gls(mpg ~ amf*(hp_poly1 + hp_poly2), 
+           weights = varIdent(form = ~ 1|amf), data = mtcars)
> identical(predict(m2), predict(m3))
[1] TRUE
> newdata <- ref_grid(m3)@grid   # just a quick way to create a new dataset
> newdata
  amf hp_poly1     hp_poly2    .wgt.
1   0        0 7.024021e-18 30.16073
2   1        0 7.024021e-18 13.00000
> predict(m3, new = newdata) ## now able to predict
[1] 18.39336 21.94351
attr(,"label")
[1] "Predicted values"
> emtrends(m3, specs = "amf", var = "hp_poly1", mode = "df.error")
 amf hp_poly1.trend   SE df lower.CL upper.CL
 0            -20.4 3.99 24    -28.7    -12.2
 1            -26.4 4.91 24    -36.5    -16.3

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 
> emtrends(m3, specs = "amf", var = "hp_poly2", mode = "df.error")
 amf hp_poly2.trend   SE df lower.CL upper.CL
 0             7.02 5.63 24   -4.610     18.6
 1             9.95 4.99 24   -0.354     20.3

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

rvlenth commented 2 years ago

I think this will NOT work, because the increments in the difference quotients are different for the two degrees, and neither of them is right w.r.t. hp. However, you can specify a model like amf*(hp + I(hp^2)) and that will be OK (though not as well-conditioned).

CeresBarros commented 2 years ago

Thanks for your help.

The issue with fitting a model as amf*(hp + I(hp^2)) is that hp and I(hp^2) are not orthogonal (which is ensured by poly(..., raw = FALSE)) -- maybe that is what you meant by "not as well conditioned". I'll have to think about this more carefully.
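
(Editorial aside: the non-orthogonality point is easy to check in base R, using mtcars directly; this is not part of the thread's fitted models:)

```r
# Raw powers of hp are strongly correlated, while poly() columns are
# orthonormal by construction (crossprod is the identity matrix).
x <- mtcars$hp
cor(x, x^2)                       # close to 1: badly conditioned raw basis
zapsmall(crossprod(poly(x, 2)))   # 2x2 identity: orthonormal columns
```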

rvlenth commented 2 years ago

Centering or nearly centering will improve the numerical conditioning a lot -- e.g., ~ amf*(I(hp - 150) + I((hp - 150)^2)). But the really important thing is you actually have to have the variable hp in both terms if you want the estimated trends to be correct.

R uses pretty high-quality code for fitting linear models, so numerical conditioning is not too big a deal. If you were fitting up to hp^4 or beyond, that'd be a different story.
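
(Editorial aside: a sketch of this suggested reparameterization, fitted with nlme. The model name m4 and the centering value 150 are arbitrary choices for illustration; 150 is simply a round value near mean(hp). Because every term is an explicit function of hp, no predvars bookkeeping is needed:)

```r
library(nlme)   # recommended package, ships with R
mtcars$amf <- factor(mtcars$am)
# Same model as m2, but written so both polynomial terms involve hp directly;
# near-centering at 150 improves numerical conditioning:
m4 <- gls(mpg ~ amf * (I(hp - 150) + I((hp - 150)^2)),
          weights = varIdent(form = ~ 1 | amf), data = mtcars)
# Unlike the poly() fit, this one predicts fine on new data:
predict(m4, newdata = data.frame(amf = factor(0, levels = c(0, 1)), hp = 147))
# and emtrends(m4, ~ amf | degree, "hp", max.degree = 2) can then
# differentiate with respect to hp (emmeans required; output not shown).
```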

CeresBarros commented 2 years ago

This is great help, thanks!

CeresBarros commented 1 year ago

@rvlenth, I'm reactivating this thread because our previous discussion (and a subsequent stackoverflow post) resulted in the nlme maintainers fixing the nlme bug (🎉), where attr(, "predvars") was missing from a gls model object's terms.

However, it still seems like there's something amiss when I try to run emtrends on a gls with a polynomial term:

I'll post the same reprex as before:

> library(nlme)
> library(emmeans)
Warning message:
package ‘emmeans’ was built under R version 4.2.1 
> mtcars$amf <- factor(mtcars$am)
> m1 <- lm(mpg ~ amf*poly(hp, 2), mtcars)
> m2 <- gls(mpg ~ amf*poly(hp, 2),
+           weights = varIdent(form = ~ 1|amf), data = mtcars)
> attr(terms(m1), "predvars")
list(mpg, amf, poly(hp, 2, coefs = list(alpha = c(146.6875, 198.071514938048
), norm2 = c(1, 32, 145726.875, 977168457.086594))))
> attr(terms(m2), "predvars")
list(mpg, amf, poly(hp, 2, coefs = list(alpha = c(146.6875, 198.071514938048
), norm2 = c(1, 32, 145726.875, 977168457.086594))))
> emtrends(m1, "amf", "hp", max.order = 2)
 amf hp.trend     SE df lower.CL upper.CL
 0    -0.0650 0.0133 26  -0.0922  -0.0377
 1    -0.0854 0.0141 26  -0.1145  -0.0564

Confidence level used: 0.95 
> emtrends(m2, "amf", "hp", max.order = 2)
Error in poly(hp, 2) : 'degree' must be less than number of unique points

Here's my session info as well:

> sessionInfo()
R version 4.2.0 (2022-04-22 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 18363)

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.utf8  LC_CTYPE=English_Canada.utf8    LC_MONETARY=English_Canada.utf8 LC_NUMERIC=C                   
[5] LC_TIME=English_Canada.utf8    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] emmeans_1.8.1-1 nlme_3.1-157   

loaded via a namespace (and not attached):
[1] compiler_4.2.0     estimability_1.4.1 tools_4.2.0        mvtnorm_1.1-3      grid_4.2.0         xtable_1.8-4      
[7] lattice_0.20-45   

rvlenth commented 1 year ago

Well, this puts me between a rock and a hard place. I can fix this, but it creates another error:

> emtrends(m2, "amf", "hp", max.order = 2)
Error in model.matrix.default(eval(call$model), data = data) : 
  model frame and formula mismatch in model.matrix()
 Error: Can't estimate Satterthwaite parameters.
Try adding the argument 'mode = "appx-satterthwaite"'

If I do what it says...

> emtrends(m2, "amf", "hp", max.order = 2, mode = "appx")
 Error: Can't estimate Satterthwaite parameters.
Try adding the argument 'mode = "df.error"'

If I do what that says,

> emtrends(m2, "amf", "hp", max.order = 2, mode = "df.error")
 amf hp.trend     SE df lower.CL upper.CL
 0    -0.0650 0.0105 25  -0.0867  -0.0433
 1    -0.0854 0.0178 25  -0.1221  -0.0487

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

So fixing this problem with poly terms creates errors for all gls models when we use Satterthwaite or approximate Satterthwaite. That is way too high a price, unless I can think of some other workaround.

CeresBarros commented 1 year ago

I appreciate you taking the time to look at this again, @rvlenth (and I wish I could move the rock away ;) !). My knowledge of stats is at its limits with respect to this; am I to understand that the mode = "df.error" solution does not result in good estimates?

rvlenth commented 1 year ago

This seems to be the best I can do so far...

> emtrends(m2, "amf", "hp", max.order = 2)
Note: Satterthwaite d.f. unavailable when model r.h.s. involves functions
 amf hp.trend     SE df lower.CL upper.CL
 0    -0.0650 0.0105 25  -0.0867  -0.0433
 1    -0.0854 0.0178 25  -0.1221  -0.0487

Degrees-of-freedom method: df.error 
Confidence level used: 0.95 

(That is, we force away from Satterthwaite methods.)

You may use a specified non-Satterthwaite mode, or specify the df manually, e.g.

> emtrends(m2, "amf", "hp", max.order = 2, df = 12)
 amf hp.trend     SE df lower.CL upper.CL
 0    -0.0650 0.0105 12  -0.0879  -0.0420
 1    -0.0854 0.0178 12  -0.1243  -0.0466

Degrees-of-freedom method: user-specified 
Confidence level used: 0.95 

There is nothing wrong with using mode = "df.error" as far as estimation is concerned. The estimates are not "bad"; we just may be off in estimating the associated degrees of freedom.

rvlenth commented 1 year ago

Hmmmm, well I am back to square one. It turns out that this is something emtrends()-specific. After disabling the trap I just implemented, I do fine with ordinary emmeans with Satterthwaite:

> emmeans(m2, ~amf|hp, at=list(hp=c(142,144)))
hp = 142:
 amf emmean    SE df lower.CL upper.CL
 0     17.7 0.683 16     16.2     19.1
 1     20.9 1.624 10     17.3     24.5

hp = 144:
 amf emmean    SE df lower.CL upper.CL
 0     17.5 0.687 16     16.1     19.0
 1     20.7 1.650 10     17.1     24.4

Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

So now I really don't see what the problem is.

rvlenth commented 1 year ago

By the way, the correct argument is max.degree, not max.order, and it doesn't make sense to use it if you don't include degree as a variable of interest. With model m1, we have

> emtrends(m1, ~ amf | degree, "hp", max.degree = 2)
degree = linear:
 amf  hp.trend       SE df  lower.CL  upper.CL
 0   -0.065114 0.013319 26 -9.25e-02 -0.037736
 1   -0.085610 0.014188 26 -1.15e-01 -0.056446

degree = quadratic:
 amf  hp.trend       SE df  lower.CL  upper.CL
 0    0.000224 0.000227 26 -2.42e-04  0.000691
 1    0.000318 0.000127 26  5.79e-05  0.000579

Confidence level used: 0.95 

rvlenth commented 1 year ago

OK, maybe I have it now...

> emtrends(m2, ~ amf | degree, "hp", max.degree = 2)
degree = linear:
 amf  hp.trend      SE df  lower.CL  upper.CL
 0   -0.065114 0.01058 16 -8.75e-02 -0.042683
 1   -0.085610 0.01789 10 -1.25e-01 -0.045745

degree = quadratic:
 amf  hp.trend      SE df  lower.CL  upper.CL
 0    0.000224 0.00018 16 -1.58e-04  0.000607
 1    0.000318 0.00016 10 -3.76e-05  0.000674

Degrees-of-freedom method: satterthwaite 
Confidence level used: 0.95 

> emtrends(m2, ~ amf | degree, "hp", max.degree = 2, mode = "appx")
degree = linear:
 amf  hp.trend      SE    df  lower.CL  upper.CL
 0   -0.065114 0.01058 15.89 -8.76e-02 -0.042671
 1   -0.085610 0.01789  9.96 -1.25e-01 -0.045725

degree = quadratic:
 amf  hp.trend      SE    df  lower.CL  upper.CL
 0    0.000224 0.00018 15.89 -1.58e-04  0.000607
 1    0.000318 0.00016  9.96 -3.78e-05  0.000674

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

It was a very subtle issue with the way data were passed from emtrends() that messed up the Satterthwaite setup.

CeresBarros commented 1 year ago

Wonderful! Thanks so much @rvlenth ! Glad that in the end the issue didn't require a huge change :)