rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
364 stars 32 forks source link

Transformations for covariates #506

Closed wdkrnls closed 1 month ago

wdkrnls commented 1 month ago

I'm using nlme::nlme and there because of the nonlinear estimation it seemed like I really needed to standardize the covariates with the scale function in order to get many of my models to fit reliably. I don't remember seeing any way to unscale these X variables in emmeans. I can easily unscale for plots outside emmeans, but I like the reports emmeans makes and would prefer it made those back transformations in house to ease model exploration. In the vignettes you talk about transforming the Y variables a lot (in contrast) and focus on concerns like bias adjustment, while I feel fixing X just involves some algebra to rescale the X so I can express at=list(...) statements on their experimentally meaningful scales. Did I overlook something?

I suppose what I am looking for an argument which takes a named list of functions, where the name is the name of the transformed variable in the reference grid and the function inverts the transformation from the perspective of the values reported in the emmeans/emmip/emtrends/etc... output.

Somewhat intuitively related in my head is the section about the poly command for making sure the reference grid doesn't hold two columns for linked variables. However, that particular worry isn't really a valid concern in this application.

Cheers!

rvlenth commented 1 month ago

I wonder if you just tried it... Here's an example, albeit not with nlme:

> mod = lm(strength ~ scale(diameter) * machine, data = fiber)

> ref_grid(mod)
'emmGrid' object with variables:
    diameter = 24.133
    machine = A, B, C

> ref_grid(mod, at = list(diameter = c(15, 25, 35)))
'emmGrid' object with variables:
    diameter = 15, 25, 35
    machine = A, B, C

> summary(.Last.value)
 diameter machine prediction    SE df
       15 A             30.1 2.113  9
       25 A             41.2 0.750  9
       35 A             52.2 2.040  9
       15 B             33.8 2.573  9
       25 B             42.3 0.782  9
       35 B             50.9 2.149  9
       15 C             30.6 1.492  9
       25 C             39.3 1.089  9
       35 C             47.9 2.967  9

When you have transformations of covariates in the model formula, they are generally handled pretty transparently as in the above. Maybe I don't understand what is needed here, and maybe there is something peculiar to nlme? Perhaps if you construct a simple (preferably small) reproducible example?

rvlenth commented 1 month ago

PS -- I tried to make a simple example based on the one provided for nlme:

> fm2 <- nlme(height ~ SSasymp(age, Asym, R0, lrc),
+             data = Loblolly,
+             fixed = list(Asym ~ scale(age), R0 + lrc ~ 1),
+             random = Asym ~ 1)

However, my fixed spec seems to have been completely ignored:

> fm2
Nonlinear mixed-effects model fit by maximum likelihood
  Model: height ~ SSasymp(age, Asym, R0, lrc) 
  Data: Loblolly 
  Log-likelihood: -114.743
  Fixed: list(Asym ~ 1, R0 ~ 1, lrc ~ 1) 
      Asym         R0        lrc 
101.448300  -8.627494  -3.233730 

Random effects:
 Formula: Asym ~ 1 | Seed
            Asym  Residual
StdDev: 3.650645 0.7188624

Number of Observations: 84
Number of Groups: 14 

Note it says Fixed: list(Asym ~ 1, R0 ~ 1, lrc ~ 1) which is not what I specified! I wonder if this is really the barrier you are up against.

wdkrnls commented 1 month ago

Hmm... I was using the pre-transformed column. I will try with scale directly in the formula. That seems consistent with the intuitive connection to poly and inverse in the data set. As I mentioned, nlme is quite fickle about when it can estimate parameters. In this case I can get around the issue you observed by updating the intercept model.

library(nlme)
library(emmeans)

set.seed(42)
Loblolly2 <- Loblolly
Loblolly2$covariate <- rnorm(Loblolly2$age, mean = 25, sd = 5)

fm2 <- nlme(
  model = height ~ SSasymp(age, Asym, R0, lrc),
  data = Loblolly2,
  fixed = list(Asym + R0 + lrc ~ 1),
  random = pdDiag(Asym ~ 1))

fm3 <- update(fm2,
    fixed = list(Asym ~ scale(covariate), R0 + lrc ~ 1),
    start = c(fixef(fm2)[1], 0, fixef(fm2)[2:3]))

emmeans(fm3, ~ covariate, param = "Asym")

This gives me nonEst in emmeans presumably because of the small data set, but the covariate average is on the original scale as you suggested.

Thanks!

wdkrnls commented 1 month ago

Now, here is the rub, I also want to find polynomials of these scaled terms. I haven't checked whether that is feasible yet from the perspective of the reporting.

rvlenth commented 1 month ago

It sounds like you are trying to do a lot with models that you can hardly succeed in fitting. But it shouldn't be a problem as long as you don't hard-code any predictors. Use poly(scale(x)), i.e., scale the covariate, not the polynomial terms.

wdkrnls commented 1 month ago

Would this also just work if I wrote an spoly wrapper around poly and scale that did the same as poly(scale(covariate), 2) but which doesn't look like a complex expression? Or are these specially supported cases? I ask because I also want to write procedures around the model terms and don't want to have to parse arbitrary R code. Dealing with one set of parens per term, however, is manageable.

rvlenth commented 1 month ago

It'll probably work, but I don't think two sets of nested parentheses are that onerous, and the fancier you get, the more can go wrong.

rvlenth commented 1 month ago

BTW, in case this isn't obvious, you use emtrends(..., max.degree = 2) to estimate the polynomial trends.

wdkrnls commented 1 month ago

Thanks, for your help. I think you are right. I'm actually having a tangential problem with emmip where it doesn't seem to always properly recognize the param= even though it's in the nlme formula and names(model$plist)). However, that seems like a different issue.

rvlenth commented 1 month ago

I'm closing this issue, as I think it is resolved.