insightsengineering / rbmi

Reference based multiple imputation R package
https://insightsengineering.github.io/rbmi/
Other
17 stars 6 forks source link

outputs of rbmi::lsmeans() don't match those of emmeans::lsmeans() #412

Closed adcascone closed 8 months ago

adcascone commented 8 months ago

On pg. 70 of the rbmi reference manual, users are directed to the references of lsmeans() for identical implementations of estimating the least square means from a linear model in SAS and in R via the emmeans package.

However, the outputs of rbmi::lsmeans() don't match those of emmeans::lsmeans()


  iris2 <- iris[ iris$Species %in% c("versicolor", "virginica"), ]
  iris2$Species <- factor(iris2$Species)

  emmeans::lsmeans(lm("Sepal.Length ~ Species + Petal.Length*Petal.Width", data=iris2), specs = "Species",data=iris2, weights="proportional")
    #  Species    lsmean     SE df lower.CL upper.CL
    # versicolor   6.49 0.0796 95     6.33     6.65
    # virginica    5.99 0.0768 95     5.84     6.14
  rbmi:::lsmeans(lm("Sepal.Length ~ Species + Petal.Length*Petal.Width", data=iris2), Species = "versicolor",.weights = "proportional")$est[[1]]
    #6.512539
  rbmi:::lsmeans(lm("Sepal.Length ~ Species + Petal.Length*Petal.Width", data=iris2), Species = "virginica",.weights = "proportional")$est[[1]]
    #6.011461

Environment

Thank you, Arianna

gowerc commented 8 months ago

Hi @adcascone,

Thanks for raising this. Digging into this a bit more I don't think this is a bug but more of a "what are you estimating" issue. For reference the discrepancy comes from how we handle interactions between continuous covariates where we use mean(A*B) whilst emmeans uses mean(A) * mean(B). For additional reference it appears that SAS's default is equivalent to what we've implemented however they do provide options for the user to select between the two (see Setting Covariate Values ).

Talking to @wolbersm, if I have understood him correctly, the difference comes down to whether you are wanting to construct a single hypothetical patient (in which case the emmeans approach makes more sense) or if you are wanting to construct a counterfactual e.g. what would have happened if all patients had taken drug vs all patients taking placebo (which is what our current implementation represents).

If you would definitely prefer to use the hypothetical patient approach we could look to add an option to rbmi to allow users to select between the two.

(@wolbersm - Please correct me if I've got any of the details above wrong)

wolbersm commented 8 months ago

Hi both

Thanks @gowerc! Just to small additional clarification to what @gowerc wrote:

rbmi is primarily targeted to the analysis of randomized clinical trials. In this setting it is often useful to report the estimated (counterfactual) population mean outcomes if all subjects had received intervention or control treatment, respectively. This can be done via standardization and, for linear models, equivalently via LSMEANS as per our implementation. See e.g. this slide deck (slide 30f) for further regarding the connection of standadization and LSMEANS: https://baselbiometrics.github.io/home/docs/trainings/20210202/2_Bornkamp.pdf

wolbersm commented 8 months ago

So, while I think our choice is reasonable, it could be useful to document in the help pages that rbmi::lsmeans() may give different results to emmeans::lsmeans() if interaction terms are included in the model. What do you think @gowerc?

gowerc commented 8 months ago

Just to clarify we ended up making the following changes which hopefully address this issue (please re-open if you feel it hasn't been sufficiently addressed)

Hope this helps !