rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
358 stars 31 forks source link

Using lsmeans() with gls() object closes R gui when using mode = "satterthwaite" #465

Closed jdbible closed 8 months ago

jdbible commented 8 months ago

Please ensure that you are in the right place

I hope this is the right place, my issue is with a bug in the lsmeans() function. So, if this is the wrong place any help would be appreciated.

Describe the bug

When attempting to use the mode = "satterthwaite", option with a gls() object, the R gui literally closes, doesn't hang up, spit an error or warning, it just closes the GUI.

To reproduce

Toy example for multiple comparisons with unequal variances

1/26/24

# library(nlme) library(emmeans) set.seed(12) y<- rnorm(30, rep(c(1,2,10), each=10), rep(c(.1, 1,10), each=10)) group<- rep(paste("group", 1:3), each=10)

Creates the gls model object.

mod1<- gls(y ~ group, weights= varIdent(form=~1 |group)) summary(mod1)

This doesn't run. But it does make some suggestions to get it running.

lsmeans(mod1, "group")

This does run, using error degrees of freedom.

lsmeans(mod1, "group", mode = "df.error")

Make sure you save what you are working on before you run this one. The "satterthwaite" method literally closes the R GUI!!!

lsmeans(mod1, "group", mode = "satterthwaite")

lsmeans(mod1, "group", mode = "appx-satterthwaite")

Expected behavior

I was trying to get Satterwaite type df and variance estimates for comparisons. If algorithmically it doesn't want to, or can't, a warning or an error would have been nice. But the GUI just disappeared "poof", gone, bye bye, straight up Houdini'd itself out of there.

Additional context

Apologies for being the guy to send one of these in. Usually, if a function does something weird I just shrug it off, but I've never seen a bug that will close the GUI. The emmeans package really makes my life easier most days of the week. The only reason I even made a report is because it does close the GUI (I've verified this on 4 different machines).

Ground rules

rvlenth commented 8 months ago

I do not get this drastic behavior; I just get an error message:

> lsmeans(mod1, "group", mode = "satterthwaite")
Error in oXy[, p + 1] : subscript out of bounds
Error: Can't estimate Satterthwaite parameters.
  Try adding the argument 'mode = "appx-satterthwaite"'

> lsmeans(mod1, "group", mode = "appx-satterthwaite")
Error: Can't estimate Satterthwaite parameters.
Try adding the argument 'mode = "df.error"'

However, I continued with the call with mode = df.error, and that's when it faulted for me. Appaently there's something that creates a memory leak or something, but it doesn't blow up until it does a garbage collect or something.

Nothing worse than a spurious, unpredictable error. I'll see if I can pin it down...

jdbible commented 8 months ago

Thanks, apologies but it was weird, I had never seen that happen. Thanks for looking into it.

rvlenth commented 8 months ago

OK, I discovered something. It turns out that the Satterthwaite code requires an explicit data argument in the call for fitting the model. With your mod1, the y variable comes out as NULL in the code. That does not directly cause an error, but it makes the dimensions of some arrays wrong, and then when some internal routines are called, I think it causes some stuff to be written to the wrong places in memory -- and that is a very bad thing!

I figured out how to make it play nice either way. (And moreover, the Satterthwaite method actually works.) The following reprex fits models with different-named variables, with and without an explicit dataset. I made the names different to avoid false successes with retrieving the right result from the wrong place.

library(nlme)
library(emmeans)
set.seed(12)
y <- rnorm(30, rep(c(1, 2, 10), each = 10), rep(c(.1, 1, 10), each = 10))
group<- rep(paste("group", 1:3), each=10)

dat <- data.frame(yy = y, gg = group)

mod1 <- gls(y ~ group, weights = varIdent(form = ~1 | group))
emmeans(mod1, "group", mode = "sat")
##  group   emmean     SE df lower.CL upper.CL
##  group 1  0.953 0.0316  9    0.882     1.02
##  group 2  1.805 0.2332  9    1.277     2.33
##  group 3 12.088 2.6014  9    6.203    17.97
## 
## Degrees-of-freedom method: satterthwaite 
## Confidence level used: 0.95

mod2 <- gls(yy ~ gg, weights = varIdent(form = ~1 | gg), data = dat)
emmeans(mod2, "gg", mode = "sat")
##  gg      emmean     SE df lower.CL upper.CL
##  group 1  0.953 0.0316  9    0.882     1.02
##  group 2  1.805 0.2332  9    1.277     2.33
##  group 3 12.088 2.6014  9    6.203    17.97
## 
## Degrees-of-freedom method: satterthwaite 
## Confidence level used: 0.95

Created on 2024-01-26 with reprex v2.0.2

The fixed version will be on GitHub soon.

Alas...

But yet mode to do. The approx Satterthwaite method usually works pretty well

> emmeans(mod2, "gg", mode = "appx")
 gg      emmean     SE   df lower.CL upper.CL
 group 1  0.953 0.0316 8.50    0.881     1.03
 group 2  1.805 0.2332 8.99    1.277     2.33
 group 3 12.088 2.6014 9.00    6.203    17.97

Degrees-of-freedom method: appx-satterthwaite 
Confidence level used: 0.95 

but again seems to have the same limitation when local data are used. But in this case, it just doesn't work. My tests did not have it create a fault.

jdbible commented 8 months ago

My man! You're good, thanks for looking at this.

rvlenth commented 8 months ago

OK, we now have appx-satterthwaite working either way:

library(nlme)
library(emmeans)
set.seed(12)
y <- rnorm(30, rep(c(1, 2, 10), each = 10), rep(c(.1, 1, 10), each = 10))
group<- rep(paste("group", 1:3), each=10)

dat <- data.frame(yy = y, gg = group)

mod1 <- gls(y ~ group, weights = varIdent(form = ~1 | group))
mod2 <- gls(yy ~ gg, weights = varIdent(form = ~1 | gg), data = dat)

set.seed(23.0126)
emmeans(mod2, "gg", mode = "appx")
##  gg      emmean     SE   df lower.CL upper.CL
##  group 1  0.953 0.0316 8.97    0.882     1.02
##  group 2  1.805 0.2332 9.01    1.277     2.33
##  group 3 12.088 2.6014 9.00    6.203    17.97
## 
## Degrees-of-freedom method: appx-satterthwaite 
## Confidence level used: 0.95

set.seed(23.0126)
emmeans(mod1, "group", mode = "appx")
##  group   emmean     SE   df lower.CL upper.CL
##  group 1  0.953 0.0316 8.97    0.882     1.02
##  group 2  1.805 0.2332 9.01    1.277     2.33
##  group 3 12.088 2.6014 9.00    6.203    17.97
## 
## Degrees-of-freedom method: appx-satterthwaite 
## Confidence level used: 0.95

Created on 2024-01-26 with reprex v2.0.2

(I used the same random seed to verify we get the same sim-based results)

rvlenth commented 8 months ago

PS -- I use some of the same code for lme, so need to check that too. Then I'll push up the changes.

jdbible commented 8 months ago

Give me until tomorrow, my wife needs some attention. After that I'll work on the vetting.

Seriously, thank you. I haven't ever broken R, like that ( and I'm not a grad student trying to get this to do my homework for me), and this was one of those where I was admittedly stretching a package at it's seams. But because it acted right most of the time I had faith I could get it to do it. But ... you know how that works...

jdbible commented 8 months ago

That did it, it's up and running with an explicit "data=" in the gls() call. Thank you very much, should I do anything on my end to close out the report?

rvlenth commented 8 months ago

Here's a reprex for the lme part of it with "appx-satterthwaite". An additional piece needed (that occurs in this example) is rokarounds when some of the simulations lead to parameters being dropped.

library(nlme)
library(emmeans)
set.seed(12)
y <- rnorm(30, rep(c(1, 2, 10), each = 10), rep(c(.1, 1, 10), each = 10))
group <- rep(paste("group", 1:3), each = 10)
subj <- rep(LETTERS[1:6], each = 5)

dat <- data.frame(yy = y, gg = group, ss = subj)
mod3 <- lme(y ~ group, random = ~1 | subj, weights = varIdent(form = ~1 | group))
mod4 <- lme(yy ~ gg, random = ~1 | ss, weights = varIdent(form = ~1 | gg), data = dat)

set.seed(23.0126)
emmeans(mod4, "gg", mode = "appx")
##  gg      emmean     SE  df lower.CL upper.CL
##  group 1  0.953 0.0316 8.8    0.881     1.03
##  group 2  1.805 0.2332 9.0    1.277     2.33
##  group 3 12.088 2.6014 9.0    6.203    17.97
## 
## Degrees-of-freedom method: appx-satterthwaite 
## Confidence level used: 0.95

set.seed(23.0126)
emmeans(mod3, "group", mode = "appx")
##  group   emmean     SE  df lower.CL upper.CL
##  group 1  0.953 0.0316 8.8    0.881     1.03
##  group 2  1.805 0.2332 9.0    1.277     2.33
##  group 3 12.088 2.6014 9.0    6.203    17.97
## 
## Degrees-of-freedom method: appx-satterthwaite 
## Confidence level used: 0.95

Created on 2024-01-27 with reprex v2.0.2

Thanks for reporting this. I think the Satterthwaite code is now a lot more robust. BTW, Frank Satterthwaite was an Iowa (where I am) graduate in 1943 and his d.f. method is about his only contribution.

I'm gonna close this issue as I think we've tied it up. But re-open it if you see any gaps.

jdbible commented 8 months ago

Thanks again Russ, I’m at Clemson, I work in Chris McMahan’s department. He’s one of Josh Tebbs (Iowa grad) students. It really is a small world we live in 😊

From: Russell V. Lenth @.> Sent: Saturday, January 27, 2024 11:19 AM To: rvlenth/emmeans @.> Cc: Joseph Daniel Bible @.>; Author @.> Subject: Re: [rvlenth/emmeans] Using lsmeans() with gls() object closes R gui when using mode = "satterthwaite" (Issue #465)

This Message Is From An External Sender: Use caution when opening links or attachments if you do not recognize the sender.

Here's a reprex for the lme part of it with "appx-satterthwaite". An additional piece needed (that occurs in this example) is rokarounds when some of the simulations lead to parameters being dropped.

library(nlme)

library(emmeans)

set.seed(12)

y <- rnorm(30, rep(c(1, 2, 10), each = 10), rep(c(.1, 1, 10), each = 10))

group <- rep(paste("group", 1:3), each = 10)

subj <- rep(LETTERS[1:6], each = 5)

dat <- data.frame(yy = y, gg = group, ss = subj)

mod3 <- lme(y ~ group, random = ~1 | subj, weights = varIdent(form = ~1 | group))

mod4 <- lme(yy ~ gg, random = ~1 | ss, weights = varIdent(form = ~1 | gg), data = dat)

set.seed(23.0126)

emmeans(mod4, "gg", mode = "appx")

gg emmean SE df lower.CL upper.CL

group 1 0.953 0.0316 8.8 0.881 1.03

group 2 1.805 0.2332 9.0 1.277 2.33

group 3 12.088 2.6014 9.0 6.203 17.97

Degrees-of-freedom method: appx-satterthwaite

Confidence level used: 0.95

set.seed(23.0126)

emmeans(mod3, "group", mode = "appx")

group emmean SE df lower.CL upper.CL

group 1 0.953 0.0316 8.8 0.881 1.03

group 2 1.805 0.2332 9.0 1.277 2.33

group 3 12.088 2.6014 9.0 6.203 17.97

Degrees-of-freedom method: appx-satterthwaite

Confidence level used: 0.95

Created on 2024-01-27 with reprex v2.0.2https://reprex.tidyverse.org/

Thanks for reporting this. I think the Satterthwaite code is now a lot more robust. BTW, Frank Satterthwaite was an Iowa (where I am) graduate in 1943 and his d.f. method is about his only contribution.

I'm gonna close this issue as I think we've tied it up. But re-open it if you see any gaps.

— Reply to this email directly, view it on GitHubhttps://github.com/rvlenth/emmeans/issues/465#issuecomment-1913235269, or unsubscribehttps://github.com/notifications/unsubscribe-auth/BFUIYKLI7RF5VKBFRSSUVLTYQUSIRAVCNFSM6AAAAABCMW6AFKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMYTSMJTGIZTKMRWHE. You are receiving this because you authored the thread.Message ID: @.***>