`regrid()` does not handle nested structures correctly

rvlenth commented 3 years ago

This issue is reported in a question on StackOverflow.

A test example:

cows <- data.frame (
  route = factor(rep(c("injection", "oral"), c(5, 9))),
  drug = factor(rep(c("Bovineumab", "Charloisazepam", 
                      "Angustatin", "Herefordmycin", "Mollycoddle"), c(3,2,  4,2,3))),
  resp = c(34, 35, 34,   44, 43,      36, 33, 36, 32,   26, 25,   25, 24, 24)
)

cows.lm <- lm(resp ~ route + drug, data = cows)
cows.rg <- ref_grid(cows.lm)
cows.lrg <- regrid(cows.rg, transform="log")

Then summary(cows.lrg) fails because its @grid slot has 10 rows but its @linfct slot has only 5. It is regrid() that tossed them out; that is, nested structures require extra rows to keep an overall balanced, complete grid and a display attribute to keep track of which to use.

rvlenth commented 3 years ago

Additional specifics...

Some slot contents in cows.lrg:

> cows.lrg@bhat
[1] 3.533687 3.536117 3.772761 3.238678 3.191847
> cows.lrg@linfct
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
> cows.lrg@V
     [,1] [,2] [,3] [,4] [,5]
[1,]   NA   NA   NA   NA   NA
[2,]   NA   NA   NA   NA   NA
[3,]   NA   NA   NA   NA   NA
[4,]   NA   NA   NA   NA   NA
[5,]   NA   NA   NA   NA   NA
> cows.lrg@misc$display
 [1] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE

Interestingly, we have somewhat different issues if we regrid without a transformation:

> cows.rrg = regrid(cows.rg, "response")

> cows.rrg@bhat
[1] 34.25000 34.33333 43.50000 25.50000 24.33333
> cows.rrg@linfct
     [,1] [,2] [,3] [,4] [,5]
[1,]    1    0    0    0    0
[2,]    0    1    0    0    0
[3,]    0    0    1    0    0
[4,]    0    0    0    1    0
[5,]    0    0    0    0    1
> cows.rrg@V
    1             2             3  4             5  6  7             8  9            10
1  NA            NA            NA NA            NA NA NA            NA NA            NA
2  NA  4.189815e-01  0.000000e+00 NA  0.000000e+00 NA NA  4.624702e-17 NA -1.900323e-16
3  NA  0.000000e+00  5.586420e-01 NA -2.220446e-16 NA NA -1.101146e-16 NA -1.393884e-16
4  NA            NA            NA NA            NA NA NA            NA NA            NA
5  NA  0.000000e+00 -2.220446e-16 NA  8.379630e-01 NA NA -9.264134e-18 NA -2.349881e-17
6  NA            NA            NA NA            NA NA NA            NA NA            NA
7  NA            NA            NA NA            NA NA NA            NA NA            NA
8  NA  5.551115e-17 -1.101146e-16 NA -9.264134e-18 NA NA  8.379630e-01 NA -2.455434e-16
9  NA            NA            NA NA            NA NA NA            NA NA            NA
10 NA -1.665335e-16 -1.393884e-16 NA -2.349881e-17 NA NA -2.313087e-16 NA  5.586420e-01
> cows.rrg@misc$display
 [1] FALSE  TRUE  TRUE FALSE  TRUE FALSE FALSE  TRUE FALSE  TRUE

rvlenth commented 3 years ago

OK, I think we have it fixed:

> RRG = regrid(cows.rg)
> confint(RRG)
 route     drug           prediction    SE df lower.CL upper.CL
 oral      Angustatin           34.2 0.647  9     32.8     35.7
 injection Bovineumab           34.3 0.747  9     32.6     36.0
 injection Charloisazepam       43.5 0.915  9     41.4     45.6
 oral      Herefordmycin        25.5 0.915  9     23.4     27.6
 oral      Mollycoddle          24.3 0.747  9     22.6     26.0

Confidence level used: 0.95 
> 
> LRG = regrid(cows.rg, "log")
> confint(LRG, type = "r")
 route     drug           response    SE df lower.CL upper.CL
 oral      Angustatin         34.2 0.647  9     32.8     35.7
 injection Bovineumab         34.3 0.747  9     32.7     36.1
 injection Charloisazepam     43.5 0.915  9     41.5     45.6
 oral      Herefordmycin      25.5 0.915  9     23.5     27.7
 oral      Mollycoddle        24.3 0.747  9     22.7     26.1

Confidence level used: 0.95 
Intervals are back-transformed from the log scale

So then if we do what was intended, it works ...

> confint(LEMM <- emmeans(LRG, "route"), type = "response")
 route     response    SE df lower.CL upper.CL
 injection     38.6 0.585  9     37.3     40.0
 oral          27.7 0.470  9     26.7     28.8

Results are averaged over the levels of: drug 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> pairs(LEMM, type = "response")
 contrast         ratio     SE df null t.ratio p.value
 injection / oral   1.4 0.0317  9    1  14.648  <.0001

Results are averaged over the levels of: drug 
Tests are performed on the log scale

rvlenth commented 3 years ago

This leaves me with another issue that I may try to address: If we start with a reference grid in emmeans(), type is ignored:

> emmeans(LRG, pairwise ~ route, type = "response")
$emmeans
 route     emmean     SE df lower.CL upper.CL
 injection   3.65 0.0151  9     3.62     3.69
 oral        3.32 0.0170  9     3.28     3.36

Results are averaged over the levels of: drug 
Results are given on the log (not the response) scale. 
Confidence level used: 0.95 

$contrasts
 contrast         estimate     SE df t.ratio p.value
 injection - oral    0.333 0.0227  9  14.648  <.0001

Results are averaged over the levels of: drug 
Results are given on the log (not the response) scale.

However, if we start with a model, it is not:

> lcows.lm <- lm(log(resp) ~ route + drug, data = cows)
> emmeans(lcows.lm, pairwise ~ route, type = "response")
NOTE: A nesting structure was detected in the fitted model:
    drug %in% route
$emmeans
 route     response    SE df lower.CL upper.CL
 injection     38.6 0.687  9     37.1     40.2
 oral          27.7 0.374  9     26.8     28.5

Results are averaged over the levels of: drug 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

$contrasts
 contrast         ratio     SE df null t.ratio p.value
 injection / oral   1.4 0.0312  9    1  14.931  <.0001

Results are averaged over the levels of: drug 
Tests are performed on the log scale

rvlenth commented 3 years ago

I think I have fixed this incidental bug:

> emmeans(LRG, "route", type = "response")
 route     response    SE df lower.CL upper.CL
 injection     38.6 0.585  9     37.3     40.0
 oral          27.7 0.470  9     26.7     28.8

Results are averaged over the levels of: drug 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

> emmeans(LRG, ~route, type = "response")
 route     response    SE df lower.CL upper.CL
 injection     38.6 0.585  9     37.3     40.0
 oral          27.7 0.470  9     26.7     28.8

Results are averaged over the levels of: drug 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale ]

> emmeans(LRG, pairwise~route, type = "response")
$emmeans
 route     response    SE df lower.CL upper.CL
 injection     38.6 0.585  9     37.3     40.0
 oral          27.7 0.470  9     26.7     28.8

Results are averaged over the levels of: drug 
Confidence level used: 0.95 
Intervals are back-transformed from the log scale 

$contrasts
 contrast         ratio     SE df null t.ratio p.value
 injection / oral   1.4 0.0317  9    1  14.648  <.0001

Results are averaged over the levels of: drug 
Tests are performed on the log scale

rvlenth commented 3 years ago

Closing tbhis, as bug fixes are in place.

rvlenth commented 2 years ago

Apparently, some changes I made in 2.8.2 for supporting averaging of counterfactuals sidelined the fix I made to this issue a year+ ago

received by email...

I hope you don’t mind me getting in touch, but I’m having an issue with the regrid() function from the emmeans() package and I was hoping you might be able to help. I stumbled on this github issue you’ve logged (#287) following this StackOverflow post which seems to describe a very similar issue to what I’m experiencing with my own data, but it appears from your GitHub thread that the issue was resolved and is now closed. I haven’t started using GitHub yet (on my immediate to do list) so thought I’d send you a message directly as I wasn’t sure if you would see any messages posted on a closed issue on GitHub. I’m wondering if the fix you describe as having put in place may no longer be working in the latest release (I’m using emmeans version 1.8.2, updated today).

If you run the code in the test example you posted in the GitHub issue log you will see the issue, regrid() throws the following error (which is the same issue I’m having with regrid() and my own data with nested mixed-effects models, but thought it simpler to use the code from the OP here):

Error in dimnames(x) <- dn : length of 'dimnames' [2] not equal to array extent

Have you come across this issue again recently? Any help you can provide would be greatly appreciated as I’m using the regird() function in some analysis I’m doing at the moment.

rvlenth commented 2 years ago

There was a logic error in a part of the code where the columns of the @linfct slot were re-labeled. I think I have it fixed (again). I still need to look back at exactly what was the code in version 1.8.1, before I changed regrid(), to make sure I am doing everything the way that was intended.

rvlenth commented 2 years ago

Looking at file histories, I believe this bug had nothing to do with the counterfactuals code. The error dates back to July 27 (before version 1.8.0) when I decided to add those column names.

I have tested the cows example with versions 1.7.5, 1.8.0, 1.8.2, and the patched version now on GitHub; and have confirmed that the errors occur with 1.8.0 and 1.8.2 (and presumably 1.8.1 which I didn't test) but not with 1.7.5 or the new update. So I am quite sure what caused the error (the added code to label the columns of linfct), and am confident that it has been fixed. So am closing this issue once again.

rvlenth commented 2 years ago

Oops -- you know what, I should leave this issue open until there is a new update on CRAN, so people are more likely to find this.

rvlenth commented 1 year ago

OK, the update is on CRAN, and I think all is working correctly now; so am closing

rvlenth / emmeans

`regrid()` does not handle nested structures correctly #287

Additional specifics...

received by email...