rvlenth / emmeans

Estimated marginal means
https://rvlenth.github.io/emmeans/
358 stars 31 forks source link

Issue with emmeans when lme() is with interaction #359

Closed Auxanehmn closed 2 years ago

Auxanehmn commented 2 years ago

Hi,

I have created a regression model with random effects and some interactions between factor :

model_3_niveaux <- lme(fixed=Lactose ~ 1 + Status4*(poly(DIM,degree=2,raw=TRUE)) + Status4*parite + Cl_diftime + MY + herd,     
                       data=dataL,                               
                       random = list(~1 | Cow_ATQ, ~1 | ID_Q),   #on spécifie qu'on a des mesures répétées par vache et par quartier
                       method="REML",                             
                       na.action = na.omit,
                       control=lmeControl(returnObject=TRUE))

I would like to calculate the average Lactose from my model_3_levels. So I used emmeans :

emmeans(model_3_niveaux, pairwise ~ Status4*parite)

But I have warning messages :

1: In sqrt(.qf.non0(object@V, x)) : production de NaN
2: In sqrt(.qf.non0(object@V, x)) : production de NaN
3: In sqrt(.qf.non0(object@V, x)) : production de NaN
4: In sqrt(.qf.non0(object@V, x)) : production de NaN

So I tried to remove my interactions in my model and it works normally without any warning message. I don't understand where this problem comes from as I used to use emmeans with this type of model.

Can you help me ?

Thank you. PS : sorry I don't know how to use Markdown on Github

rvlenth commented 2 years ago

My guess is that some parameters are not identifiable in the interaction model. It's not clear whether you saw any output at all, as you sometimes do when a warning is issued. So I would want to see more in order to understand what happened.

But what I do know is that those warnings occurred after the emmeans calculations were completed and while a summary of the results were displayed. You can tell more by breaking it into smaller steps:

result <- emmeans(model_3_niveaux, pairwise ~ Status4*parite)
result[[1]]    # the marginal means
result[[2]]    # the pairwise comparisons

Also, I think predict(result[[1]]) and predict(result[[2]]) will successfully show just the estimates, because the .qf.non0 calculations in the warnings come in when computing their standard errors. You somehow have a situation where some or all standard errors cannot be calculated.


PS -- on markdown -- No big issue -- But if you like, you can use the widgets at the top of the comment window to format what you write, and use the preview tab to see what it will look like. For reports/questions like this, the most important one is to mark computer code: just highlight the code and click the <> widget. I edited your question so you can see how it comes out -- just back-quote characters, or three of them before and after lines of code.

Auxanehmn commented 2 years ago

Hi, Thank you for your quick respond ! Yes I do have an output with my model :

Linear mixed-effects model fit by REML
  Data: dataL 
        AIC      BIC  logLik
  -31.10579 176.7209 43.5529

Random effects:
 Formula: ~1 | Cow_ATQ
        (Intercept)
StdDev:   0.1625541

 Formula: ~1 | ID_Q %in% Cow_ATQ
        (Intercept)  Residual
StdDev:   0.1058786 0.2176579

Fixed effects:  Lactose2 ~ 1 + Status4 * (poly(DIM, degree = 2, raw = TRUE)) +      Status4 * parite + Cl_diftime + MY + herd 
                                                Value  Std.Error    DF   t-value p-value
(Intercept)                                  4.032332 0.02532763 10879 159.20686  0.0000
Status42                                     0.001749 0.01544085 10879   0.11324  0.9098
Status43                                    -0.228705 0.01994924 10879 -11.46434  0.0000
Status44                                    -0.156784 0.01723216 10879  -9.09833  0.0000
poly(DIM, degree = 2, raw = TRUE)1          -0.000359 0.00015128 10879  -2.37209  0.0177
poly(DIM, degree = 2, raw = TRUE)2           0.000000 0.00000055 10879  -0.59486  0.5519
paritePrimipare                              0.242608 0.02405830   375  10.08417  0.0000
Cl_diftime2                                  0.011758 0.00606271 10879   1.93933  0.0525
Cl_diftime3                                  0.168101 0.00644333 10879  26.08917  0.0000
Cl_diftime4                                  0.308604 0.00641798 10879  48.08427  0.0000
Cl_diftime5                                  0.382393 0.00996538 10879  38.37217  0.0000
MY                                           0.028428 0.00105061 10879  27.05850  0.0000
herd2                                       -0.054003 0.03192480   375  -1.69157  0.0916
herd3                                       -0.045007 0.02937959   375  -1.53193  0.1264
herd4                                       -0.078153 0.03259678   375  -2.39756  0.0170
herd5                                        0.065896 0.02563766   375   2.57026  0.0105
Status42:poly(DIM, degree = 2, raw = TRUE)1 -0.000215 0.00027213 10879  -0.78841  0.4305
Status43:poly(DIM, degree = 2, raw = TRUE)1  0.000470 0.00032412 10879   1.44986  0.1471
Status44:poly(DIM, degree = 2, raw = TRUE)1 -0.001272 0.00027623 10879  -4.60440  0.0000
Status42:poly(DIM, degree = 2, raw = TRUE)2  0.000001 0.00000101 10879   0.94629  0.3440
Status43:poly(DIM, degree = 2, raw = TRUE)2 -0.000002 0.00000115 10879  -2.15230  0.0314
Status44:poly(DIM, degree = 2, raw = TRUE)2  0.000003 0.00000098 10879   3.34664  0.0008
Status42:paritePrimipare                     0.013657 0.01329463 10879   1.02722  0.3043
Status43:paritePrimipare                     0.035705 0.01821441 10879   1.96025  0.0500
Status44:paritePrimipare                     0.030755 0.01988051 10879   1.54700  0.1219

The warning messages are only when I use emmeans. I tried predit(result[[1]]) as you said, the estimates appears succefully.
I really don't understand why it doesn't work because I used this method a month ago and everything was corking well...

Tell me if you need my data or my script to understand the problem.

Thank you.

Auxanehmn commented 2 years ago

I forgot to show you the result of emmeans.

 emmeans(model_3_niveaux, pairwise ~ Status4*parite)
$emmeans
 Status4 parite    emmean     SE  df lower.CL upper.CL
 1       Multipare   4.42 0.0117 375     4.39     4.44
 2       Multipare   4.41    NaN 375      NaN      NaN
 3       Multipare   4.21    NaN 375      NaN      NaN
 4       Multipare   4.15 0.3990 375     3.37     4.94
 1       Primipare   4.66 0.0218 375     4.62     4.70
 2       Primipare   4.66    NaN 375      NaN      NaN
 3       Primipare   4.49    NaN 375      NaN      NaN
 4       Primipare   4.43 0.4776 375     3.49     5.37

Results are averaged over the levels of: Cl_diftime, herd 
Degrees-of-freedom method: containment 
Confidence level used: 0.95 
 >predict(N_compared_statut[[1]])
[1] 4.416530 4.406342 4.208513 4.154589 4.659138 4.662606 4.486826 4.427952
> predict(N_compared_statut[[2]])
 [1]  0.010188408  0.208017058  0.261941431 -0.242608017 -0.246076163 -0.070295817 -0.011421831  0.197828649
 [9]  0.251753022 -0.252796426 -0.256264572 -0.080484226 -0.021610240  0.053924373 -0.450625075 -0.454093221
[17] -0.278312875 -0.219438889 -0.504549448 -0.508017594 -0.332237248 -0.273363262 -0.003468146  0.172312200
[25]  0.231186186  0.175780346  0.234654332  0.058873986`
rvlenth commented 2 years ago

Looking at this, I think I commented further on this several days ago, but it doesn't show here. I wonder if I never clicked on "Comment".

I'm pretty sure I suggested re-fitting the model using poly(DIM,degree=2) (without raw = TRUE). I believe that raw polynomials can create a terrible numerical conditioning situation, and that could lead to the problems you have experienced. I'm going to leave it at that so I don't forget to click the "Comment" button.

Auxanehmn commented 2 years ago

Hello,

Sorry I was absent last week. Thank you for your answer. I tried to remove raw=TRUE in my modele but it doesn't change anything, I still have the warning message with emmeans : Warning messages: 1: In sqrt(.qf.non0(object@V, x)) : production de NaN 2: In sqrt(.qf.non0(object@V, x)) : production de NaN 3: In sqrt(.qf.non0(object@V, x)) : production de NaN 4: In sqrt(.qf.non0(object@V, x)) : production de NaN 5: In sqrt(.qf.non0(object@V, x)) : production de NaN 6: In sqrt(.qf.non0(object@V, x)) : production de NaN 7: In sqrt(.qf.non0(object@V, x)) : production de NaN 8: In sqrt(.qf.non0(object@V, x)) : production de NaN 9: In sqrt(.qf.non0(object@V, x)) : production de NaN

rvlenth commented 2 years ago

Well, these warningsare coming up in calculating the SE of the estimates. The warning message itself is coming from the sqrt() function; for example:

> sqrt(-1)
[1] NaN
Warning message:
In sqrt(-1) : NaNs produced

The function .qf.non0 is a small internal function that computes a quadratic form like a'Va where V is the covariance matrix of the regression coefficients (from vcov(model)), and a is the linear combination of those coefficients that we want to compute. The value of a'Va (which is the variance of the sampling distribution of the estimate) should always be nonnegative, but apparently it is coming out negative. My guess is that a'Va is only very slightly negative, and that can happen with small floating-point errors when V is essentially singular -- i.e., that some variances are actually supposed to be zero for certain instances of a, and we are getting negative results just due to noise in the computation.

I had thought that raw polynomials were creating very bad numerical conditioning. But without raw = TRUE in there, the conditioning should be much better, and now you are getting 9 of these warnings instead of only 4. So the bottom line is that I think your model is over-determined. I think you need to simplify the model somehow -- by removing some interactions, lowering the degree of the polynomials, specifying fewer random effects, or something.

Auxanehmn commented 2 years ago

Ok thank you ! I will try to simplify my model then.

rvlenth commented 2 years ago

If you still have the same model, could you try this (with emmeans already loaded):

assignInNamespace(".qf.non0", function(X, y) sum(y * (X %*% y)), "emmeans")

I'm just realizing that it's possible there is a pathological reason for this in my own code...

rvlenth commented 2 years ago

Oops - got the syntax wrong. Copy and paste the assignInNamespace line from the latest edited version on the issues page. (And then re-try your emmeans call)

Auxanehmn commented 2 years ago

I tried like you said with assignInNamespace but it still doesn't work ... `

assignInNamespace(".qf.non0", function(X, y) sum(y (X %% y)), "emmeans") emmeans(model_3_niveaux, pairwise ~ Status4*parite) #compare les moyennes entre les statut par parité $emmeans Status4 parite emmean SE df lower.CL upper.CL 1 Multipare 4.42 0.0117 375 4.39 4.44 2 Multipare 4.41 NaN 375 NaN NaN 3 Multipare 4.21 NaN 375 NaN NaN 4 Multipare 4.15 0.3990 375 3.37 4.94 1 Primipare 4.66 0.0218 375 4.62 4.70 2 Primipare 4.66 NaN 375 NaN NaN 3 Primipare 4.49 NaN 375 NaN NaN 4 Primipare 4.43 0.4776 375 3.49 5.37 ` Other thing, I tried my script on another computer and there is any warning messages. I don't understand, I'm working on a servor with other peoples and perhaps someone made an update on R which lead to this problems. Especially since everything was working fine a month ago.... I have to ask my team about it.

rvlenth commented 2 years ago

OK, well this shows we don't have that pathological problem with my code. Maybe the update changed something in the lme package (which I doubt, as they basically do not maintain that package), or perhaps the data changed? But with the data you have on your PC, I'm pretty well convinced the model is over-fitted.

rvlenth commented 2 years ago

If you are interested, the following should replace your warnings with more informative ones

.qf.non0 = function(X, y) {
    ii = (zapsmall(y) != 0)
    rtn = if (any(ii))
        sum(y[ii] * (X[ii, ii, drop = FALSE] %*% y[ii]))
    else 0
    if (rtn < 0) {
        warning("Negative variance estimate obtained.")
        rtn = NaN
    }
    rtn
}

assignInNamespace(".qf.non0", .qf.non0, "emmeans")
rvlenth commented 2 years ago

By the way, you will not see these warnings until the summary is displayed:

EMMs <- emmeans(model_3_niveaux, pairwise ~ Status4parite) #compare les moyennes entre les statut par parité
    # (no warnings yet)

EMMs
    # (now we get the messages)
Auxanehmn commented 2 years ago

Yes in fact I used the same data on my computer to test my model. You are right about the negative variance, see the warning messages obtained with your code : Warning messages: 1: In .qf.non0(object@V, x) : Negative variance estimate obtained. 2: In .qf.non0(object@V, x) : Negative variance estimate obtained. 3: In .qf.non0(object@V, x) : Negative variance estimate obtained. 4: In .qf.non0(object@V, x) : Negative variance estimate obtained.

I need to compare the package of the version on my PC and those on the server.

rvlenth commented 2 years ago

I think this is resolved, so am closing.