Closed MLieke closed 1 month ago
There is a lot to read here, so it may be a while before I answer any specific questions. Please note that this GitHub site is intended for software-specific questions. I hesitate to advise on substantive issues of what is appropriate to do. The same is true of Stack Overflow. So my impression is that these questions should be posted on CrossValidated, where you will find a large pool of experienced people -- many of them smarter than me.
One note: From what I see in glancing through, bias.adj = TRUE
is ignored. Bias adjustments only come into play if you back-transform from the link scale (in this case, the log).
I note you have one plot that is on the count scale. The next plot, I have no idea what scale it is no since there is no y-axis label, but I'm guessing it's differences of logs. Perhaps the different scales is an explanation for the discrepancies you see.
Dear Dr Lenth
Thank you for your super quick reply, that's very nice!
I see, I may try to post an updated version of this question on Cross Validated, although I am a bit afraid my remaining question (see below) would be considered to have too much to do with the internal workings of emmeans?
Thank you also for your notes which were very useful as they pointed me towards some silly mistakes I made when preparing this question. I indeed mixed things up with the bias adjustment, which isn't relevant/doing anything in the part where I want to obtain confidence intervals for the difference of logs (which is indeed the scale of the second plot). Additionally, I noticed I accidentally put the "trt.vs.ctrl" outside of the brackets for the contrast function when producing "confint_emmeans_biasadjusted" so that this in fact did not yield the confidence intervals for the difference in the estimated means. Apologies for this!
I do still wonder why the Wald confidence interval for the difference in the estimated means (at the log scale) obtained using confint(contrast(emmeans(model, ~landuse), "trt.vs.ctrl", ref = 1))
("confint_emmeans" on the second plot) is wider than the Wald confidence interval obtained using confint(model, method = "Wald")
, and wider than the confidence intervals obtained using bootstrapping, so if by any chance you would have an idea on this (I hope I don't bother you by asking this!) it would be very welcome!
In any case thank you very much! Lieke
I don't know why those other CIs are narrower. Keep in mind that you have a mixed model here and the emmeans-based intervals do take into account the variation of the random effects. All I can suggest is that you look very carefully at the documentation and any annotations for those other procedures and make sure those methods are indeed computing CIs for the same parameters. If you do
emm <- emmeans(model, ~ landuse)
contrast(emm, "trt.vs.ctrl1")
you should see the same estimates and SEs as you had for the landuse
effects in summary(model)
. So it's hard to argue with those results...
OK, thank you very much, I'll try to look into that!
(Assuming you're in the temperate zone:) Have a great autumn!
Best regards Lieke
If you have specific mixed model, questions, there is a (friendly) listserve that is often helpful.
Dear Dr. Lenth
I hope you are doing well and want to thank you for your work on the emmeans-package and on helping people out with its use, I think that's super valuable!
I was unsure if this question would be best posted here or on Stack Overflow or Cross Validated, because there's both theoretical considerations and R-coding involved, so I hope it is fine I post it here.
Context
In the context of an ecological study, we want to compare the densities of certain organism groups between four land use types. In fact, we are especially interested in comparing the densities within one land use type to the densities in the three other land use type. We collected organisms in 15 different locations, with each location containing at least three of the land use types, and took two or three samples per land use type per location. For this example, we hereby obtained in total 144 observations.
We have formulated a very simple model: density ~ landuse + (1|location). In the case of this example, we model the densities with a tweedie distribution due to the data consisting of non-integer positive values and quite some zeroes. (We preferred this over an approach with a zero-inflated Gamma model because of the ease of interpretation in our context.)
As we are interested in comparing one land use type to the three other types, I am currently using emmeans and contrasts. We report on these comparisons by showing a figure as shown below and by reporting the ratios of the emmeans and the 95% confidence intervals for those ratios, on the response scale. Our audience is rather broad and not per se acquainted with sophisticated statistical approaches (nor are we).
I tried replicating the issue with a fictitious dataframe that I could share if needed to replicate the issue, but did consider the output from my actual situation more meaningful and thought it will suffice for you to understand my question.
Here is some background information on the model and emmeans:
An illustration of the figures we use to report the results of these analyses, with the black dots and error bars representing the model estimated means and their Wald 95% confidence intervals:
Concerns
Since we have rather small sample sizes (n ranging from 88 to 311) and asymptotic confidence intervals are said to often be problematically narrow in the case of small sample sizes, I was wondering if this poses a problem here and looked into other ways of obtaining confidence intervals for the parameter estimates. I think it is clearest if I just refer to the code chunk below to show what I did:
For conciseness, I skip the code on how I visualised this, but this yielded the below:
Specific questions
My questions are the following: a) Am I making mistakes in my approach, generally and/or in terms of how I coded it? a) Why is the confidence interval "confint_emmeans" actually wider rather than more narrow than the bootstrap intervals if asymptotic intervals are generally said to be less conservative, especially for smaller sample sizes? b) Am I right to think that the confidence interval "confint_emmeans_biasadjusted" is more accurate than the other confidence intervals because it uses this bias adjustment to take into account the random effects? And would you consider it sufficiently usable for our analyses, or is there a better (hopefully not too complicated) way to approach this (e.g. some sort of bootstrap confidence interval, as generally recommended for GLMM, but with a kind of "bias adjustment" for the random effects?)?
Thank you so much in advance and have a nice day! Lieke