ourcodingclub / ourcodingclub.github.io

The main repository for the Coding Club website
https://ourcodingclub.github.io
Other
77 stars 83 forks source link

Data vis 2 tutorial: mixed effect models #155

Open beverlytan opened 5 years ago

beverlytan commented 5 years ago

Hi @gndaskalova

Just thought I'd start a new issue thread here to ask for your input on the improvement for the second data vis tutorial. As per our previous discussion, I've left Hadyn's more complicated code to visualize the mixed effect models, but have just played around with using ggpredict from ggeffects to do the same thing, which I thought would be useful to present to anyone looking at the tutorial.

However, my biggest issue was that with Hadyn's original mixed effect model built with the function lme instead of lmer, I for some reason am unable to use the ggpredict function and this error comes up: Error in filter_impl(.data, quo) : Result must have length 6, not 42.

# Original mixed effect model
lm_heights <- lme(Height ~ year, random = ~1|year/plot, 
                  data = heights[heights$land == "Hogsmeade",])

# ggpredict does not work for this model
ggpred <- ggpredict(lm_heights, terms = c("year"))

I spent some time trying to see why that would be the case but couldnt figure it out so decided to try the same using lmer, and I was then able to use the ggpredict function and come up with the graph in that way. The code as follows:

# Utilizing lmer to create the mixed effect model instead
with_lmer_hogs <- lmer(Height ~ year + (1|plot) + (1|year), 
                  data = heights[heights$land == "Hogsmeade",])

ggpred_hogs <- ggpredict(with_lmer_hogs, terms = c("year"))

with_lmer_narn <- lmer(Height ~ year + (1|plot) + (1|year), 
                       data = heights[heights$land == "Narnia",])

ggpred_narn <- ggpredict(with_lmer_narn, terms = c("year"))

# Creating the graph
heights$plot <- as.factor(heights$plot)

(new_graph <- ggplot() +  
    geom_point(data = heights, 
               aes(x = year, y = Height, colour = land)) +
    geom_line(data = ggpred_hogs, aes(x = x, y = predicted), 
              colour = "#ff9999", size = 1) +
    geom_ribbon(data = ggpred_hogs, 
                aes(ymin = conf.low, ymax = conf.high, x = x), 
                fill = "rosybrown1", alpha = 0.4) + 
    geom_line(data = ggpred_narn, aes(x = x, y = predicted), 
              colour = "#99b3ff", size = 1) +
    geom_ribbon(data = ggpred_narn, 
                aes(ymin = conf.low, ymax = conf.high, x = x), 
                fill = "#deebf7", alpha = 0.4) + 
    scale_colour_manual(values = c("#ff9999", "#99b3ff"),
                        breaks = c("Hogsmeade","Narnia"),
                        name = "Land of magic",
                        labels = c("Hogsmeade","Narnia")) + 
    labs(y = "Mean canopy height", 
         x = "Year") + 
    theme_bw() + 
    theme(panel.grid = element_blank()))

This piece of code produces a similar graph to what Haydn's code produces:

image

Therefore, I just wanted to ask if you would like me to add in this piece of code to show the use of ggpredict or if I should just leave it, since an entirely different function to develop the mixed effect model is being utilized here. Also, I'm actually unsure if i've stated the random effects correctly (Haydn's original random = ~1|year/plot vs my lmer using (1|plot) + (1|year) instead).

Let me know what you think is the most appropriate and I'd be happy to make the changes.

gndaskalova commented 5 years ago

Hey @beverlytan !

There is a slight difference between random = ~1|year/plot vs lmer using (1|plot) + (1|year) which may be the source of the error.

~1|year/plot has plot nested within year, whereas when you do (1|plot) + (1|year) that has plot and year independently as random effects.

When you ask for ggpred <- ggpredict(lm_heights, terms = c("year")) I think the error comes from the fact that ggpredict expects six values, but instead the model is designed to give 42 values (7 years of data from 6 plots, 7*6 = 42).

Have you tried ggpred <- ggpredict(lm_heights, terms = c("year", "plot")) ? And in general when you have just one term, I don't think you need the c() you can just do terms = "year".

And if it has to be the lmer syntax, you can try (1|tear/plot) to make it the same as Haydn's.

So once those things are resolved, I think you should add the ggpredict() parts, that would be valuable, and thanks to you, I've also learned more about it!

When you do that, could you also split this code chunk in two with some brief text in between, like "and now we will repeat the same actions for Narnia" or whatever, just so that the content is not all at once. Ugh, the image won't load, it's the code chunk that has the matrix multiplication stuff.

Last question - this tutorial is scheduled to be delivered as a workshop on the 30th January - since you've spent so much time working on it, do you want to be a tutor for that workshop? No worries if not, just thought I should ask.

beverlytan commented 5 years ago

Hi @gndaskalova , sorry for the late reply.

Unfortunately, all of the following gives the problem of ggpredict expecting 6 values, while the model gives 42 instead.

ggpred <- ggpredict(lm_heights, terms = "year")
ggpred <- ggpredict(lm_heights, terms = c("year", "plot"))
ggpred <- ggpredict(lm_heights, terms = c("plot", "year"))

Therefore, I tried out the lmer syntax, but I still can't make it the same as Haydn's. Doing the following all tells me that "Error: number of levels of each grouping factor must be < number of observations".

test1 <- lmer(Height ~ year + (1|plot/year), 
              data = heights[heights$land == "Hogsmeade",])

test2 <- lmer(Height ~ year + (1|year/plot), 
              data = heights[heights$land == "Hogsmeade",])

test3 <- lmer(Height ~ year + (1|year:plot), 
              data = heights[heights$land == "Hogsmeade",])

test4 <- lmer(Height ~ year + (1|plot:year), 
              data = heights[heights$land == "Hogsmeade",])

So I don't really know what to do hahaha. Actually I'm confused as to why even the graph produced by (1|plot) + (1|year) would look similar to what Hadyn's ~1|year produced, even though like you said that there is a slight difference?

Or maybe the difference in whether theyre nested / independent is not relevant in this case?

gndaskalova commented 5 years ago

Hey @sandra-ab

I was wondering if perhaps you might be able to help with this one?

sandra-ab commented 5 years ago

Hiya! Pitching in my two cents, hope I understand the issue(s) right.

First, as for the model specification:

I don’t think it’s right to nest year within plot (Haydn might disagree, but I always use year as a crossed, not a nested random effect). I would probably update the tutorial to change that. The logic behind that is that all the samples/plots/regions experience more similar conditions in one given year compared to the next, if that makes sense.

The reason you get the error message about the number of observations is that you have only one observation per plot per year, so you can’t calculate variance on that. Therefore, I would specify the syntax as Beverly initially did:

modelnarn <- lmer(Height ~ year + (1|plot) + (1|xyear), 
              data = heights[heights$land == "Narnia",])

(I get a singular fit for Hogsmeade, so I switched to Narnia for the example.) Also note that I created a new variable xyear that has year as a factor, not an integer, as conceptually it makes more sense for the random effect.

Second, when it comes to plotting the effects:

I think one problem here is that the ggpredict() function is designed to return a graph of the MAIN effects, not the random effects. When you're telling it to use terms = c("year", "plot"), you are asking to estimate random, not fixed effects. Since you have only 6 levels of year (and year is your only fixed effect in the model), the function expects 6 values. I suspect it would behave differently if you had multiple fixed effects or interactions - might give it a try later.

To plot fixed effect estimates accounting for the random effect structure, you need to specify type = “re” (for random effects) in the function. See this link for more info: https://cran.r-project.org/web/packages/ggeffects/vignettes/randomeffects.html

Sadly, because Hogsmeade has a singular fit, it won’t give confidence intervals and I can’t replicate the graph exactly. However, for Narnia, if you compare the graph that has the “fe” vs “re” structure, you see that the error band gets wider when you account for random effects.

# create predictions, by default only the fixed effect part
ggpred_hogs <- ggpredict(modelhog, terms = "year")
ggpred_narn <- ggpredict(modelnarn, terms = "year")

# create predictions, accounting for random effect structure
ggpred_hogs_re <- ggpredict(modelhog, terms = "year", type = "re")
ggpred_narn_re <- ggpredict(modelnarn, terms = "year", type = "re")

# Creating the graphs; error band should be larger in the second one
heights$plot <- as.factor(heights$plot)

narnia_fe <- ggplot() +  
      geom_point(data = filter(heights, land == "Narnia"), 
                 aes(x = year, y = Height), colour = "gray") +
      geom_line(data = ggpred_narn, aes(x = x, y = predicted), 
                colour = "#99b3ff", size = 1) +
      geom_ribbon(data = ggpred_narn, 
                  aes(ymin = conf.low, ymax = conf.high, x = x), 
                  fill = "#deebf7", alpha = 0.4) +
      labs(y = "Mean canopy height", 
           x = "Year") + 
      theme_bw() + 
      theme(panel.grid = element_blank())

narnia_re <- ggplot() +  
      geom_point(data = filter(heights, land == "Narnia"), 
                 aes(x = year, y = Height), colour = "gray") +
      geom_line(data = ggpred_narn_re, aes(x = x, y = predicted), 
                colour = "#99b3ff", size = 1) +
      geom_ribbon(data = ggpred_narn_re, 
                  aes(ymin = conf.low, ymax = conf.high, x = x), 
                  fill = "#deebf7", alpha = 0.4) +
      labs(y = "Mean canopy height", 
           x = "Year") + 
      theme_bw() + 
      theme(panel.grid = element_blank())

library(egg)
ggarrange(narnia_fe, narnia_re, ncol = 2)

fe-vs-re

From the link above: “More technically speaking, type = "re" accounts for the uncertainty of the fixed effects conditional on the estimates of the random-effect variances and conditional modes”.

So basically, ggpredict() will give you the model estimates of the FIXED effects considering the random effects. If you’re interested in how levels of a random effect behave, you can use the “condition” argument to get estimates of a specific one (e.g. one of your populations/species/plot). Here I get it for plot 2 of Narnia. (If you try with plot 1, you will see that the intercept estimate changes, but the slope stays the same, because our model only assumes random intercepts.)

plot2 <- ggpredict(modelnarn, terms = "year", type = "re", condition = c(plot = 2))
plot(plot2)

Hope that helps? I think it would be really useful to implement the ggpredict function in the tutorial – I hadn’t used it before but it’s great! Much more customisable than the sjPlot package I was using previously. Practically speaking, I would:

1- update Haydn’s code to lme4 rather than nlme 2- use plot and xyear as separate, not nested, random effects 3- use the type = “re” argument in the ggpredict function

Anything else I can do just let me know!

gndaskalova commented 5 years ago

Hey @sandra-ab

That is awesome, thank you very much!

All that sounds good to me, so if you want to go ahead and implement those tutorial edits, that'd be awesome! And I think it would be really useful to have some the text from your message above in the tutorial as it explains what the different syntaxes do very well!

Having the lme4 code instead of the nlme would be great as well, as it would definitely make things more recent!

We can have the graphs be just for Narnia, but might be useful to explain why they're not working for Hogsmeade (I imagine there isn't enough variance in those data? Or something like that?).

Plotting the results of a mixed effect model is pretty much our most asked question, so the updates will be very useful to people I think!

@beverlytan once Sandra has updated the tutorial, could you please tweet about it?

Thanks very much team!

beverlytan commented 5 years ago

Hi @sandra-ab and @gndaskalova,

Thanks Sandra for going through the tutorial / trying to solve the questions I had, but yes please do just feel free to edit the ideas that I had previously developed, I honestly can't remember much of what I did at the point so would be best for you to just edit and update the tutorial!

And yes, will tweet about it when it's been edited :-)

gndaskalova commented 4 years ago

Hey Sandra @sandra-ab

Just going through issues and quickly wanted to check if the mixed eff tutorial got updated with your work from above? Bad internet here so the tutorial doesn't load for me to check.

Thanks!

sandra-ab commented 4 years ago

@gndaskalova not yet - my bad! There wasn't really room for it after updating the advanced data vis tutorial, so I'll create a new example in the mixed model tutorial instead. (Currently I think that tutorial only suggests tables to report mixed model results!) I'll put that tut on my priority list to update as it's always a popular one. Thanks for the reminder!