jknowles / merTools

Convenience functions for working with merMod objects from lme4

marginalize over RE #73

Closed bachlaw closed 4 years ago

bachlaw commented 7 years ago

Jared,

Great work as usual. Given your response to the other issue, I take it that offering the re.form argument from predict.merMod, to marginalize over random effects, just isn't workable at this time?

thanks, Jonathan

jknowles commented 7 years ago

Currently I don't think that is possible. However, I have been thinking about rewriting the REimpact function, which approaches this in a different but related way, to allow it.

Can you describe your use case in a bit more detail? Are you looking to get a prediction interval that averages across all the levels of the random effects but at a fixed set of values for the non-random effects? Is it most common that you would want to do this to an entire data frame at once and return (lower, fit, upper) for each row?

This might be a relatively easy fix and would solve a nagging problem I have about why REimpact does not seem to be as useful as I imagined.

bachlaw commented 7 years ago

Sure Jared.

A baseball example: say I want to know the likely contributions of a batter, pitcher, and catcher to a pitch getting called a strike. I set up three random effects/groups, one for all players at each position, and run a glmer. I then want to predict outcomes with various groups left out, so I can get a sense of how an average performer in that group would affect the result and, by extension, what the effect of the specific people in that group is. So if it were batters I wanted to leave out, it would be predict.merMod with re.form = ~ (1|pitcher) + (1|catcher). It amounts to a glorified coefficient extraction, but it is very handy, particularly when a transformation needs to be applied.
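
For illustration, a minimal sketch of that re.form idea (the data frame, columns, and covariates here are all hypothetical):

# Minimal sketch of the re.form usage described above; the `pitches`
# data frame and its columns are hypothetical
library(lme4)

fit <- glmer(strike ~ count + location +
               (1 | batter) + (1 | pitcher) + (1 | catcher),
             data = pitches, family = binomial)

# Full prediction: all three random effects contribute
p_full <- predict(fit, type = "response")

# Marginalize over batters: their intercepts are set to 0 (an "average"
# batter), while pitcher and catcher effects are retained
p_avg_batter <- predict(fit, re.form = ~ (1 | pitcher) + (1 | catcher),
                        type = "response")

# The gap shows how much specific batters move the predicted probability
summary(p_full - p_avg_batter)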

You could do the same thing in an educational setting with groups of, say, (1|teacher), (1|school), and (1|district). Let's say you wanted to make predictions using school and district only, to see what effect an average teacher would have versus the normal predictions with the teachers present. You could then easily group the results with and without the teachers to get the mean effect of various teachers on the outcome, and see whether teacher variation is driving the quality of outcomes. (I think Gelman uses an example like this in his multilevel modeling book.)

Does this make sense?

Thanks, Jonathan

jknowles commented 7 years ago

Thanks @bachlaw

That does make sense.

Can you tell me what you would expect it to look like in the case where you had a slope and an intercept at the random level: e.g. (1 + class_size | teacher) + (1 | district) + (1 + mean_test | school). Would you want the ability to ignore the slope term (class_size)? What about the slopes for school when you are averaging over teachers?

After that, I think I can probably work something out and put this together relatively quickly...

jknowles commented 7 years ago

@carlbfrederick Do you have an idea of how to solve this? I think I have seen about a dozen questions along these lines in the past few weeks, but I'm not sure how this is different from the output we provide for REsim.

I am going to code up an example in the branch margins and see what I can see...

carlbfrederick commented 7 years ago

@jknowles Hmmm...

Just trying to process here. Conceptually, the problem is trivial, no? The average teacher has a value of 0 for all random coefficients by definition. So, I think it is similar to REimpact (as much as I can remember it). That is how I would initially answer the question in the second paragraph of @bachlaw's most recent comment above (e.g., comparing a move from the average teacher, i.e. results "marginalized over" the teacher random effects, to a teacher at the xth percentile of the distribution).
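
As a rough illustration of that comparison (the model and data names are hypothetical; check ?REimpact for the exact arguments):

# Rough sketch of the REimpact-style comparison: bin the teacher random
# effects and see how the prediction shifts as you move across that
# distribution; the model and the `pupils` data are hypothetical
library(merTools)

fit <- lmer(score ~ ses + (1 | teacher) + (1 | school), data = pupils)

# One focal row of covariates; breaks = 5 splits teachers into quintiles,
# so the middle bin is roughly the "average" teacher (random effect near 0)
REimpact(fit, newdata = pupils[1, ], groupFctr = "teacher", breaks = 5)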

As a coding/usability issue, I am not sure if the question is a new feature or just better or more sophisticated packaging/interaction with REimpact. I will think about this over the next few days and check out the margins branch to see if any thoughts come to mind.

Put some links or comments with the other similar questions in some document in the branch too. This might help me see some of the commonality.

jknowles commented 7 years ago

This came in an e-mail:

OK, let me describe my situation with a more concrete (but fake) example. Suppose that I have a dataset of SAT scores sampled from 24 high schools in each of the 6 states in the mid-Atlantic region. I would like to show two things: 1) the average SAT score for each of the 6 states while holding family income at the overall average income, and 2) the effect of family income for each state.

library('lme4')
mydata <- Penicillin
set.seed(10)
mydata$income <- rnorm(144)
mydata$school <- Penicillin$plate
mydata$state <- Penicillin$sample
mydata$SAT <- Penicillin$diameter

I can analyze the data "mydata" for each state separately. For example, for the first state:

m1 <- lm(SAT ~ income, data = mydata[mydata$state == 'A', ])

and the following results

summary(m1)$coefficients

               Estimate Std. Error     t value     Pr(>|t|)
(Intercept) 25.15231111  0.2171386 115.8352824 3.805796e-32
income      -0.03788689  0.2176436  -0.1740776 8.633964e-01

would allow me to achieve the two goals 1) and 2) above, and plot the results similar to Figure 6 in the Gelman et al. paper.

However, we need to perform multiple comparison correction since the above linear regression analysis is done 6 times (for the 6 states). The Gelman et al. paper argues that multiple comparison correction can be better handled through one (instead of 6) analysis with a multilevel model:

fm <- lmer(SAT ~ income + (income|school) + (1|state), data=mydata)

and then we can somehow obtain the two effects and their standard errors in 1) and 2) for each state as shown in Figure 7 of Gelman et al. It’s not clear to me how they exactly pulled the posterior data for their Figure 7, but the algebra is shown on pages 196-197.
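
One way (not necessarily how Gelman et al. did it) to pull the state-level intercepts with simulation-based uncertainty from the model above is merTools::REsim:

# Sketch: simulate the random effects, keep the state terms, and add the
# fixed intercept to get each state's expected SAT at the mean income
# (income was generated as standard normal above, so 0 is roughly the mean)
library(merTools)

fm <- lmer(SAT ~ income + (income | school) + (1 | state), data = mydata)

re_sims <- REsim(fm, n.sims = 1000)
state_ints <- re_sims[re_sims$groupFctr == "state", ]

state_ints$stateMean <- fixef(fm)["(Intercept)"] + state_ints$mean
state_ints[, c("groupID", "stateMean", "sd")]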

jknowles commented 7 years ago

Also, here is my fumbling attempt on SO to answer a (I think) similar question:

http://stackoverflow.com/questions/35947309/predict-with-new-random-effects/42906945#42906945

jknowles commented 7 years ago

@carlbfrederick You know, REimpact does seem to have the kernel of what we are looking for.

carlbfrederick commented 7 years ago

I think I see the issue now... people seem to be asking for something along the lines of an aggregated predictInterval function? E.g., what is the average prediction, on the scale of the outcome variable, for some value of the fixed covariates for each school/state/etc.? This would allow one to look purely at the differences between schools for some set of schools.
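
Something along these lines, perhaps (a minimal sketch; `mod`, `newdf`, and the `school` column are hypothetical, and averaging the interval endpoints by group is only a rough summary):

# Sketch of the "aggregated predictInterval" idea: row-level intervals,
# then averaged within each school
library(merTools)

preds <- predictInterval(mod, newdata = newdf, n.sims = 1000,
                         level = 0.95, stat = "median")
newdf <- cbind(newdf, preds)

# Mean fit and interval endpoints per school
aggregate(cbind(fit, upr, lwr) ~ school, data = newdf, FUN = mean)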

Does that capture what you think the issue is?

If so, the issues to address include:

What do you think?

jknowles commented 7 years ago

Yes, I think that is correct. What confuses me is what values they are looking for in the effects they are not interested in. Do they truly want to sample across them, or just set them to zero?

A few people have referenced the lme4 method of allowing re.form=~0, etc. in the predict function...

I think we could get at something like that as the interface, and underneath do what you describe.

bachlaw commented 7 years ago

The effect of re.form is to set all REs excluded from the formula to 0 (the average) to get a "with and without" effect. You might say this sounds a lot like extracting the intercept, and you would be correct. But in cases with transformed distributions (e.g., logistic regression) it makes isolating the net effect much more convenient.

Here, I think the value is in seeing how the predictions would differ when one or more REs are set to zero. It is common then to take the average of the differences as a measure of each level of the RE. The ability to generate intervals around these altered predictions would be great, if possible.
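
Something like this, as a point-prediction sketch (the model and `pupils` data are hypothetical; putting intervals around the altered predictions is exactly the missing piece):

# With/without comparison on the response scale; only point predictions
# are shown -- intervals around the altered predictions are the feature
# being requested here
library(lme4)

mod <- glmer(pass ~ ses + (1 | teacher) + (1 | school) + (1 | district),
             data = pupils, family = binomial)

p_with    <- predict(mod, type = "response")
p_without <- predict(mod, re.form = ~ (1 | school) + (1 | district),
                     type = "response")

# Average difference attributable to each teacher
tapply(p_with - p_without, pupils$teacher, mean)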

Jonathan


jknowles commented 7 years ago

Bingo @bachlaw -- I think I get it now. Not sure what my block was, but this is something we can implement for sure.

@carlbfrederick -- I would have time to work on this later this week, but I'm happy to let you take a crack at it first if you have availability/interest; otherwise I can ping you when I have something worth reviewing.

carlbfrederick commented 7 years ago

@jknowles I have some free time too, but don't wait on me. I will work on my own branch and let you know if I stumble onto something promising.

bachlaw commented 7 years ago

Thanks to all and I apologize for not doing a better job explaining it sooner!


carlbfrederick commented 7 years ago

@jknowles I have yet to get to this (work has a way of piling up and I am foolishly trying to add some automatic reporting to DEWS). I just wanted to check in and see if you were making any progress.

jknowles commented 7 years ago

No, unfortunately @carlbfrederick I got busy right when I had started to sort this out. Looking like the sky is going to clear up here in mid-May -- if you haven't tackled it by then, I will certainly be able to dive in.

carlbfrederick commented 7 years ago

Cool. I'll put it back on my todo list then.


jknowles commented 7 years ago

Hey @carlbfrederick did you ever tackle this? I have a bunch of imputation related changes I am pushing to the project, and tidying up some other issues. I'd like to shoot for a new release in time for the fall semester (when many HLM students will discover our package fresh) :-)

What do you think?

carlbfrederick commented 7 years ago

I haven't. I am still deep in trying to work out the DEWS missing data stuff. Let me look at my calendar tomorrow and see if I can block some time on your schedule.


carlbfrederick commented 7 years ago

@jknowles I have some time coming up in the next few weeks. I don't know how your schedule will work out, but if you can wait until then to get started, I will let you know after that first week in August whether I have been able to work it in. What do you think?

jknowles commented 7 years ago

That sounds good to me @carlbfrederick -- I am moving and finishing up some other stuff the end of July, so pushing out in late August / early September would be ideal for release time.

carlbfrederick commented 7 years ago

@jknowles I spent some time today digging back into predictInterval() and looking at how predict.merMod(..., re.form = ~0) works. We already do something similar with the which = "all" argument, no? That provides a prediction for each grouping variable, for all grouping variables, and for no grouping variables.

I don't know that we need to change anything within predictInterval(); we could simply adjust REimpact to take a new argument for which grouping variables to set (or not set) to zero and then combine the various results from a call to predictInterval(..., which = "all").

Does that make sense?

Or we could always tweak the which argument of predictInterval to allow users to specify multiple RE terms, which would add value to both.
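
For reference, a sketch of that combination step (`mod` and `newdf` are hypothetical, and the exact column names of the stacked output should be checked against ?predictInterval):

# With which = "all", predictInterval() returns a stacked set of intervals
# (combined, fixed-only, and one per grouping factor) that can be filtered
# and recombined as needed
library(merTools)

pi_all <- predictInterval(mod, newdata = newdf, which = "all",
                          n.sims = 1000, level = 0.95)

# e.g. the fixed-effects-only rows serve as the "average group" baseline
baseline <- subset(pi_all, effect == "fixed")
head(baseline)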

jknowles commented 7 years ago

Yes @carlbfrederick your approach is what I was thinking we need to do. We then also need to document this in the predictInterval() documentation so users know how to use the feature.

Any chance you can take a stab at it? No problem if you can't -- just planning my work the next few weeks!

CJustice77 commented 7 years ago

It sounds like you guys are actively working on implementing the functionality I'm looking for with predictInterval(). Please let me know if and when it is available for use. Thanks kindly.

Casey Justice jusc@critfc.org

jknowles commented 6 years ago

I've got a draft idea of this in the inst subfolder - it is almost working. Performance is not optimal and I am not sure what the final results should look like, but the REmargins function should get wrapped up sometime soon to allow this functionality more easily.

jknowles commented 4 years ago

This was fixed with the most recent release; there is now a REmargins function that tackles this. Thanks for your reports @CJustice77 and @bachlaw
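
A quick usage sketch (the model follows lme4's built-in InstEval data; verify the arguments against ?REmargins in your installed version):

# Quick look at the new REmargins() function
library(merTools)

m1 <- lmer(y ~ service + lectage + studage + (1 | d) + (1 | s),
           data = InstEval)
margins <- REmargins(merMod = m1, newdata = InstEval[1:10, ])
head(margins)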