tjmahr / tjmahr.github.io

A responsive Jekyll theme with clean typography and support for large full page images.
https://tjmahr.github.io

plotting-partial-pooling-in-mixed-effects-models/ #11

Open utterances-bot opened 1 year ago

utterances-bot commented 1 year ago

Plotting partial pooling in mixed-effects models - Higher Order Functions

There are a lot of bilabial sounds in that title

https://tjmahr.github.io/plotting-partial-pooling-in-mixed-effects-models/

daaronr commented 1 year ago

This is super-helpful, thanks. I left some comments/questions using hypothes.is. But here are some key follow-up questions ... if anyone gets around to reading these comments.

  1. What distribution does the frequentist ML mixed-effects model assume for the random slopes and intercepts?

  2. How distinct is this from the ‘regularization with cross-validation’ that we see in Machine learning approaches? E.g., I could do a ridge model where I allow only the coefficient on reader to be regularized; this also leads to the same sort of ‘shrinkage’ … so what’s the difference?

  3. Thinking by analogy to a Bayesian approach, what does it mean that we assume the intercept is made of “random deviations drawn from a distribution”? Isn’t that what we always assume for each parameter in a Bayesian model … so then, what would it mean for a Bayesian model to have a fixed (vs random) coefficient? ... (related, in your Bayesian model, where are the priors on the slopes given?)

  4. Why wouldn’t we always want all our parameters to be random effects? Why include any fixed effects … considering general ideas of overfitting and effects as draws from larger distributions?

  5. (Related to 3, I think) What is the impact of the choice of giving one feature a ‘random intercept only’ on the estimates of the other coefficients?

My thinking on the last point, getting back to an earlier discussion: modeling the effect of reader as a random effect, and thus shrinking it relative to a standard linear model, could leave the problem of ‘omitted variable bias’ in other coefficients (e.g., on ‘condition’) in place. This could be a problem if reader is not orthogonal to condition (i.e., if they are correlated with one another). It may also come down to the question of ‘do we care mainly about interpreting and assessing a particular coefficient’ or ‘do we care mainly about a predictive model overall’?

tjmahr commented 1 year ago

> What distribution does the frequentist ML mixed effects model assume for the random slopes and intercepts?

A normal distribution. See equation 3 on page 3 here: https://www.jstatsoft.org/article/view/v067i01
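To spell out that assumption in code: here is a minimal sketch (with made-up numbers, not data from the post) of the generative model behind a random-intercept fit, where each subject's intercept offset is a draw from a single shared normal distribution:

```python
import numpy as np

rng = np.random.default_rng(1)

n_subjects = 10            # grouping factor (e.g., readers)
n_obs = 20                 # observations per subject
beta0, beta1 = 50.0, 2.0   # fixed effects, shared by everyone
tau = 5.0                  # sd of the random-intercept distribution
sigma = 3.0                # residual sd

# The mixed-model assumption: subject offsets are draws from N(0, tau^2)
b = rng.normal(0.0, tau, size=n_subjects)

x = np.tile(np.arange(n_obs, dtype=float), n_subjects)   # within-subject predictor
subject = np.repeat(np.arange(n_subjects), n_obs)
y = beta0 + b[subject] + beta1 * x + rng.normal(0.0, sigma, size=x.size)
```

Fitting the model then means estimating beta0, beta1, tau, and sigma from y; random slopes work the same way, with slope offsets drawn from a (possibly correlated) normal as well.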

> How distinct is this from the ‘regularization with cross-validation’ that we see in Machine learning approaches? E.g., I could do a ridge model where I allow only the coefficient on reader to be regularized; this also leads to the same sort of ‘shrinkage’ … so what’s the difference?

I am unfamiliar with how cross-validation affects regularization. I have another post where I compare the shrinkage from linear mixed models to the smoothing achieved by penalized splines. You can convert the penalized smoothing model into a mixed effects model so that the smoothing hyperparameter is learned from the data. https://www.tjmahr.com/random-effects-penalized-splines-same-thing/
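For what it's worth, the ridge connection can be made exact in one simple case: if you ridge-penalize only the subject dummies with penalty λ = σ²/τ², you reproduce the random-intercept shrinkage, pulling each group mean toward the grand mean by a factor of n / (n + λ). A numpy sketch with hypothetical numbers (just the penalized normal equations, no modeling library):

```python
import numpy as np

rng = np.random.default_rng(0)
n_groups, n_per = 8, 25
lam = 4.0   # ridge penalty; plays the role of sigma^2 / tau^2 in the mixed model

# Simulate balanced grouped data (hypothetical numbers)
group_means = rng.normal(100, 10, size=n_groups)
y = np.repeat(group_means, n_per) + rng.normal(0, 5, size=n_groups * n_per)
g = np.repeat(np.arange(n_groups), n_per)

# Design: unpenalized grand intercept + penalized group dummies
X = np.column_stack([np.ones(y.size), (g[:, None] == np.arange(n_groups)).astype(float)])
P = np.diag([0.0] + [lam] * n_groups)   # penalize only the dummies
coef = np.linalg.solve(X.T @ X + P, X.T @ y)
mu, b_ridge = coef[0], coef[1:]

# Closed-form partial-pooling shrinkage: pull group means toward the grand mean
ybar_j = np.array([y[g == j].mean() for j in range(n_groups)])
b_shrunk = n_per / (n_per + lam) * (ybar_j - y.mean())

print(np.allclose(b_ridge, b_shrunk))   # → True (with this balanced design)
```

So one way to frame the difference is where λ comes from: cross-validation picks it by predictive search, while the mixed model estimates σ² and τ² (and hence the amount of shrinkage) by maximum likelihood.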

> Why wouldn’t we always want all our parameters to be random effects? Why include any fixed effects … considering general ideas of overfitting and effects as draws from larger distributions?

I don't have a good answer. This seems like a good article for a discussion of using random effects as smoothers: http://www.biostat.umn.edu/~hodges/PubH8492/Hodges-ClaytonREONsubToStatSci.pdf

> (Related to 3, I think) What is the impact of the choice of giving one feature a ‘random intercept only’ on the estimates of the other coefficients?

I'm not sure. Usually fixed-effect estimates are similar between models with and without random intercepts. (So a model of score ~ age and one of score ~ age + (1 | subject) should have similar coefficients for age if each subject contributes a similar number of observations. The random intercept in this case acts like an error term adjusting the population mean for each subject.) IIRC Hodges has a case in his book where adding a random effect knocks out an unrelated fixed effect because of something to do with cluster sizes.
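A toy illustration of that point (hypothetical data; subject dummies stand in for the no-pooling extreme of a random intercept, with the partially pooled estimate sitting between the two fits): with a balanced design where every subject is observed at the same ages, adding subject intercepts leaves the age slope essentially unchanged.

```python
import numpy as np

rng = np.random.default_rng(2)
n_subj, n_per = 12, 10
age = np.tile(np.linspace(3, 12, n_per), n_subj)   # every subject seen at the same ages
subj = np.repeat(np.arange(n_subj), n_per)

# Hypothetical generative model: one shared slope, subject-specific intercepts
intercepts = rng.normal(40, 6, size=n_subj)
score = intercepts[subj] + 1.5 * age + rng.normal(0, 2, size=age.size)

# Pooled model: score ~ age
X_pooled = np.column_stack([np.ones(age.size), age])
slope_pooled = np.linalg.lstsq(X_pooled, score, rcond=None)[0][1]

# Subject-intercept model: score ~ age + subject dummies
dummies = (subj[:, None] == np.arange(n_subj)).astype(float)
X_subj = np.column_stack([age, dummies])
slope_subj = np.linalg.lstsq(X_subj, score, rcond=None)[0][0]

print(slope_pooled, slope_subj)   # nearly identical under this balanced design
```

When subjects differ in their mean ages (an unbalanced design), the two slopes can diverge, which is one route to the kind of surprise Hodges describes.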

JoFrhwld commented 1 year ago

Great post! I’ll include it as a supplemental reading for the stats course I’m teaching right now!

daniela-palleschi commented 5 months ago

Returning to this great post as I prep some teaching materials, and using the code to produce similar plots for my slides. What a great resource for researchers, instructors, and students!