jacob-long / jtools

Tools for summarizing/visualizing regressions and other helpful stuff
https://jtools.jacob-long.com
GNU General Public License v3.0
165 stars 22 forks source link

summ() and standardization in longitudinal models #136

Closed tci1 closed 11 months ago

tci1 commented 1 year ago

I've been using your amazing summ() function. The documentation notes that it supports merMod objects and that it can rescale using Gelman’s 2 SD standardization method(n.sd = 2).

I have two sets of questions:

1) I was wondering how summ() does the 2-SD rescaling. How does summ() treat time-invariant variables? What about binary variables (both those that are time-varying and those that are time-invariant)? My understanding (which could be wrong) is that binary variables shouldn't be rescaled. Apologies if you already mention this in documentation. I didn't find it.

2) I came across cautions about standardization in longitudinal models (link below). Standardization changes in often undesirable ways the distances between observations, and the multivariate distributions of cross-sectional and longitudinal data. The below article recommends monotonous scale transformations to get items with different response scales to the same metric. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4569815/ So using summ() and 2-SD rescaling with a lmer() model looks problematic. My question: If you agree this is an issue, are there plans to add monotonous scale transformations to summ()?

I'm relatively new to longitudinal modeling, so I hope nothing I'm asking here is a bad question.

Thanks for your great packages!

Sincerely, Sam

jacob-long commented 1 year ago

Hi Sam, I'd say the best way to summarize how the scaling is done is that it does it in a way that is completely ignorant to the longitudinal format. It figures out what data you gave to the model fitting function and then goes column by column calculating SD/mean treating every observation equally. In the terms of the linked article, as best as I can tell, this package is doing "standardization across individuals across time points." I agree with the author of the linked article that basically all sorts of scaling are dangerous to the proper interpretation of longitudinal models.

I do have an open issue about POMP scaling (#33) but have not yet implemented it. I'm not sure whether/how I'd add the option to summ() since I've already made the function so very complicated. I can say with some more confidence that it's unlikely I'll add anything to jtools that takes longitudinal/multilevel structure into account for an operation like variable rescaling.