jacob-long / jtools

Tools for summarizing/visualizing regressions and other helpful stuff
https://jtools.jacob-long.com
GNU General Public License v3.0

Categorical predictor names in tables differ when scale=TRUE versus FALSE #114

Closed: e-leib closed this issue 2 years ago.

e-leib commented 3 years ago

I was calling export_summs() with scale = TRUE and using the coefs argument to rename my predictors in the table. When I removed scale = TRUE, my code threw this error:

Error in (function (..., error_format = "({std.error})", error_pos = c("below", : Unrecognized coefficient names:...

After looking at the model output without renaming the coefficients, I found that categorical predictors are named differently when scale = TRUE than when scale = FALSE.

Here is a minimal reproducible example:

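library(tidyverse)
library(jtools)

# Simulated data; no seed is set, so estimates differ between runs
x <- rnorm(50)
y <- rnorm(50, mean = 20)
group <- factor(rep(c("GroupA", "GroupB"), length.out = 50))

df <- data.frame(x, y, group)

# Fit the interaction model on the data frame
mod <- lm(y ~ x * group, data = df)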

Now let's make the summary tables. First, the default (scale = FALSE):

summ(mod)

Output:

--------------------------------------------------
                       Est.   S.E.   t val.      p
------------------- ------- ------ -------- ------
(Intercept)           20.22   0.23    87.84   0.00
x                     -0.14   0.24    -0.60   0.55
groupGroupB           -0.32   0.33    -0.99   0.33
x:groupGroupB         -0.57   0.40    -1.44   0.16
--------------------------------------------------

Now set scale = TRUE:

summ(mod, scale = TRUE)

Output:

------------------------------------------------
                     Est.   S.E.   t val.      p
----------------- ------- ------ -------- ------
(Intercept)         20.22   0.23    87.92   0.00
x                   -0.12   0.21    -0.60   0.55
group               -0.33   0.33    -1.03   0.31
x:group             -0.49   0.34    -1.44   0.16
------------------------------------------------

When scale = FALSE, the categorical variable is listed as groupGroupB (and its interaction as x:groupGroupB); when scale = TRUE, it is listed as just group (and x:group). This difference is what broke my code when I removed scale = TRUE. The predictor names should be the same regardless of the scale argument.
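The error arises because every coefficient name supplied to coefs has to match a term in the model being tabulated. As a rough illustration, the base-R sketch below performs that kind of comparison. unmatched_coefs() is a hypothetical helper, not part of jtools, and jtools's own check may be implemented differently. The labels are written against the scale = TRUE names, so checking them against the unscaled model flags the two that no longer exist.

```r
# Hypothetical helper (not part of jtools): return the coefficient names in a
# coefs-style vector (names = display labels, values = coefficient names)
# that do not appear among the fitted model's coefficients.
unmatched_coefs <- function(model, coefs) {
  setdiff(unname(coefs), names(coef(model)))
}

# Same simulated data as in the example above
x <- rnorm(50)
y <- rnorm(50, mean = 20)
group <- factor(rep(c("GroupA", "GroupB"), length.out = 50))
df <- data.frame(x, y, group)

# Unscaled model: the factor produces groupGroupB and x:groupGroupB
mod <- lm(y ~ x * group, data = df)

# Labels written against the names seen with scale = TRUE
coefs <- c("Slope of x" = "x", "Group B" = "group", "x by Group B" = "x:group")

# Returns c("group", "x:group"): neither name exists in the unscaled model
unmatched_coefs(mod, coefs)
```

Running the same check against the labels groupGroupB and x:groupGroupB returns an empty vector, which is why those names work only when scale = FALSE.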

Thank you!

jacob-long commented 2 years ago

Thanks for sharing. Under the hood, when the raw data are passed to gscale(), that function's default behavior is to convert binary factor variables into 0/1 numeric variables. Now that you mention it, I don't think that's a good default. I've changed the defaults in my dev branch, and the next release will work the way you are requesting.
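The naming difference follows from how R labels model terms, and it can be reproduced without jtools. The base-R sketch below fits the example model twice: once with group as a factor, and once with group recoded by hand to 0/1. The manual recoding is an assumption standing in for the conversion described above; it is not gscale()'s actual code.

```r
x <- rnorm(50)
y <- rnorm(50, mean = 20)
group <- factor(rep(c("GroupA", "GroupB"), length.out = 50))
df <- data.frame(x, y, group)

# Factor predictor: treatment contrasts name the term variable + level
mod_factor <- lm(y ~ x * group, data = df)
names(coef(mod_factor))
# "(Intercept)" "x" "groupGroupB" "x:groupGroupB"

# Assumed stand-in for the binary-factor conversion: GroupB -> 1, GroupA -> 0
df_num <- df
df_num$group <- as.numeric(df$group == "GroupB")

# Numeric predictor: the term takes the bare column name
mod_numeric <- lm(y ~ x * group, data = df_num)
names(coef(mod_numeric))
# "(Intercept)" "x" "group" "x:group"
```

Because the 0/1 column is identical to the factor's treatment dummy, both fits use the same design matrix: the unscaled estimates match exactly and only the labels change.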