vincentarelbundock / modelsummary

Beautiful and customizable model summaries in R.
http://modelsummary.com

[Feature request] Add ability to show the label of a factor variable on a separate row and also show the base level of factors. #770

Closed: ami-null closed this issue 1 month ago

ami-null commented 1 month ago

Is it possible to show the variable name of factor variables on a separate row and also show the base level of factor variables (output from {gtsummary} added for reference)?

mtcars$gear <- factor(mtcars$gear)

attributes(mtcars$mpg)$label <- "Miles per gallon"
attributes(mtcars$hp)$label <- "Horsepower"
attributes(mtcars$wt)$label <- "Weight"
attributes(mtcars$gear)$label <- "Gears"

mod1 <- lm(mpg ~ hp + wt + gear, data = mtcars)
modelsummary::modelsummary(
    mod1,
    statistic = c("95% CI" = "({conf.low}, {conf.high})", "p-value" = "{p.value}"),
    shape = term ~ - model + statistic,
    gof_map = NA,
    coef_rename = TRUE
)
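+-------------+--------+------------------+---------+
|             | Est.   | 95% CI           | p-value |
+=============+========+==================+=========+
| (Intercept) | 34.872 | (29.578, 40.166) | <0.001  |
+-------------+--------+------------------+---------+
| Horsepower  | -0.035 | (-0.061, -0.009) | 0.010   |
+-------------+--------+------------------+---------+
| Weight      | -3.239 | (-5.040, -1.437) | 0.001   |
+-------------+--------+------------------+---------+
| Gears [4]   | 1.265  | (-1.486, 4.016)  | 0.354   |
+-------------+--------+------------------+---------+
| Gears [5]   | 1.874  | (-1.956, 5.704)  | 0.324   |
+-------------+--------+------------------+---------+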
gtsummary::tbl_regression(mod1)
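+----------------+-------+--------------+---------+
| Characteristic | Beta  | 95% CI¹      | p-value |
+================+=======+==============+=========+
| Horsepower     | -0.03 | -0.06, -0.01 | 0.010   |
+----------------+-------+--------------+---------+
| Weight         | -3.2  | -5.0, -1.4   | 0.001   |
+----------------+-------+--------------+---------+
| Gears          |       |              |         |
+----------------+-------+--------------+---------+
|     3          |       |              |         |
+----------------+-------+--------------+---------+
|     4          | 1.3   | -1.5, 4.0    | 0.4     |
+----------------+-------+--------------+---------+
|     5          | 1.9   | -2.0, 5.7    | 0.3     |
+----------------+-------+--------------+---------+
¹ CI = Confidence Interval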

Created on 2024-05-23 with reprex v2.1.0

vincentarelbundock commented 1 month ago

Maybe something like this:

library(modelsummary)
mtcars$gear <- factor(mtcars$gear)

cr <- c(
    "mpg" = "Miles per gallon",
    "hp" = "Horsepower",
    "wt" = "Weight",
    "gear3" = "3",
    "gear4" = "4",
    "gear5" = "5"
)
ar <- data.frame(c("Gears", "3"), c("", "-"), c("", "-"))
attr(ar, "position") <- 4:5 

mod1 <- lm(mpg ~ hp + wt + gear, data = mtcars)

modelsummary(mod1, 
    coef_rename = cr, 
    add_rows = ar,
    gof_map = NA,
    shape = term ~ statistic)

+-------------+--------+-------+
|             | Est.   | S.E.  |
+=============+========+=======+
| (Intercept) | 34.872 | 2.580 |
+-------------+--------+-------+
| Horsepower  | -0.035 | 0.013 |
+-------------+--------+-------+
| Weight      | -3.239 | 0.878 |
+-------------+--------+-------+
| Gears       |        |       |
+-------------+--------+-------+
| 3           | -      | -     |
+-------------+--------+-------+
| 4           | 1.265  | 1.341 |
+-------------+--------+-------+
| 5           | 1.874  | 1.867 |
+-------------+--------+-------+ 
vincentarelbundock commented 1 month ago

Alternatively, you can use the group_tt() function from the tinytable package:

library(modelsummary)
library(tinytable)
mtcars$gear <- factor(mtcars$gear)
mod1 <- lm(mpg ~ hp + wt + gear, data = mtcars)

cr <- c(
    "mpg" = "Miles per gallon",
    "hp" = "Horsepower",
    "wt" = "Weight",
    "gear4" = "4",
    "gear5" = "5"
)

modelsummary(mod1, 
    coef_rename = cr, 
    gof_map = NA,
    shape = term ~ statistic) |> 
    group_tt(i = list("Gears" = 4)) |>
    group_tt(i = list("3" = 5))
ami-null commented 1 month ago

Thank you for the reply. This seems to be what I was looking for. Is it possible to add this functionality to the package itself? I would send a PR if I could, but unfortunately I am not familiar enough with the codebase.

vincentarelbundock commented 1 month ago

In a sense, this functionality is already "in the package itself", as tinytable is a hard dependency. I think this is really trivial to achieve, so I don't think it's worth adding a new argument for this.

But I do agree that it should be better documented. I added an even simpler solution to the website. You can see it here:

https://modelsummary.com/vignettes/modelsummary.html#factor-labels

This is what I recommend.

ami-null commented 1 month ago

Thank you, the example at the provided link works well.

However, when indent is non-zero in tinytable::group_tt(), the other variables (which are not under the group) are indented too, which looks quite odd in a model summary table. I understand that this is the expected behavior and that the functionality is provided by the {tinytable} package, but I feel it needs to be mentioned.

For example, the code below:

library(modelsummary)
library(tinytable)

mod <- lm(mpg ~ hp + factor(gear) + factor(am), mtcars)

cr <- c(
    "hp" = "Horsepower",
    "factor(gear)3" = "3",
    "factor(gear)4" = "4",
    "factor(gear)5" = "5",
    "factor(am)0" = "Automatic",
    "factor(am)1" = "Manual"
)

modelsummary(mod,
             include_reference = TRUE,
             shape = term ~ statistic - model,
             coef_rename = cr,
             gof_map = NA) |>
    format_tt(replace = list("-" = "")) |>
    group_tt(i = list("Gears" = 3, "Transmission" = 6), indent = 2)

outputs this: [image: Rplot]

Ideally, the red marked rows should not be indented.

vincentarelbundock commented 1 month ago

You can set indent to 0 and then manually indent the rows you like with style_tt().

ami-null commented 1 month ago

Thank you, that works perfectly for HTML documents!

However, there still seems to be an issue with LaTeX output. For example, the code below:

library(modelsummary)
library(tinytable)

mod <- lm(mpg ~ hp + factor(gear) + factor(am), mtcars)

cr <- c(
    "hp" = "Horsepower",
    "factor(gear)3" = "3",
    "factor(gear)4" = "4",
    "factor(gear)5" = "5",
    "factor(am)0" = "Automatic",
    "factor(am)1" = "Manual"
)

modelsummary(
    mod,
    include_reference = TRUE,
    shape = term ~ statistic - model,
    coef_rename = cr,
    gof_map = NA
) |>
    format_tt(replace = list("-" = "")) |>
    group_tt(i = list("Gears" = 3, "Transmission" = 6), indent = 0) |> 
    style_tt(i = c(1:3, 7), j = 1, bold = TRUE) |>
    style_tt(i = c(4:6, 8:9), indent = 2)

produces this for HTML output in an Rmd document: [image]

However, the indent argument of style_tt() seems to be ignored for LaTeX output. It produces: [image] and the generated LaTeX code for the table is:

\begin{table}
\centering
\begin{tblr}[         %% tabularray outer open
]                     %% tabularray outer close
{                     %% tabularray inner open
colspec={Q[]Q[]Q[]},
cell{4}{1}={c=3}{},cell{8}{1}={c=3}{},
cell{2}{1}={preto={\hspace{0em}}},
cell{3}{1}={preto={\hspace{0em}}},
cell{5}{1}={preto={\hspace{0em}}},
cell{6}{1}={preto={\hspace{0em}}},
cell{7}{1}={preto={\hspace{0em}}},
cell{9}{1}={preto={\hspace{0em}}},
cell{10}{1}={preto={\hspace{0em}}},
cell{11}{1}={preto={\hspace{0em}}},
column{1}={halign=l,},
column{2}={halign=c,},
column{3}={halign=c,},
cell{2}{1}={}{,cmd=\bfseries,},
cell{3}{1}={}{,cmd=\bfseries,},
cell{4}{1}={}{,cmd=\bfseries,},
cell{8}{1}={}{,cmd=\bfseries,},
}                     %% tabularray inner close
\toprule
& Est. & S.E. \\ \midrule %% TinyTableHeader
(Intercept) & \num{27.480} & \num{1.974} \\
Horsepower & \num{-0.065} & \num{0.010} \\
Gears && \\
3 & - & - \\
4 & \num{0.076} & \num{1.829} \\
5 & \num{2.395} & \num{2.384} \\
Transmission && \\
Automatic & - & - \\
Manual & \num{4.135} & \num{1.809} \\
\bottomrule
\end{tblr}
\end{table}

Is this an issue with {tinytable}? Or, am I missing something and this is the expected behavior?
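
One way to check this without knitting a full document is to look at the LaTeX source that tinytable generates for a small table. The following is only a minimal sketch: the data frame is a made-up placeholder rather than real model output, and it assumes that save_tt() with output = "latex" returns the LaTeX source as a string. If the lines it prints still show \hspace{0em} for the styled rows, the requested indent was not written.

```r
library(tinytable)

# Placeholder data standing in for a model summary (values are illustrative only)
dat <- data.frame(
    term = c("Horsepower", "3", "4", "5"),
    estimate = c("-0.065", "-", "0.076", "2.395")
)

# Request a 2em indent on the factor-level rows of the first column
tab <- tt(dat) |>
    style_tt(i = 2:4, j = 1, indent = 2)

# Assumption: output = "latex" returns the LaTeX source as a character string
latex <- as.character(save_tt(tab, output = "latex"))

# Keep only the lines that set horizontal space, to see which indent was written
latex_lines <- unlist(strsplit(latex, "\n", fixed = TRUE))
hspace_lines <- grep("hspace", latex_lines, value = TRUE, fixed = TRUE)
print(hspace_lines)
```

The same check can be applied to the modelsummary() pipeline above by passing its result to save_tt() instead of the toy table.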

vincentarelbundock commented 1 month ago

Thanks for the report. Yes, this was a minor bug (a typo) in tinytable. If you install that package from GitHub, then restart R completely, it should work.
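
For readers who have not installed a development version before, a sketch of one common way to do it follows. It assumes the {remotes} package is used as the installer (the thread does not name one) and that the development version is hosted at vincentarelbundock/tinytable on GitHub.

```r
# Assumption: {remotes} is the installer; install it first if it is missing
if (!requireNamespace("remotes", quietly = TRUE)) {
    install.packages("remotes")
}

# Install the development version of tinytable from GitHub
remotes::install_github("vincentarelbundock/tinytable")

# Restart R completely, then confirm which version is now installed
packageVersion("tinytable")
```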