Closed vincentarelbundock closed 3 years ago
Thanks for turning this into an issue! In my opinion (but I may be biased) the minus signs should "just work", and not only upon an additional request by the user. So for LaTeX the question is whether you want to depend on dcolumn for this (like memisc does, which used to be my go-to package for model summaries) - or whether you want to add the math mode with `$...$`.
For HTML you could use Unicode like pandoc does or HTML entities like TtH does, i.e., `&minus;` or `&#8722;` or `&#x2212;`. Using MathML would also look nice provided the browser supports MathML, but this is less likely. However, getting the nicer math rendering for all numbers in HTML would be a nice option (but doesn't need to be the default), e.g., via MathML or MathJax etc.
Thanks for these comments. Very useful.
A few notes to self. Just writing here so I don't forget; not necessarily expecting feedback/conversation.
* […] `$$`. The easiest place to do that is probably the `rounding` function. Perhaps this can be triggered/disabled by a global option instead of a new argument?
* […] `$$` should be the default. For example, when someone uses something other than Computer Modern for the body of their article, numbers wrapped in `$$` in the tables might look inconsistent with the other faces in the paper.
* […] `$$`, `kableExtra` automatically inserts a `<script>` tag with MathJax, so no problem when tables are read interactively, rendered in RStudio, etc.
* […] `$$` and going to raw HTML with `output="html"`. Perhaps that's not a big deal, as long as the `$$` strategy is optional.
* `$$` works in LaTeX, of course, but one problem is that `kableExtra` requires `escape=FALSE` when there's math mode. This would need to be triggered automatically, I suppose. Unfortunately, stopping escaping might produce unexpected (bad) results if, for example, some variable names have underscores. Underscores are pretty common in variables, so I don't think we can include math mode with `$$` and no escape by default. Otherwise, a ton of tables won't compile.
* […] `gsub("-", "&minus;", x)` when the table goes to HTML? What's a safe analogous substitution for LaTeX? And Markdown/ASCII?
* […] `dcolumn` to enforce math mode by default because (a) it is not loaded by default when using `rmarkdown/knitr`, and (b) because I don't have a mechanism (yet) to wrap text in an `\mbox{}` to avoid them looking like weird italics (e.g., Standard error labels that are added automatically with `vcov`).

library(kableExtra)
set.seed(1024)
dat = data.frame(a = rnorm(3), b = runif(3))
dat$a = sprintf("$%s$", dat$a)
dat$b = sprintf("$%s$", dat$b)
kbl(dat, "html")
Also possible is:
modelsummary(mod, align = ".")
which is read as "align on the dot", and would automatically use `dcolumn` in LaTeX. AFAICT, there is no such option for HTML tables, unfortunately.
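The minus-replacement question in the notes above (what to substitute in HTML, LaTeX, and Markdown) could be approached roughly as follows. This is only a sketch: `replace_minus` is a hypothetical helper, not a modelsummary function, and the `$-$` choice for LaTeX assumes the table is written without escaping.

```r
# Hypothetical helper (not part of modelsummary): swap the leading
# hyphen-minus of an already-formatted number for a typographic minus
# that suits the output format.
replace_minus <- function(x, output = c("html", "latex", "markdown")) {
  output <- match.arg(output)
  minus <- switch(output,
    html     = "&minus;",  # HTML entity
    latex    = "$-$",      # only the sign goes into math mode
    markdown = "\u2212"    # Unicode MINUS SIGN
  )
  # Only a minus at the start of the string is touched, so hyphens
  # inside labels are left alone.
  sub("^-", minus, x)
}

replace_minus(c("-0.052", "4.698"), "html")
replace_minus(c("-0.052", "4.698"), "latex")
```

Anchoring the pattern at the start of the string is a deliberate restriction: a global `gsub("-", ...)` would also rewrite hyphens in term labels.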
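Just some quick feedback on a few items from the list above:
* _Minus replacement:_ A quick and dirty solution would be to put only the minus in math mode, e.g. `$-$1.23`.

Yes to all this.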
On this point:
* _Minus replacement:_ A quick and dirty solution would be to put only the minus in math mode, e.g. `$-$1.23`.
The main issue is that `kableExtra` needs to be called with `escape=FALSE` for `$$` math mode to render, otherwise backslashes will be inserted. But `escape=FALSE` means that all models with underscores in variable names will break LaTeX compilation.
Need to think more about this.
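One possible way around the all-or-nothing escaping problem is to escape the text labels by hand and leave only the numeric cells unescaped. The sketch below assumes that approach; `escape_latex_label` is a hypothetical helper (not kableExtra's own escaping routine) and covers just a few common special characters.

```r
# Hypothetical helper: escape a handful of LaTeX special characters in
# text labels so they survive a table written with escape = FALSE.
escape_latex_label <- function(x) {
  # "\\\\\\1" is a literal backslash followed by the matched character
  gsub("([_%&#])", "\\\\\\1", x)
}

tab <- data.frame(
  term     = escape_latex_label(c("engine_hp", "drat")),
  estimate = sprintf("$%s$", c("-0.052", "4.698"))  # math mode for numbers only
)
tab
```

A table prepared this way could then be passed to `kableExtra::kbl(tab, format = "latex", escape = FALSE)`, since the labels no longer contain raw underscores.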
Ah, now I get it, sorry for being so slow! So either you have escape for everything (including `_`) or nothing. I'll also try to think more...
@zeileis I thought about this and I now hold the opinion that, given the `escape` issue, it is probably best not to do math mode by default. However, I was able to implement something that makes it very easy for users to request it using a new `S` column type in the `align` argument.
This code should work automatically for both HTML and PDF output, and compile seamlessly in Rmarkdown:
library(modelsummary)
mod <- lm(mpg ~ hp, mtcars)
modelsummary(mod, align = "lS")
The `S` name comes from the LaTeX `siunitx` package, which seems to be a more modern version of `dcolumn` with nice additional features. In LaTeX, we use `siunitx` and set `kableExtra::kbl(escape=FALSE)` automatically. An appropriate require for `siunitx` is also added to the Rmarkdown preamble automatically (no user intervention).
In HTML, numerical values in `S` columns are automatically wrapped in `$$`. It turns out that math mode is properly rendered with MathJax by `kableExtra`, even with `escape=TRUE`, so that's not an issue here.
An informative error is raised when users try math mode with a table factory other than the default `kableExtra`:
modelsummary(mod, output = "gt", align = "lS")
#> Error in modelsummary(mod, output = "gt", align = "lS"): Math mode `align` is only supported for HTML or LaTeX tables produced by the `kableExtra` package.
An informative error is raised when users try to escape a LaTeX table with a math mode column:
modelsummary(mod, escape = TRUE, output = "latex", align = "lS")
#> Error in f(tab, align = align, hrule = hrule, notes = notes, output_file = output_list$output_file, : Cannot use `escape=TRUE` with "S" in the `align` argument for LaTeX/PDF output.
The documentation of the `align` argument now looks like this:
align: A string with a number of characters equal to the number of columns in the table (e.g., ‘align = “lcc”’). Valid characters: l, c, r, S.
• "l": left-aligned column
• "c": centered column
• "r": right-aligned column
• "S": math-mode column. In HTML tables, numeric values are centered and wrapped in $$ in order to be interpreted by MathJax. In LaTeX and PDF documents, numeric values are aligned on the dot and treated as math mode, using the "S" column type supplied by the ‘siunitx’ LaTeX package. This code must appear in the LaTeX document preamble (it is added automatically when compiling Rmarkdown documents): \usepackage[parse-numbers=false]{siunitx}
Warning: When using ‘siunitx’ for math mode tables in LaTeX, characters like underscores in variable names will _not_ be escaped automatically, and may break compilation.
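To make the rule in the documentation concrete (one character per column, drawn from l, c, r, S), here is a minimal sketch of how such a string could be checked. `check_align` is a hypothetical function written for illustration; it is not how modelsummary validates the argument internally.

```r
# Hypothetical validator for an align string such as "lS" or "lcc".
check_align <- function(align, ncols) {
  chars <- strsplit(align, "")[[1]]
  if (length(chars) != ncols) {
    stop(sprintf("`align` has %d characters but the table has %d columns.",
                 length(chars), ncols))
  }
  bad <- setdiff(chars, c("l", "c", "r", "S"))
  if (length(bad) > 0) {
    stop(sprintf("Invalid `align` characters: %s", paste(bad, collapse = ", ")))
  }
  chars
}

check_align("lS", ncols = 2)
```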
Thanks @vincentarelbundock for this. A dedicated improved column type is certainly useful. However, I think:
So for LaTeX output I think math markup like `$-0.052$` should be added for numbers by default, independent of the column type.
For HTML output I took your example `mod <- lm(mpg ~ hp + drat, mtcars)` and ran `modelsummary(mod)` and then added MathML markup to the result. The resulting .html file (called .txt so that GitHub lets me attach it here) is: mtcars.txt. Essentially, I've added `<math>` tags for everything and `<mn>` for numbers and `<mo>` for operators like `(` and `-`. And I replaced `-` with `−`. So the hp coefficient is encoded as: `<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mo>−</mo><mn>0.052</mn></math>`.
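The MathML encoding described above could be generated from R along these lines. This is a sketch only: `to_mathml` is a hypothetical helper, the fixed three-decimal formatting is an assumption, and the minus sign is written as the numeric entity `&#x2212;`.

```r
# Hypothetical helper: encode numbers as inline MathML, with the sign in
# an <mo> element and the magnitude in an <mn> element.
to_mathml <- function(x, digits = 3) {
  num  <- formatC(abs(x), format = "f", digits = digits)
  sign <- ifelse(x < 0, "<mo>&#x2212;</mo>", "")
  sprintf(
    '<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML">%s<mn>%s</mn></math>',
    sign, num
  )
}

to_mathml(c(-0.052, 4.698))
```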
In the table below I show how this is rendered in Chromium and in Firefox (on Debian GNU/Linux) with the MathJax `<script>` included (default) or without. With MathJax included both displays look essentially identical. Without MathJax, Chromium falls back to rendering as plain text while in Firefox the native MathML support kicks in. Safari would likely be similar to Firefox.
| Browser | With MathJax | Without MathJax |
|---|---|---|
| Chromium | (screenshot) | (screenshot) |
| Firefox | (screenshot) | (screenshot) |
Thus, this would give relatively similar results to the `$...$` markup in LaTeX. The main difference in my view is that in LaTeX it is reasonably straightforward to control the font used for the math mode. I don't know whether something similar is possible in HTML. I could imagine that there are users who would prefer the plain text version as in Chromium without MathJax while others would prefer one of the other settings. But as long as minus signs are encoded as `−` my personal feeling is that all versions look sufficiently ok.
I forgot to say: The `mtcars.txt` that I uploaded in my comment above is without the MathJax script. If you want to enable it, you need to include `<script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [["$","$"]]}})</script><script async src="https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>` again.
Thanks for this. Super useful!
* […] `gsub`-ing `"-"` for `−`

Notes to self about remaining challenges:
Challenge 1: Escape
* `$$` in LaTeX and `−` in HTML require `kable(escape=FALSE)`. This will break compilation of tables with underscores in variable or model names. This is a show-stopper.
* […] `kable` escaping?
* […] `escape` because of name conflict.

Challenge 2: Glue strings
* `estimate` and `statistic` arguments accept arbitrary user-supplied glue strings.
* […] `modelsummary` with this: `statistic = "95% Conf.Int. = [{conf.low}, {conf.high}]"`
* […] `$$` or expect the user to do it? (Probably latter)
* […] `gsub` minus signs or expect the user to do it? (Probably latter)

Challenge 3: Implementation
* `format_gof` and the `datasummary_*` family use the `rounding` function directly
* `format_estimates` uses the `rounding` function and then feeds the result to `glue_data`. But if the glue strings include `()[]` or other math characters, those also need to be in math mode.
* […] `$$` in both `rounding` and `glue`, then we'll end up with ugly (but working) results like: `$($$-0.052)$$)$`

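The double-wrapping problem in Challenge 3 could in principle be avoided by making the wrapper idempotent, so that applying it after both rounding and glue still yields a single pair of delimiters. A minimal sketch, assuming a hypothetical `wrap_math` helper (not a modelsummary function):

```r
# Hypothetical helper: drop any dollar signs already present, then wrap
# the whole string once. Note that this would also strip a literal "\$".
wrap_math <- function(x) {
  inner <- gsub("$", "", x, fixed = TRUE)
  sprintf("$%s$", inner)
}

est <- wrap_math("-0.052")       # "$-0.052$"
wrap_math(sprintf("(%s)", est))  # "$(-0.052)$"
```

Because the second call removes the inner delimiters before wrapping, the parentheses added by the glue string end up inside the same math expression as the number.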
Reminders for VAB:
* […] `S`-column type is used
* […] `modelsummary` we can already get close to our goal with the code below. However, goodness-of-fit statistics are not properly formatted there, and this won't help us with `datasummary_*`

library(modelsummary)
mod <- lm(mpg ~ hp + drat, mtcars)
modelsummary(mod,
estimate = "${estimate}$",
statistic = "${std.error}$",
escape = FALSE
)
| | Model 1 |
|---|---|
| (Intercept) | 10.790 |
| | 5.078 |
| hp | −0.052 |
| | 0.009 |
| drat | 4.698 |
| | 1.192 |
| Num.Obs. | 32 |
| R2 | 0.741 |
| R2 Adj. | 0.723 |
| AIC | 169.5 |
| BIC | 175.4 |
| Log.Lik. | -80.752 |
| F | 41.522 |

Thanks for the follow-up. Just wanted to make one quick comment regarding the mixed fonts in HTML: I'm not a fan either but got used to it because this is the predominant kind of display I get when looking at online articles with mathematical equations. See for example https://doi.org/10.1080/01621459.2021.1891927 and note also that you have the option to turn MathJax on and off (at the top right above the title). Interestingly, they don't use math mode for the tables but still manage to show the minuses ok (e.g., see Table 5). So maybe there is another trick you can steal?
Yet another good solution for LaTeX might be to use the `siunitx` package (which I already use for `S`-columns) and to wrap table numbers in `\num{}`. The advantage is that we get properly formatted numbers in text mode, so that the numbers font matches the body of the text, the term labels, the caption, and the notes.
The Rmarkdown code below produces a document in Times New Roman with this matching table:
---
output:
  pdf_document:
    latex_engine: xelatex
mainfont: Times New Roman
---
```{r}
library(modelsummary)
kableExtra::usepackage_latex("siunitx")
mod = list(
  lm(mpg ~ hp + drat, mtcars),
  lm(mpg ~ hp + drat + vs, mtcars))
siunitx <- function(x) sprintf("\\num{%s}", x)
modelsummary(mod,
  estimate = "{siunitx(estimate)}",
  title = "A Times New Roman table.",
  escape = FALSE)
```
I'm back from vacation and implemented a first version that passes the current test suite.
* […] `−`.
* […] `$$` and use MathJax.
* […] `siunitx` LaTeX package, wrapping all numbers in `\num{}` by default.
* […] `$$` and use LaTeX math mode.
* `\usepackage{siunitx}` is added to the preamble automatically when compiling Rmarkdown documents.
* `align = "lddd"` means that the first column is left-aligned and the others are aligned on the dot using `siunitx` (LaTeX-only).
* […] `align` argument.

Since this thread is very general and the main functions are implemented, I'm closing this now.
Feel free to re-open this or open more narrowly-focused issues as needed.
I paste an example Rmarkdown document and some screenshots below.
---
output:
  pdf_document:
    latex_engine: xelatex
mainfont: Times New Roman
---
```{r}
library(modelsummary)
dat = mtcars
dat$mpg = dat$mpg * -1
dat$hp = dat$hp / 1e6
mod = list(
  lm(mpg ~ hp + drat, dat),
  lm(mpg ~ hp + drat + vs, dat))
modelsummary(mod,
  title = "Center-align with siunitx (default).")
modelsummary(mod,
  align = "ldd",
  title = "Dot-align with siunitx, using the align argument.")
datasummary(hp + drat + mpg ~ Factor(am) * (Mean + SD), data = dat)
datasummary_skim(dat)
```
<img width="300" alt="Screen Shot 2021-08-09 at 19 34 31" src="https://user-images.githubusercontent.com/987057/128788125-bf680cd6-4ba5-4627-938a-bc0021990ce1.png">
<img width="300" alt="Screen Shot 2021-08-09 at 19 34 44" src="https://user-images.githubusercontent.com/987057/128788124-afab6ef3-c60a-4822-8d89-a4852af13eed.png">
<img width="245" alt="Screen Shot 2021-08-09 at 19 34 48" src="https://user-images.githubusercontent.com/987057/128788123-734b5420-6d11-491e-9324-30cb2c4f8a36.png">
<img width="526" alt="Screen Shot 2021-08-09 at 19 34 52" src="https://user-images.githubusercontent.com/987057/128788121-e0ab3733-18c2-4b96-bc1e-e555f4b9717e.png">
<img width="300" alt="Screen Shot 2021-08-09 at 19 35 24" src="https://user-images.githubusercontent.com/987057/128788119-5a9da9a1-9fd4-4d6e-a1d1-76aaa2a4b83c.png">
I am having problems with getting Times New Roman as the siunitx font for the tables.
---
format: pdf
mainfont: Times New Roman
sansfont: Times New Roman
execute:
  echo: false
---
library(modelsummary)
models <- list()
models[[1]] <- lm(mpg ~ hp, mtcars)
models[[2]] <- lm(mpg ~ hp + cyl, mtcars)
modelsummary(
models, stars=TRUE, gof_map=c("nobs", "adj.r.squared"), align="ldd"
)
What am I doing wrong?
Thank you very much in advance!
This is not a `modelsummary` issue. There are other forums where you can ask how to change the math font in `siunitx`.
To clarify, I don't know the solution off-hand or I would have given it, of course...
Okay, sorry! I thought it's about how `modelsummary` uses the `\num` - I am going to look somewhere else.
Ideas:
* `dcolumn` example in vignette

Some of these are probably unsafe.