vincentarelbundock / modelsummary

Beautiful and customizable model summaries in R.
http://modelsummary.com

math mode #330

Closed vincentarelbundock closed 3 years ago

vincentarelbundock commented 3 years ago

Ideas:

Some of these are probably unsafe.

zeileis commented 3 years ago

Thanks for turning this into an issue! In my opinion (but I may be biased) the minus signs should "just work" and not just upon an additional request by the user. So for LaTeX the question is whether you want to depend on dcolumn for this (like memisc does, which used to be my go-to package for model summaries) - or whether you want to add the math mode with $...$.

For HTML you could use unicode like pandoc does or HTML entities like TtH does, i.e., − or `&minus;` or `&#8722;`. Using MathML would also look nice provided the browser supports MathML, but this is less likely. However, getting the nicer math rendering for all numbers in HTML would be a nice option (but doesn't need to be the default), e.g., via MathML or MathJax.

vincentarelbundock commented 3 years ago

Thanks for these comments. Very useful.

A few notes to self. Just writing here so I don't forget; not necessarily expecting feedback/conversation.

```r
library(kableExtra)
set.seed(1024)
dat = data.frame(a = rnorm(3), b = runif(3))
# wrap every value in $...$ so that MathJax renders it in math mode
dat$a = sprintf("$%s$", dat$a)
dat$b = sprintf("$%s$", dat$b)
kbl(dat, "html")
```

vincentarelbundock commented 3 years ago

Also possible is:

modelsummary(mod, align = ".")

which is read as "align on the dot", and would automatically use dcolumn in LaTeX. AFAICT, there is no such option for HTML tables, unfortunately.

zeileis commented 3 years ago

Just some quick feedback on a few items from the list above:

vincentarelbundock commented 3 years ago

Yes to all this.

On this point:

* _Minus replacement:_ A quick and dirty solution would be to put only the minus in math mode, e.g. `$-$1.23`.

The main issue is that kableExtra needs to be called with escape=FALSE for $$ math mode to render, otherwise backslashes will be inserted. But escape=FALSE means that all models with underscores in variable names will break LaTeX compilation.

Need to think more about this.
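
A minimal sketch of the trade-off described above, in base R. The helper names `minus_math()` and `escape_underscore()` are hypothetical and are not part of modelsummary or kableExtra: the first puts only a leading minus sign in math mode, and the second escapes underscores by hand, which becomes necessary once `escape=FALSE` is set.

```r
# Hypothetical helpers for illustration only; not modelsummary functions.

# Put only a leading minus sign in LaTeX math mode: "-1.23" becomes "$-$1.23"
minus_math <- function(x) sub("^-", "$-$", x)

# Escape underscores by hand, because escape = FALSE no longer does it for us
escape_underscore <- function(x) gsub("_", "\\_", x, fixed = TRUE)

estimates <- c("-1.23", "0.45")       # made-up coefficient strings
terms <- c("gdp_per_capita", "hp")    # made-up term labels

minus_math(estimates)
escape_underscore(terms)
```

With helpers like these, the numbers and the labels are treated separately, so the all-or-nothing behaviour of a single `escape` switch is sidestepped. Whether that is practical inside a table-building pipeline is a separate question.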

zeileis commented 3 years ago

Ah, now I get it, sorry for being so slow! So either you have escape for everything (including _) or nothing. I'll also try to think more...

vincentarelbundock commented 3 years ago

@zeileis I thought about this and I now hold the opinion that given the escape issue, it is probably best not to do math mode by default. However, I was able to implement something that makes it very easy for users to request it using a new S column type in the align argument.

This code should work automatically for both HTML and PDF output, and compile seamlessly in Rmarkdown:

```r
library(modelsummary)
mod <- lm(mpg ~ hp, mtcars)
modelsummary(mod, align = "lS")
```

The S name comes from the LaTeX siunitx package, which seems to be a more modern version of dcolumn with nice additional features. In LaTeX, we use siunitx and set kableExtra::kbl(escape=FALSE) automatically. An appropriate require for siunitx is also added to the Rmarkdown preamble automatically (no user intervention).

In HTML, numerical values in S columns are automatically wrapped in $$. It turns out that math mode is properly rendered with MathJax by kableExtra, even with escape=TRUE, so that’s not an issue here.

An informative error is raised when users try math mode with a table factory other than the default kableExtra:

modelsummary(mod, output = "gt", align = "lS")
#> Error in modelsummary(mod, output = "gt", align = "lS"): Math mode `align` is only supported for HTML or LaTeX tables produced by the `kableExtra` package.

An informative error is raised when users try to escape a LaTeX table with a math mode column:

modelsummary(mod, escape = TRUE, output = "latex", align = "lS")
#> Error in f(tab, align = align, hrule = hrule, notes = notes, output_file = output_list$output_file, : Cannot use `escape=TRUE` with "S" in the `align` argument for LaTeX/PDF output.

The documentation of the align argument now looks like this:

align: A string with a number of characters equal to the number of columns in the table (e.g., ‘align = “lcc”’). Valid characters: l, c, r, S.

       • "l": left-aligned column

       • "c": centered column

       • "r": right-aligned column

       • "S": math-mode column. In HTML tables, numeric values are
         centered and wrapped in $$ in order to be interpreted by
         MathJax. In LaTeX and PDF documents, numeric values are
         aligned on the dot and treated as math mode, using the
         "S" column type supplied by the ‘siunitx’ LaTeX package.
         This code must appear in the LaTeX document preamble (it
         is added automatically when compiling Rmarkdown
         documents): \usepackage[parse-numbers=false]{siunitx}
         Warning: When using ‘siunitx’ for math mode tables in
         LaTeX, characters like underscores in variable names will
         _not_ be escaped automatically, and may break
         compilation.
zeileis commented 3 years ago

Thanks @vincentarelbundock for this. A dedicated improved column type is certainly useful. However, I think:

  1. Proper rendering of minus signs should not be tied to the alignment in the column.
  2. At least for LaTeX/PDF output the user should not have to set extra arguments to get proper formatting of numbers in a table. Not having this turned on by default would lead to a lot of suboptimal output in papers and tedious work for those who care about it.

So for LaTeX output I think math markup like $-0.052$ should be added for numbers by default, independent of the column type.

For HTML output I took your example `mod <- lm(mpg ~ hp + drat, mtcars)` and ran `modelsummary(mod)` and then added MathML markup to the result. The resulting .html file (called .txt so that GitHub lets me attach it here) is: mtcars.txt. Essentially, I've added `<math>` tags for everything and `<mn>` for numbers and `<mo>` for operators like `(` and `-`. And I replaced `-` with `&minus;`. So the hp coefficient is encoded as: `<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML"><mo>&minus;</mo><mn>0.052</mn></math>`.
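
The encoding described above can be sketched in base R as follows. The function name `to_mathml()` and the fixed three-digit rounding are assumptions made for this illustration; the markup itself mirrors the hp coefficient quoted in the comment.

```r
# Hypothetical helper: encode a number as inline MathML, with the sign as an
# <mo> operator and the absolute value as an <mn> number.
to_mathml <- function(x, digits = 3) {
  s <- formatC(x, format = "f", digits = digits)  # e.g. "-0.052"
  negative <- startsWith(s, "-")
  magnitude <- sub("^-", "", s)                   # drop the ASCII hyphen
  paste0(
    '<math display="inline" xmlns="http://www.w3.org/1998/Math/MathML">',
    ifelse(negative, "<mo>&minus;</mo>", ""),
    "<mn>", magnitude, "</mn>",
    "</math>"
  )
}

to_mathml(c(-0.052, 4.698))
```

The function is vectorised, so it could in principle be applied to a whole column of estimates before the table is assembled.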

In the table below I show how this is rendered in Chromium and in Firefox (on Debian GNU/Linux) with the MathJax <script> included (default) or without. With MathJax included both displays look essentially identical. Without MathJax, Chromium falls back to rendering as plain text while in Firefox the native MathML support kicks in. Safari would likely be similar to Firefox.

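| Browser | With MathJax | Without MathJax |
| --- | --- | --- |
| Chromium | [screenshot: chromium-mathjax] | [screenshot: chromium-nomathjax] |
| Firefox | [screenshot: firefox-mathjax] | [screenshot: firefox-nomathjax] |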

Thus, this would give relatively similar results to the $...$ markup in LaTeX. The main difference in my view is that in LaTeX it is reasonably straightforward to control the font used for the math mode. I don't know whether something similar is possible in HTML. I could imagine that there are users who would prefer the plain text version as in Chromium without MathJax while others would prefer one of the other settings. But as long as minus signs are encoded as &minus; my personal feeling is that all versions look sufficiently ok.

zeileis commented 3 years ago

I forgot to say: The mtcars.txt that I uploaded in my comment above is without the MathJax script. If you want to enable it, you need to include <script type="text/x-mathjax-config">MathJax.Hub.Config({tex2jax: {inlineMath: [["$","$"]]}})</script><script async src="https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script> again.

vincentarelbundock commented 3 years ago

Thanks for this. Super useful!

  1. I'm convinced by your argument that it is easy to control the math mode font in LaTeX. Proper display by default is important here, especially because LaTeX is so math-focused.
  2. Your HTML side-by-side screenshots are great. Frankly, I don't love the mixed-font look, and I can't think of an easy way to adjust fonts in HTML, so I lean toward gsub-ing "-" for &minus;
  3. I agree that math mode should not be tightly linked to align column type. S-type will only work in LaTeX because that's where it makes sense.
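
A minimal sketch of the `gsub` approach mentioned in point 2, assuming the table cells are already formatted as character strings. The helper name `replace_minus_html()` and the regular expression are illustrative assumptions, not the modelsummary implementation.

```r
# Hypothetical helper: swap an ASCII hyphen for the HTML entity &minus;, but
# only where it acts as a sign: not preceded by a letter or digit, and
# followed by a digit (optionally after a decimal point).
replace_minus_html <- function(x) {
  gsub("(?<![0-9A-Za-z])-(?=\\.?[0-9])", "&minus;", x, perl = TRUE)
}

cells <- c("-0.052", "(-0.009)", "10.790", "R2 Adj.", "Log-Lik.")
replace_minus_html(cells)
```

The lookbehind and lookahead leave hyphens inside labels such as "Log-Lik." alone. Under this pattern a hyphen in scientific notation like "1e-05" is also left unchanged, which may or may not be the desired behaviour.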

Notes to self about remaining challenges:

Challenge 1: Escape

Challenge 2: Glue strings

Challenge 3: Implementation

Reminders for VAB:

```r
library(modelsummary)
mod <- lm(mpg ~ hp + drat, mtcars)
modelsummary(mod,
  estimate = "${estimate}$",
  statistic = "${std.error}$",
  escape = FALSE
)
```
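
|             | Model 1 |
| :---------- | ------: |
| (Intercept) | 10.790  |
|             | 5.078   |
| hp          | −0.052  |
|             | 0.009   |
| drat        | 4.698   |
|             | 1.192   |
| Num.Obs.    | 32      |
| R2          | 0.741   |
| R2 Adj.     | 0.723   |
| AIC         | 169.5   |
| BIC         | 175.4   |
| Log.Lik.    | -80.752 |
| F           | 41.522  |
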
zeileis commented 3 years ago

Thanks for the follow-up. Just wanted to make one quick comment regarding the mixed fonts in HTML: I'm not a fan either but got used to it because this is the predominant kind of display I get when looking at online articles with mathematical equations. See for example https://doi.org/10.1080/01621459.2021.1891927 and note also that you have the option to turn MathJax on and off (at the top right above the title). Interestingly, they don't use math mode for the tables but still manage to show the minuses ok (e.g., see Table 5). So maybe there is another trick you can steal?

vincentarelbundock commented 3 years ago

Yet another good solution for LaTeX might be to use the siunitx package (which I already use for S-columns) and to wrap table numbers in \num{}. The advantage is that we get properly formatted numbers in text mode, so that the numbers font matches the body of the text, the term labels, the caption, and the notes.

The Rmarkdown code below produces a document in Times New Roman with this matching table:

[Screenshot of the resulting table]

---
output:
    pdf_document:
        latex_engine: xelatex
mainfont: Times New Roman
---

```{r}
library(modelsummary)
kableExtra::usepackage_latex("siunitx")

mod = list(
    lm(mpg ~ hp + drat, mtcars),
    lm(mpg ~ hp + drat + vs, mtcars))

siunitx <- function(x) sprintf("\\num{%s}", x)

modelsummary(mod,
             estimate = "{siunitx(estimate)}",
             title = "A Times New Roman table.",
             escape = FALSE)
```

vincentarelbundock commented 3 years ago

I'm back from vacation and implemented a first version that passes the current test suite.

Since this thread is very general and the main functions are implemented, I'm closing this now.

Feel free to re-open this or open more narrowly-focused issues as needed.

I paste an example Rmarkdown document and some screenshots below.

---
output:
    pdf_document:
        latex_engine: xelatex
mainfont: Times New Roman
---

```{r}
library(modelsummary)

dat = mtcars
dat$mpg = dat$mpg * -1
dat$hp = dat$hp / 1e6

mod = list(
  lm(mpg ~ hp + drat, dat),
  lm(mpg ~ hp + drat + vs, dat))
modelsummary(mod,
             title = "Center-align with siunitx (default).")
modelsummary(mod,
             align = "ldd",
             title = "Dot-align with siunitx, using the align argument.")
datasummary(hp + drat + mpg ~ Factor(am) * (Mean + SD), data = dat)
datasummary_skim(dat)
```


<img width="300" alt="Screen Shot 2021-08-09 at 19 34 31" src="https://user-images.githubusercontent.com/987057/128788125-bf680cd6-4ba5-4627-938a-bc0021990ce1.png">

<img width="300" alt="Screen Shot 2021-08-09 at 19 34 44" src="https://user-images.githubusercontent.com/987057/128788124-afab6ef3-c60a-4822-8d89-a4852af13eed.png">

<img width="245" alt="Screen Shot 2021-08-09 at 19 34 48" src="https://user-images.githubusercontent.com/987057/128788123-734b5420-6d11-491e-9324-30cb2c4f8a36.png">

<img width="526" alt="Screen Shot 2021-08-09 at 19 34 52" src="https://user-images.githubusercontent.com/987057/128788121-e0ab3733-18c2-4b96-bc1e-e555f4b9717e.png">

<img width="300" alt="Screen Shot 2021-08-09 at 19 35 24" src="https://user-images.githubusercontent.com/987057/128788119-5a9da9a1-9fd4-4d6e-a1d1-76aaa2a4b83c.png">
victor-234 commented 6 months ago

I am having problems with getting Times New Roman as the siunitx font for the tables.

---
format: pdf
mainfont: Times New Roman
sansfont: Times New Roman
execute: 
    echo: false
---

```{r}
library(modelsummary)

models <- list()
models[[1]] <- lm(mpg ~ hp, mtcars)
models[[2]] <- lm(mpg ~ hp + cyl, mtcars)

modelsummary(
    models, stars=TRUE, gof_map=c("nobs", "adj.r.squared"), align="ldd"
)
```

[Screenshot of the rendered table]

What am I doing wrong?

Thank you very much in advance!

vincentarelbundock commented 6 months ago

This is not a modelsummary issue. There are other forums where you can ask how to change the math font in siunitx.

vincentarelbundock commented 6 months ago

To clarify, I don't know the solution off-hand or I would have given it, of course...

victor-234 commented 6 months ago

Okay, sorry! I thought it was about how modelsummary uses the \num - I am going to look somewhere else.