yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/

Kable typesets negative numbers incorrectly in tables #1709

Open Selbosh opened 5 years ago

Selbosh commented 5 years ago

The current version of kable generates incorrectly typeset numeric columns, especially in LaTeX output, wherever they contain negative numbers.

Example

Consider this minimal example.

library(knitr)

df <- data.frame(x = c('Foo', 'Bar', 'Baz'),
                 y = c(1.2345, -6.5432, 1.0001))
kable(df, digits = 2)

The output is

|x   |     y|
|:---|-----:|
|Foo |  1.23|
|Bar | -6.54|
|Baz |  1.00|

which renders as follows here on GitHub:

x        y
Foo   1.23
Bar  -6.54
Baz   1.00

Negative numbers and minus signs

Looks OK, right? Well, not quite. The symbol before the 6 is a hyphen-minus (-, U+002D), not an explicit minus sign (−, U+2212).
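The two characters are easy to tell apart by code point. A small sketch in base R; the final substitution is only an illustration for formats without math mode (such as plain HTML), not something kable() does:

```r
hyphen_minus <- "-"        # U+002D, what R prints for negative numbers
minus_sign   <- "\u2212"   # U+2212, the typographic minus sign

utf8ToInt(hyphen_minus)    # 45
utf8ToInt(minus_sign)      # 8722

# Illustration only: swap a leading hyphen-minus for a true minus sign
html_cell <- sub("^-", minus_sign, "-6.54")
```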

In HTML output the difference is not so noticeable to some, but in LaTeX-PDF output it looks extremely dodgy as far as kerning is concerned, because the numbers are incorrectly typeset in text mode and therefore the '-' symbol is misinterpreted as a hyphen rather than a minus sign.

The correct output, at least for LaTeX, should be to wrap all numbers in in-line math tags, like $-123$ or \(-123\), as a hyphen-minus entered in math mode is interpreted and correctly rendered as a minus symbol, with the correct kerning.

Compare these two columns, where column y is rendered by the default kable settings from a numeric column, and column z has been wrapped in math tags.

|x   |     y|       z|
|:---|-----:|-------:|
|Foo |  1.23|  $1.23$|
|Bar | -6.54| $-6.54$|
|Baz |  1.00|  $1.00$|

Numeric columns in text mode and in math mode

Possible fix

One way to accomplish this might be to replace the line https://github.com/yihui/knitr/blob/00ffce24b08f79fc15e2b77309bc0b34a0def647/R/table.R#L145 with

if (is_numeric(x[, j])) x[, j] = sprintf(sprintf('$%%.%sf$', digits[j]), x[, j])

however, this may only be appropriate for LaTeX output: although it should look fine in HTML, it would be an unwelcome surprise for people not using MathJax to suddenly need it. Maybe we can make it an option, disabled by default?

Another solution

Slightly less heavy-handed: replace all hyphens in numeric columns with double hyphens, which should get typeset as en-dashes. Not strictly correct, and could cause kerning issues in LaTeX, but looks a bit better.

if (is_numeric(x[, j])) x[, j] = gsub('-', '--', round(x[, j], digits[j]))
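One detail worth checking in this variant: round() returns a number, so trailing zeros are lost when it is converted to text. A small sketch of the difference, assuming formatC() is an acceptable way to get a fixed number of decimals (this is an illustration, not the code in table.R):

```r
y <- c(1.2345, -6.5432, 1.0001)

# round() followed by implicit conversion to character: 1.0001 becomes "1"
a <- gsub("-", "--", round(y, 2), fixed = TRUE)

# formatC() keeps two decimals before the hyphen substitution
b <- gsub("-", "--", formatC(y, format = "f", digits = 2), fixed = TRUE)

print(a)
print(b)
```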

Workarounds

Of course one can pre-process the data before it goes into kable, but then the columns aren't numeric any more so you have to set the alignment manually. And I personally think good typography should be the default!
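For reference, a minimal sketch of that pre-processing workaround, assuming LaTeX output. `align` and `escape` are existing kable() arguments; the `requireNamespace()` guard is only there so the snippet runs without knitr installed.

```r
df <- data.frame(x = c("Foo", "Bar", "Baz"),
                 y = c(1.2345, -6.5432, 1.0001))

# Wrap each number in inline math; the column becomes character
df$y <- sprintf("$%.2f$", df$y)

if (requireNamespace("knitr", quietly = TRUE)) {
  # Right-alignment must now be requested by hand, and escape = FALSE
  # stops kable from escaping the dollar signs in LaTeX output.
  print(knitr::kable(df, format = "latex", align = c("l", "r"), escape = FALSE))
}
```

The manual `align` is exactly the inconvenience described above: once the column is character, kable no longer right-aligns it by default.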

wikithink commented 5 years ago

Run the following code and you will get a surprise:

cl <- data.frame(x = c('-', '±'), y = c(20, 50), z = c('normal', 'abnormal'))
knitr::kable(cl, booktabs = TRUE)

Selbosh commented 5 years ago

@wikithink I don't understand what you mean. The first symbol (a hyphen-minus) is rendered as a hyphen in text mode, and there is no second meaning for a plus-minus symbol.

Compare:

cl2 <- data.frame(x = c('$-$', '$\\pm$'), y = c(20, 50), z = c('minus', 'plus-minus'))
knitr::kable(cl2, booktabs = TRUE)

krivit commented 4 years ago

Seconding this ticket: it looks ugly in LaTeX/PDF mode in particular, not just in terms of the kerning but also in terms of the length of the minus sign.

> One way to accomplish this might be to replace the line ... however, this may only be appropriate for LaTeX output: although it should look fine in HTML, it would be an unwelcome surprise for people not using MathJax to suddenly need it. Maybe we can make it an option, disabled by default?

As far as I know kable() can detect or be told explicitly whether it's running in HTML or in LaTeX mode, and it can then adjust its output accordingly.
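A rough sketch of that idea from the user's side, assuming `knitr::is_latex_output()` is an acceptable way to detect the target format. The helper `fmt_num` is hypothetical and not part of knitr.

```r
# Hypothetical helper (not part of knitr): wrap formatted numbers in inline
# math only when the target is LaTeX; otherwise return the plain strings.
# The default for `latex` asks knitr which output format is being rendered.
fmt_num <- function(v, digits = 2, latex = knitr::is_latex_output()) {
  s <- formatC(v, format = "f", digits = digits)
  if (latex) sprintf("$%s$", s) else s
}

y <- c(1.2345, -6.5432, 1.0001)
tex_cells  <- fmt_num(y, latex = TRUE)   # math-mode strings for LaTeX/PDF
html_cells <- fmt_num(y, latex = FALSE)  # plain strings for other formats
```

Passing `latex` explicitly, as here, mirrors telling kable() the format by hand; leaving it out relies on detection while a document is being knitted.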

krivit commented 4 years ago

On the other hand, math mode also causes space to be inserted after commas, which doesn't look good when they are used as a decimal or thousands separator. There are a number of ways to remedy this:

\documentclass{article}
\usepackage{amsmath}
\usepackage[group-separator={,}]{siunitx}
\begin{document}
\noindent-1,000,000.00\\ % text mode default
$-1,000,000.00$\\ % math mode default
$-\text{1,000,000.00}$\\ % amsmath::\text
$-$1,000,000.00\\ % Math only minus
\num{-1000000.00}\\ % siunitx::\num
$-1{,}000{,}000{.}00$\\ % braces trick: see https://tex.stackexchange.com/questions/303110/avoid-space-after-commas-used-as-thousands-separator-in-math-mode
\end{document}

I don't know how much of a priority it is to avoid depending on external packages, but if it's a priority, the most robust approach is probably to take the number after formatting, then run a search-and-replace replacing each comma with {,}, and the best place to do so would probably be immediately after https://github.com/yihui/knitr/blob/12b50f5bb646acbaf7535c92b4f627ee42e44646/R/table.R#L149-L152

I've prototyped this, and would be happy to submit a PR.
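To make the search-and-replace step concrete, here is an illustrative sketch in base R. It is not the prototype mentioned above and not knitr code; the helper name `math_num` is hypothetical.

```r
# Hypothetical helper (not part of knitr): format with a thousands separator,
# brace each comma so math mode adds no space after it, then wrap in $...$.
math_num <- function(v, digits = 2) {
  s <- formatC(v, format = "f", digits = digits, big.mark = ",")
  s <- gsub(",", "{,}", s, fixed = TRUE)
  sprintf("$%s$", s)
}

cell <- math_num(-1000000)
print(cell)
```

The resulting string would still need to reach the LaTeX output unescaped, which is the concern about where in table.R the replacement should happen.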

krivit commented 4 years ago

Correction: the best place is probably after https://github.com/yihui/knitr/blob/12b50f5bb646acbaf7535c92b4f627ee42e44646/R/table.R#L256 to prevent the \(\) or $ from being escaped, though it requires passing isn on to kable_latex.