yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/

Figure code may generate wrong LaTeX code (when rendering Rmd to tex) if the fig.cap is not TeX-safe #2302

Closed s-u closed 8 months ago

s-u commented 8 months ago

Reproducible and illustrative example (test.Rmd):

````
---
title: test
---

```{r out.width="100%", fig.cap = "10%"}
plot(1)
```
````

Then render with `rmarkdown::render("test.Rmd","pdf_document")`.

The figure output will be offset and followed with verbatim text `\begin{figure}` in the PDF document.

## Analysis:

When special options are needed in `hook_plot_md` for LaTeX output (here triggered by `out.width`), it will call `hook_plot_tex` and pass its output verbatim into the `.md` file - here (`test.knit.md`):

```
plot(1)

\begin{figure} \includegraphics[width=1\linewidth]{test_files/figure-latex/unnamed-chunk-1-1} \caption{10%}\label{fig:unnamed-chunk-1} \end{figure}
```

That works in most cases, because pandoc will detect valid LaTeX and simply copy that content. However, in the above case, the caption is not TeX-safe (the `%` needs escaping) so pandoc will detect invalid LaTeX and treat it as simple text - escaping all the commands - (from `test.tex`):

```
\textbackslash begin{figure} \includegraphics[width=1\linewidth]{test_files/figure-latex/unnamed-chunk-1-1} \textbackslash caption{10\%}\label{fig:unnamed-chunk-1} \textbackslash end{figure}
```

The cause is really hard to detect. The user may not be aware that the caption contained an unescaped `%` sign that caused the problem. 

## Solutions

There are really two related issues with possible solutions:
 - `fig.cap` could possibly escape naked `%` characters when producing LaTeX code for the figure (partly to make it both html-target and PDF-target consistent). This would at least avoid the arguably most common cause of this problem (which is really hard to debug!).
 - `hook_plot_md` might wrap the LaTeX code from `hook_plot_tex` in a `=latex` block to prevent pandoc from falling back to regular text rendering. This will make the LaTeX processing fail, but at least it is clear what happened at that point.

Note that pandoc may fail on complex LaTeX code beyond the simple issue above, so one might argue that using a `=latex` block is the way to go.
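
To illustrate the second option, here is a minimal sketch of what wrapping the figure code in a pandoc raw-attribute block could look like. The helper name `wrap_raw_latex` is hypothetical and is not part of knitr; it only shows the shape of the Markdown that pandoc would then pass through to LaTeX untouched.

```r
# Hypothetical helper (NOT a knitr function): wrap raw LaTeX in a pandoc
# raw-attribute block so that pandoc copies it to the .tex output as-is
# instead of trying to parse it as inline LaTeX.
wrap_raw_latex <- function(tex) {
  paste0("```{=latex}\n", tex, "\n```")
}

# The figure code from the example above, including the unescaped `%`.
fig <- paste(
  "\\begin{figure}",
  "\\includegraphics[width=1\\linewidth]{test_files/figure-latex/unnamed-chunk-1-1}",
  "\\caption{10%}\\label{fig:unnamed-chunk-1}",
  "\\end{figure}",
  sep = "\n"
)

cat(wrap_raw_latex(fig), "\n")
```

The sketch assumes the LaTeX string does not itself contain a line of three backticks; a robust version would need a longer fence in that case.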

yihui commented 8 months ago

I've adopted the second solution you proposed. Thanks!

s-u commented 8 months ago

Thanks for your prompt reply! Much appreciated! I can confirm that this fixes the issue (LaTeX will correctly fail with "runaway argument" for the above example).

yihui commented 8 months ago

Sorry, I just discovered that the above fix would break some reverse dependencies. I'll investigate and may consider switching to the first solution (but one tricky thing is to avoid escaping % that has already been escaped by the author).

s-u commented 8 months ago

What about `gsub("(?<!\\\\)%", "\\\\%", "cap 10% but not 90\\%", perl=TRUE)`?
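
For reference, a small sketch of how that expression behaves. The wrapper name `escape_percent` is made up for illustration and is not necessarily what knitr ended up using. The negative lookbehind `(?<!\\)` skips any `%` that is already preceded by a backslash, so applying it twice should not double-escape.

```r
# Illustrative wrapper (hypothetical name) around the regex suggested above:
# escape every `%` that is not already preceded by a backslash.
escape_percent <- function(x) {
  gsub("(?<!\\\\)%", "\\\\%", x, perl = TRUE)
}

cap <- "cap 10% but not 90\\%"

# writeLines() shows the actual characters rather than R's escaped form.
writeLines(cap)                                   # cap 10% but not 90\%
writeLines(escape_percent(cap))                   # cap 10\% but not 90\%
writeLines(escape_percent(escape_percent(cap)))   # unchanged on a second pass
```

One limitation of this sketch: a `%` that follows an escaped backslash (`\\%` in LaTeX source) is also left alone, because the lookbehind only inspects the single preceding character.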

yihui commented 8 months ago

Yes, that's what I was thinking.

yihui commented 8 months ago

Just committed the new fix. Thanks!
