yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/

Multiple quotation marks on the same line get mishandled, in rmd files #2242

Closed timothee-bacri closed 1 year ago

timothee-bacri commented 1 year ago

I originally posted my issue at https://github.com/quarto-dev/quarto-cli/issues/4857, but I now think this is more likely a knitr problem.

I am copying the important part from that issue here.

Bug description

The following code in a .rmd (or .qmd) document

<!-- Problem -->
``abc" \(123\) ``def"

<!-- Problem -->
``abc"
``def"

<!-- No problem -->
``abc"

``def"

produces, when rendered with the R code rmarkdown::render("mydocument.Rmd") (or quarto::quarto_render("mydocument.qmd")), the following TeX code

\texttt{abc"\ \textbackslash{}(123\textbackslash{})}def''

\texttt{abc"}def''

``abc''

``def''

and this in turn produces the following PDF:

[Image: Quarto TeX problem]

I have no idea what is happening, but Overleaf does not have this problem, so I think this is a knitr problem.

Edit

Strangely, the .rnw code

\documentclass{article}
\begin{document}
% No problem
``abc" \(123\) ``def"

% No problem
``abc"
``def"

% No problem
``abc"

``def"
\end{document}

produces (via knitr::knit2pdf("mydocument.rnw")) a PDF with all correct quotation marks, without these strange errors.

Technical information

I have updated all my packages with update.packages(ask = FALSE, checkBuilt = TRUE), and run remotes::install_github('yihui/knitr').

> xfun::session_info('knitr')
R version 4.2.2 (2022-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044), RStudio 2022.12.0.353

Locale:
  LC_COLLATE=English_Europe.utf8  LC_CTYPE=English_Europe.utf8    LC_MONETARY=English_Europe.utf8 LC_NUMERIC=C                   
  LC_TIME=English_Europe.utf8    

Package version:
  evaluate_0.20   graphics_4.2.2  grDevices_4.2.2 highr_0.10      knitr_1.42.5    methods_4.2.2   stats_4.2.2     tools_4.2.2    
  utils_4.2.2     xfun_0.37       yaml_2.3.7 

By filing an issue to this repo, I promise that

I understand that my issue may be closed if I don't fulfill my promises.

cderv commented 1 year ago

Hi @timothee-bacri

I'm not sure I understand what the issue really is. It seems to me that you get what you are writing, but let me explain what happens, as you may be missing some context and this should make things clearer.

> but I now think this is more likely a knitr problem.

knitr is only involved in code computation, so I don't think anything here is related to knitr, since you are just writing text. Anyhow, I'll keep explaining.

> Strangely, the .rnw code (...) produces (via knitr::knit2pdf("mydocument.rnw")) a PDF with all correct quotation marks, without these strange errors.

When you write an Rnw document, the text syntax you use is LaTeX, so writing LaTeX like ``abc" is OK. It will work the same as in Overleaf, where I believe you are expected to write LaTeX.

> The following code in a .rmd (or .qmd) document

When you write text in a Qmd or Rmd document, the syntax used is Markdown, and specifically Pandoc's Markdown. This syntax accepts some raw LaTeX, or parses some Markdown as raw LaTeX when it is unambiguous, and I think that is why you get this result.

First, Pandoc's Markdown is designed so that you can express content the same way, no matter the output format. There is an extension called smart that will produce ``abc'' for you when you just write double quotes. It is enabled by default in R Markdown and Quarto.

So try

---
format: latex
---

"abc" (123) "def"

"abc"
"def"

"abc"

"def"

and you'll get in LaTeX

``abc'' (123) ``def''

``abc'' ``def''

``abc''

``def''

So you can use regular double quotes to get the fancy LaTeX quotation syntax in the output.
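To make the idea concrete, here is a toy sketch in Python of what a smart-quotes conversion does for LaTeX output. It is not Pandoc's implementation: the function name `smart_double_quotes_to_latex` and the rule "a quote at the start of the text or after whitespace or an opening bracket is an opening quote" are assumptions made for illustration only.

```python
import re


def smart_double_quotes_to_latex(text):
    """Toy model: turn straight double quotes into LaTeX `` and ''.

    Assumption (not Pandoc's real rules): a quote at the start of the
    text, or right after whitespace or an opening bracket, opens a
    quotation; any other quote closes one.
    """
    def replace(match):
        start = match.start()
        before = text[start - 1] if start > 0 else " "
        if before.isspace() or before in "([{":
            return "``"
        return "''"

    return re.sub(r'"', replace, text)


print(smart_double_quotes_to_latex('"abc" (123) "def"'))
```

With the first example line above as input, this toy rule gives the same LaTeX quotes as the output shown earlier in this comment.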


Now, why do you observe the result you get when inserting LaTeX directly in the document? For that, we need to look at how Pandoc parses it.

This Markdown code

<!-- Problem -->
``abc" \(123\) ``def"

<!-- Problem -->
``abc"
``def"

<!-- No problem -->
``abc"

``def"

is parsed as follows. If you want to understand this in depth, this is the Abstract Syntax Tree (AST) representation internal to Pandoc, which is explained in this doc, but I'll give you the details below.

[ RawBlock (Format "html") "<!-- Problem -->"
, Para
    [ Code ( "" , [] , [] ) "abc\" \\(123\\)" , Str "def\8221" ]
, RawBlock (Format "html") "<!-- Problem -->"
, Para [ Code ( "" , [] , [] ) "abc\"" , Str "def\8221" ]
, RawBlock (Format "html") "<!-- No problem -->"
, Para [ Str "``abc\8221" ]
, Para [ Str "``def\8221" ]
]

I hope this explains why you observe the results you have.
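The key point in the AST above is that, in Markdown, a run of backticks opens an inline code span that ends at the next run of the same length. The Python sketch below models that rule for a single paragraph. It is a simplified illustration, not Pandoc's parser: the function name `split_code_spans` is hypothetical, and the smart-quote conversion seen in the `Str` nodes is not modelled.

```python
import re

BACKTICKS = re.compile(r"`+")


def split_code_spans(paragraph):
    """Split one paragraph into ("code", s) and ("text", s) pieces.

    Simplified model: a run of N backticks opens a code span that ends
    at the next run of exactly N backticks. Without a matching closing
    run, the opening backticks stay literal text.
    """
    pieces = []
    literal = ""  # literal text collected so far
    pos = 0
    while True:
        opener = BACKTICKS.search(paragraph, pos)
        if opener is None:
            literal += paragraph[pos:]
            break
        literal += paragraph[pos:opener.start()]
        # Look for a closing run with the same number of backticks.
        closer = None
        for run in BACKTICKS.finditer(paragraph, opener.end()):
            if len(run.group()) == len(opener.group()):
                closer = run
                break
        if closer is None:
            literal += opener.group()
            pos = opener.end()
            continue
        if literal:
            pieces.append(("text", literal))
            literal = ""
        # Line breaks inside a code span become spaces; surrounding
        # whitespace is trimmed here to mirror the AST shown above.
        content = paragraph[opener.end():closer.start()]
        pieces.append(("code", content.replace("\n", " ").strip()))
        pos = closer.end()
    if literal:
        pieces.append(("text", literal))
    return pieces


for para in [r'``abc" \(123\) ``def"', '``abc"\n``def"', '``abc"']:
    print(split_code_spans(para))
```

For the first two paragraphs, the second pair of backticks closes the span opened by the first pair, so everything in between becomes code and only def" is left as text. In the third case the paragraph ends before any closing backticks appear, so the backticks stay as plain text, which LaTeX then reads as an opening quotation mark.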

If you want to write raw LaTeX and do not want it parsed, you need to explicitly tell Pandoc not to parse it. See how to write raw content using raw attributes

like this

---
format: latex
---

<!-- Problem -->
``` ``abc" \(123\) ``def" ```{=latex}

<!-- Problem -->
```{=latex}
``abc"
``def"
```

which gives this LaTeX

``abc" \(123\) ``def"

``abc"
``def"

Pandoc parses the markdown like this

[ RawBlock (Format "html") "<!-- Problem -->"
, Para
    [ RawInline (Format "latex") "``abc\" \\(123\\) ``def\"" ]
, RawBlock (Format "html") "<!-- Problem -->"
, RawBlock (Format "latex") "``abc\"\n``def\""
]

I hope this helps you understand.

timothee-bacri commented 1 year ago

@cderv Thank you very much for the detailed explanation and the solutions. I will likely need to come back here when I run into quotation mark problems again :)

github-actions[bot] commented 10 months ago

This old thread has been automatically locked. If you think you have found something related to this, please open a new issue by following the issue guide (https://yihui.org/issue/), and link to this old issue if necessary.