rstudio / bookdown

Authoring Books and Technical Documents with R Markdown
https://pkgs.rstudio.com/bookdown/
GNU General Public License v3.0
3.75k stars 1.27k forks source link

Knitr, pandoc or bookdown wrongly interpret \@ref(...) inside inline, block and fenced code blocks for docx and odt output #811

Open N0rbert opened 4 years ago

N0rbert commented 4 years ago

The code below knits correctly to the HTML:

---
site: bookdown::bookdown_site
output: 
  bookdown::html_document2: default
  bookdown::word_document2: default
  bookdown::odt_document2: default
---

# Problem with references as code blocks {-}

This line contains code example with reference to `Fig. \@ref(fig:myfig)`.

This line contains code example with reference to `Table \@ref(tab:mytab)`.

After this line there are two references as code (4 spaces before):

    see Fig. \@ref(fig:myfig)
    see Table \@ref(tab:mytab)

Below is a list with references as inline code:

* figure - `see Fig. \@ref(fig:myfig)`
* table - `see Table \@ref(tab:mytab)`

Below is a list with multi-paragraph items with inline code:

* figure

  `see Fig. \@ref(fig:myfig)`

* table 

  `see Table \@ref(tab:mytab)`

Below is a list with multi-paragraph and code blocks inside them:

* figure

      see Fig. \@ref(fig:myfig)

* table

      see Table \@ref(tab:mytab)

as shown on the image below:

correct-html

but DOCX and ODT outputs are wrong (they contain HTML code with ??):

My version info:

> xfun::session_info('bookdown')
R version 3.4.4 (2018-03-15)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 16.04.6 LTS, RStudio 1.2.5019

Locale:
  LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=ru_RU.UTF-8       
  LC_COLLATE=en_US.UTF-8     LC_MONETARY=ru_RU.UTF-8    LC_MESSAGES=en_US.UTF-8   
  LC_PAPER=ru_RU.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
  LC_TELEPHONE=C             LC_MEASUREMENT=ru_RU.UTF-8 LC_IDENTIFICATION=C       

Package version:
  base64enc_0.1.3 bookdown_0.14.1 digest_0.6.22   evaluate_0.14   glue_1.3.1      graphics_3.4.4 
  grDevices_3.4.4 highr_0.8       htmltools_0.4.0 jsonlite_1.6    knitr_1.25      magrittr_1.5   
  markdown_1.1    methods_3.4.4   mime_0.7        Rcpp_1.0.3      rlang_0.4.1     rmarkdown_1.16 
  stats_3.4.4     stringi_1.4.3   stringr_1.4.0   tinytex_0.17    tools_3.4.4     utils_3.4.4    
  xfun_0.10       yaml_2.2.0     

and I get this problem with bundled and latest pandoc 2.7.3-1 and even with remotes::install_github('yihui/knitr').

Could please elaborate why this happening? Should we move this bug to Knitr or Pandoc issue trackers?


By filing an issue to this repo, I promise that

I understand that my issue may be closed if I don't fulfill my promises.

N0rbert commented 4 years ago

EPUB (bookdown::epub_book: default) is affected too:

epub-wrong

And PDF (bookdown::pdf_book: default) book too (but looks different - have two \\ref{..} instead of \@(...)):

wrong-pdf

cderv commented 4 years ago

The issue you report is kind of specific because you did not have any figure or table to reference.

You should see a warning about that. Something like

Warning messages:
1: The label(s) fig:myfig not found 
2: The label(s) tab:mytab not found 
3: The label(s) fig:myfig not found 
4: The label(s) tab:mytab not found 
5: The label(s) fig:myfig not found 
6: The label(s) tab:mytab not found 
7: The label(s) fig:myfig not found 
8: The label(s) tab:mytab not found 
9: The label(s) fig:myfig not found 
10: The label(s) tab:mytab not found 

However, maybe the output should be different in case of no reference found... 🤔

Can you clarify something: Why do you think the output for html is correct ? There is no tables and figures to reference either in your HTML examples ?

Can you make an example using a table and a figure to reference too ? This should work as expected by replacing the reference by the correct number.

N0rbert commented 4 years ago

I see these warnings, but this should not happen.

I'm writing a text with example code inside it - so I expect that `see Fig. \@ref(fig:myfig)` or `see Table \@ref(tab:mytab)` should not be interpreted and should be shown as-is (in-line or fenced code block depending on syntax used). From this PoV the HTML output is absolutely correct and expected. But DOCX and ODT are wrong!

So I do not refer to existing table and/or figure, I'm providing example of the reference in code form. Like in Y. Xie book - with theorems and `\@ref(prefix:label)`.

cderv commented 4 years ago

With the context, it is way clearer. thank you very much for the precisions. I am sorry if I misunderstood you. To be sure I am correct, do you expect that when put \@ref anywhere inside inline code block, bookdown should not parse them as reference to be replaced ?

Currently, bookdown is parsing the whole file and finding Fig. \@ref(fig:name) and Table. \@ref(tab:name) to be replaced, then replace it. But if you write

Fig. `\@ref(fig:myfig)`

or

Table `\@ref(tab:mytab)`

then it is not replaced - That is what is done in the bookdown book and why it works. Bookdown currently seems to only skip \@ref when put directly after a backtick.

In your example that would mean

This line contains code example with reference to Fig. `\@ref(fig:myfig)`.

This line contains code example with reference to Table `\@ref(tab:mytab)`.

If we change the detection method, the question I see here: Should we assume that if the Fig. \@ref or Table. \@ref code are put in some code blocks, then it all case we don't want to replace them by their reference ? I think so, but not sure if it could break some use case...

Anyway, not a trivial case to deal with, 🤔 Playing with regex and making exception is not a simple task. Bookdown is currently made to parse and get correct references for any \@ref occurrence.

N0rbert commented 4 years ago

Thanks for your analysis!

I have already using workaround for inline code using Fig. `\@ref(fig:myfig)` and `Table `\@ref(tab:mytab)`. But I do not know how to apply the fix for fenced code (or nested) blocks. And I wonder why it works normally on HTML format, but does not work in other formats. Could you please elaborate or dig into the problem deeper?

cderv commented 4 years ago

For HTML, the processing is different as support for references is done by processing the html output directly. For other format like word and odt, references support in bookdown is added by processing the md input so that conversion by pandoc keeps the references and it works. For pdf, it is the tex file that is processed before conversion to pdf.

The syntax is different for each format, so the logic too, hence the difference in treatment currently. What you want is not supported I guess in current version, some adaptations are required to ignore reference label inside code blocks.

I will look into it when I find time. You can also dig yourself and open a PR if you find a way to deal with this. Do not hesitate, it is always welcome !

cderv commented 3 years ago

Some notes on this topic:

Maybe using a Lua filter to implement this referencing feature would allow to avoid such differences between format because we could do the change directly in the Pandoc AST.

Currently, to improve this for formats powered by bookdown:::process_markdown(), this would require changing the logic here: https://github.com/rstudio/bookdown/blob/3695a759770507a3a045fb7968f8d5b288e9e8dc/R/ebook.R#L85-L96

Currently, figure labels are found by creating a temp HTML files from the md files, and then making the replacement in the markdown input directly, which causes the above issue.

atusy commented 3 years ago

FYI: I am interested in contributing, and already have a prototype filter but not well tested.

---
output: 
  word_document:
    number_sections: true
    pandoc_args:
      - "--lua-filter"
      - "crossref.lua"
---

```{r, include=FALSE}
download.file("https://raw.githubusercontent.com/atusy/lua-filters/master/lua/crossref.lua", "crossref.lua")

section {#section}

yihui commented 3 years ago

@atusy @cderv I don't know how much work it would be to replace my dirty hacks in R with a Lua filter, but I just want to say that I'm all for doing that!

That said, eventually we want to switch the syntax from \@ref(fig:label) + (\#fig:label) to @fig:label + {#fig:label} (https://github.com/quarto-dev/quarto-cli/tree/main/src/resources/filters/crossref), so I'm not sure if it'll be worth the effort now to develop the Lua filter for the former. I'll let you decide.

cderv commented 3 years ago

I'll look into it. @atusy seems to have a working prototype so maybe it just lack thorough testing. We'll see about that.