kable() defaulting to pandoc format

maelle commented 4 years ago

When I use knitr::kable() in a hugodown post, in the index.md I get a pandoc-formatted table, not a markdown-formatted table. I'm not sure why. :-)

I looked into how kable() defines its default format (https://github.com/yihui/knitr/blob/683887b3169104592f3dbabb457e41aaee2af71c/R/table.R#L91-L104); the only switch seems to be a global option.

I see there's https://github.com/r-lib/hugodown/blob/51d365b5b4ad4b7146120dde323ab742d606c97e/R/hugo-format.R#L126 but it doesn't seem to be used? Or am I missing something?
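For readers unfamiliar with the two table styles, here is a minimal sketch of how the format can be forced, either per call or through the global option mentioned above. It assumes a knitr release that accepts the `"simple"` and `"pipe"` format names (older releases used `"pandoc"` and `"markdown"` for the same styles); the variable names are only illustrative.

```r
library(knitr)

x <- head(mtcars, 2)

# Pandoc "simple" table: header row, a line of dashes, then the rows.
simple_tbl <- kable(x, format = "simple")

# Pipe table: every line starts and ends with "|".
pipe_tbl <- kable(x, format = "pipe")

# The global option changes the default for calls that omit `format`.
old <- options(knitr.table.format = "pipe")
default_tbl <- kable(x)
options(old)  # restore the previous setting

print(simple_tbl)
print(pipe_tbl)
```

Setting the option only works around the symptom; it does not explain why pandoc's own conversion to pipe tables is lost.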

hadley commented 4 years ago

What do you mean by pandoc vs markdown formatted table?

cderv commented 4 years ago

This made me curious, so I decided to dig into it a bit.

I think what you observe is not as simple as described. Here are the steps of my investigation.

The first thing I tried was to render a very small document with the hugo_document format:

````markdown
---
title: "test"
output: hugodown::hugo_document
---

```{r}
knitr::kable(head(mtcars, 2))
```
````

I rendered it with `rmarkdown::render("test.Rmd", clean = FALSE)` so that I get the intermediate knitted files.

The pandoc command line used is this one:

```sh
"C:/Users/chris/scoop/apps/rstudio/current/bin/pandoc/pandoc" +RTS -K512m -RTS test.utf8.md --to markdown_strict+pipe_tables+strikeout+autolink_bare_uris+task_lists+backtick_code_blocks+definition_lists+footnotes+smart+tex_math_dollars --from markdown+autolink_bare_uris+tex_math_single_backslash --output test.md --wrap=none
```

If I open the intermediate file (test.utf8.md, the one resulting from the knitr::knit step), I can see this table:

                 mpg   cyl   disp    hp   drat      wt    qsec   vs   am   gear   carb
--------------  ----  ----  -----  ----  -----  ------  ------  ---  ---  -----  -----
Mazda RX4         21     6    160   110    3.9   2.620   16.46    0    1      4      4
Mazda RX4 Wag     21     6    160   110    3.9   2.875   17.02    0    1      4      4

If I open the resulting file test.md, I see a table that looks like this:

                 mpg   cyl   disp    hp   drat      wt    qsec   vs   am   gear   carb
--------------  ----  ----  -----  ----  -----  ------  ------  ---  ---  -----  -----
Mazda RX4         21     6    160   110    3.9   2.620   16.46    0    1      4      4
Mazda RX4 Wag     21     6    160   110    3.9   2.875   17.02    0    1      4      4

So the table is not converted, even though the command line has the correct extensions.

I wanted to test whether the post-processing could be the cause, so I ran the previous pandoc command line directly inside the folder that contains test.Rmd and the intermediate files.

You'll get a document (test.md) with this table format (NOTE: it will overwrite the previous document):

|               |  mpg|  cyl|  disp|   hp|  drat|     wt|   qsec|   vs|   am|  gear|  carb|
|---------------|----:|----:|-----:|----:|-----:|------:|------:|----:|----:|-----:|-----:|
| Mazda RX4     |   21|    6|   160|  110|   3.9|  2.620|  16.46|    0|    1|     4|     4|
| Mazda RX4 Wag |   21|    6|   160|  110|   3.9|  2.875|  17.02|    0|    1|     4|     4|

It is the correct one. The issue seems to come from the post_processor in hugo_document.
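The manual pandoc run can also be reproduced from R. The following is a rough sketch using `rmarkdown::pandoc_convert()` on a toy input; the temporary file names and the tiny hand-written table are stand-ins for the real intermediate files, and the extension list is shortened to `markdown_strict+pipe_tables` for readability. It assumes rmarkdown and a pandoc binary are available.

```r
# Toy stand-in for test.utf8.md: a pandoc "simple" table.
work_dir <- tempfile("pandoc-demo")
dir.create(work_dir)
input_md <- file.path(work_dir, "test.utf8.md")
output_md <- file.path(work_dir, "test.md")
writeLines(c("  mpg   cyl", "-----  ----", "   21     6", ""), input_md)

converted <- character()
if (rmarkdown::pandoc_available()) {
  # Same idea as the command line above, without any post-processor involved.
  rmarkdown::pandoc_convert(
    input = input_md,
    from = "markdown",
    to = "markdown_strict+pipe_tables",
    output = output_md,
    options = "--wrap=none"
  )
  converted <- readLines(output_md)
  print(converted)  # expected to show a pipe table
}
```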

When I look at it, I think the issue comes from here https://github.com/r-lib/hugodown/blob/51d365b5b4ad4b7146120dde323ab742d606c97e/R/hugo-format.R#L88-L92

Currently, the input_file is read by brio::read_lines, modified, and written back by brio::write_lines into the output_file. In the rmarkdown / knitr ecosystem, at the post-processor step, input_file is the file passed as input to pandoc, and output_file is the one resulting from the pandoc conversion. This gives the post-processor function access to both files for post-processing.

Here, it means that the file from before the pandoc conversion is read, modified, and its body is written back into the file produced by the pandoc conversion. This is why the pipe_tables format is missing and we get the pandoc format from the input file: the whole body of the output file is replaced by the body of the input file. The body should be the one from the pandoc output file, right?
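To make the mix-up concrete, here is a toy reproduction in base R. It is not the hugodown code: the two temporary files only mimic the roles of input_file and output_file, and `strip_yaml()` is a hypothetical helper written for this illustration.

```r
# input_file: what pandoc receives (YAML header + pandoc "simple" table).
input_file <- tempfile(fileext = ".utf8.md")
writeLines(
  c("---", "title: test", "---", "", "  mpg   cyl", "-----  ----", "   21     6"),
  input_file
)

# output_file: what pandoc produces (pipe table, no YAML header).
output_file <- tempfile(fileext = ".md")
writeLines(c("|  mpg|  cyl|", "|----:|----:|", "|   21|    6|"), output_file)

# Hypothetical helper: drop a leading YAML block delimited by "---" lines.
strip_yaml <- function(lines) {
  delim <- which(lines == "---")
  if (length(delim) >= 2 && delim[1] == 1) lines[-seq_len(delim[2])] else lines
}

# Taking the body from input_file keeps the unconverted table ...
buggy_body <- strip_yaml(readLines(input_file))
# ... while taking it from output_file keeps pandoc's pipe table.
fixed_body <- strip_yaml(readLines(output_file))

print(buggy_body)
print(fixed_body)
```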

If I am right, I am wondering why this has not caused more weird rendering errors. 🤔

cderv commented 4 years ago

As the aim is to preserve the yaml and add some content to it, I think it should:

- take the yaml from the input_file and complete it
- write it at the beginning of the pandoc output file

 meta <- yaml::as.yaml(yaml)
 body <- brio::read_lines(output_file)

 output_lines <- c("---", meta, "---", "", body) 
 brio::write_lines(output_lines, output_file)

This is similar to what the rmarkdown::md_document post processor does.
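A self-contained version of the same idea, written with base R only so it can be run outside the package. This is a sketch under assumptions, not the actual hugodown fix: `add_front_matter()` and the `extra` argument are hypothetical names, and the YAML is copied as raw lines rather than parsed with the yaml package.

```r
# Hypothetical helper: copy the YAML header from input_file, optionally add
# extra lines to it, and prepend it to the body that pandoc wrote to output_file.
add_front_matter <- function(input_file, output_file, extra = character()) {
  input <- readLines(input_file)
  delim <- which(input == "---")
  stopifnot(length(delim) >= 2, delim[1] == 1)

  # Lines strictly between the first two "---" delimiters.
  meta <- input[seq_len(delim[2] - 2) + 1]

  # The body comes from the pandoc output, not from the input file.
  body <- readLines(output_file)

  writeLines(c("---", meta, extra, "---", "", body), output_file)
  invisible(output_file)
}

# Usage on toy files standing in for the knitted input and the pandoc output.
in_f <- tempfile(fileext = ".utf8.md")
out_f <- tempfile(fileext = ".md")
writeLines(c("---", "title: test", "---", "", "  mpg   cyl", "-----  ----"), in_f)
writeLines(c("|  mpg|  cyl|", "|----:|----:|"), out_f)

add_front_matter(in_f, out_f, extra = "extra_key: value")
result <- readLines(out_f)
print(result)
```

Copying the header as raw lines keeps the sketch dependency-free; parsing and re-serializing with a YAML library, as in the snippet above, is the better fit when fields need to be modified rather than appended.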

hadley commented 4 years ago

Doh, thanks for the investigation!!

hadley commented 4 years ago

@cderv btw it's not too surprising this wasn't causing major issues, since you'd only see problems where goldmark and pandoc disagree on formatting. And they both start from commonmark (which covers the most common syntax), so you'd only expect to see weirdness with more exotic syntax.

maelle commented 4 years ago

Wow, fantastic digging @cderv!

kable() defaulting to pandoc format #19