quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org

Clarify message about `prefer-html` when HTML content is produced in non-HTML output #8717

Open Mkranj opened 9 months ago

Mkranj commented 9 months ago

Bug description

I'm writing a document that includes a kableExtra table, which produces HTML output. When rendering the document to .docx, I get the usual error:

Error: Functions that produce HTML output found in document targeting docx output.
Please change the output type of this document to HTML.
If your aiming to have some HTML widgets shown in non-HTML format as a screenshot,
please install webshot or webshot2 R package for knitr to do the screenshot.
Alternatively, you can allow HTML output in non-HTML formats
by adding this option to the YAML front-matter of
your quarto file:

  prefer-html: true

Note however that the HTML output will not be visible in non-HTML formats.

However, I have both webshot and webshot2 installed, along with PhantomJS and a chrome browser. knitr:::webshot_available() returns:

webshot2  webshot 
    TRUE     TRUE

And furthermore, when I render the same file to PDF, it works with no issues. I believe the same warning should have appeared in this case too, if it was really about the webshot libraries.

Steps to reproduce

---
title: My new document
format:
  docx: default
---

```{r}
library(dplyr)
library(knitr)
library(kableExtra)

kable(mtcars) %>% kableExtra::kable_classic_2()
```

### Expected behavior

The file should be rendered to .docx successfully, with an image of the mtcars table.

### Actual behavior

Rendering fails even though webshot and webshot2 are installed. Also, the render works for .pdf files.

### Your environment

IDE: RStudio 2023.09.1, OS: Windows 11.
The issue persists when using CLI Quarto, so it shouldn't be related to RStudio!

### Quarto check output

Quarto 1.4.549
[>] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.69.5: OK
      Deno version 1.37.2: OK
[>] Checking versions of quarto dependencies......OK
[>] Checking Quarto installation......OK
      Version: 1.4.549
      Path: C:\Users\Administrator\AppData\Local\Programs\Quarto\bin
      CodePage: 1250

[>] Checking tools....................OK
      TinyTeX: v2024.02
      Chromium: 869685

[>] Checking LaTeX....................OK
      Using: TinyTex
      Path: C:\Users\Administrator\AppData\Roaming\TinyTeX\bin\windows\
      Version: 2023

[>] Checking basic markdown render....OK

[>] Checking Python 3 installation....(None)

      Unable to locate an installed version of Python 3.
      Install Python 3 from https://www.python.org/downloads/

[>] Checking R installation...........OK
      Version: 4.2.2
      Path: C:/PROGRA~1/R/R-42~1.2
      LibPaths:
        - C:/RPackages
        - C:/Program Files/R/R-4.2.2/library
      knitr: 1.45
      rmarkdown: 2.22

[>] Checking Knitr engine render......OK

cscheid commented 9 months ago

> Rendering fails even though webshot and webshot2 are installed.

Why do you expect quarto to automatically use webshot or webshot2? That's not a feature that we advertise.

Mkranj commented 9 months ago

I've focused on this part of the error message: "please install webshot or webshot2 R package for knitr to do the screenshot." And I cannot find more information about configuring either knitr or the YAML in Quarto so that these packages will work when rendering to .docx.
If the issue is more relevant for knitr or another repo, I'll close it here.

cscheid commented 9 months ago

No need to close! I'm just gathering information, and wanted to make sure it wasn't an issue directly on our documentation. That message appears to be coming from knitr directly (it's not in Quarto's source code) but let me investigate more.

cscheid commented 9 months ago

Something is definitely wonky here. I don't know how to make knitr disable that message, but we should try to figure that out.

Not only do I get that same message, but prefer-html: true doesn't actually appear to have any effect on knitr. Similarly, I installed webshot2 and the error message persists.

@cderv do you know what might be happening here?

cderv commented 9 months ago

(@Mkranj, I edited your issue so that the .qmd example is readable.)

Let me explain what is happening here, and the context also.

TL;DR

The issue happens because the kableExtra R package only outputs HTML or LaTeX tables. With format: docx, it produces an HTML table with some HTML dependencies. knitr sees that as incompatible and throws an error. Quarto only passes the message along.

The automatic screenshot feature applies only to HTML widgets, which is not the case here.

More details below


About the error message

> That message appears to be coming from knitr directly (it's not in Quarto's source code) but let me investigate more.

@cscheid yes, the error is thrown from rmarkdown::render() when it detects HTML dependencies to be included but the targeted output is not an HTML format.

For now, Quarto passes the message to the user with no processing other than changing the name of the option (Quarto uses prefer-html, while R Markdown uses always_allow_html).

Usually this happens when HTML widgets (https://www.htmlwidgets.org/) such as plotly, leaflet, or DT are used. This is where webshot or webshot2 comes into play.

> Why do you expect quarto to automatically use webshot or webshot2? That's not a feature that we advertise.

This is a knitr feature that we indirectly support when engine: knitr is used. By default, knitr takes a screenshot of an HTML widget's result using one of these packages and includes that screenshot as an image.

This is a very early R Markdown feature mentioned at the time in the R Markdown definitive guide: https://bookdown.org/yihui/rmarkdown/interactive-documents.html#intro-widgets

This works for HTML widgets only, not for every HTML result that carries HTML dependencies. And that is the source of the issue here.
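A minimal sketch of the distinction, assuming knitr recognizes widgets by the `htmlwidget` class (the helper name `is_html_widget` is hypothetical). A table built by `kable()` is plain HTML text, so the screenshot path would not apply to it:

```r
library(knitr)

# Hypothetical helper: TRUE only for objects carrying the "htmlwidget" class,
# which is what packages such as plotly, leaflet, or DT return.
is_html_widget <- function(x) inherits(x, "htmlwidget")

# An HTML table from kable() is a character vector of class "knitr_kable".
html_table <- kable(head(mtcars), format = "html")

class(html_table)
is_html_widget(html_table)
```

Piping the table through kableExtra adds styling and HTML dependencies but still does not turn it into a widget.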

About the issue itself

> Error: Functions that produce HTML output found in document targeting docx output

What is reported in the error message is the main problem. You are using knitr::kable() and a kableExtra style with this code:

kable(mtcars) %>% kableExtra::kable_classic_2()

By using kableExtra, as documented by this R package, you will output either HTML or LaTeX. It won't output a Markdown table.

> when I render the same file to PDF, it works with no issues. I believe the same warning should have appeared in this case too, if it was really about the webshot libraries.

The default will be an HTML table, but when format: pdf is used, kableExtra knows to output a LaTeX table.

That is why your example works for PDF but not for DOCX output, and why no error is thrown for PDF.

> If you're aiming to have some HTML widgets shown in non-HTML format as a screenshot, please install webshot or webshot2 R package for knitr to do the screenshot.

This part is about the HTML widgets I talked about above. kableExtra doesn't produce HTML widgets, just some HTML code and some HTML dependencies to be included, which doesn't work for Docx output.

@Mkranj If you don't use kableExtra, the problem goes away. knitr::kable() will produce Markdown tables, which can be converted to Docx tables, but you can't apply a specific style like kableExtra::kable_classic_2(). For styling tables in format: docx with the knitr engine, R has some dedicated packages like gt or flextable, but kableExtra is not one of them. IMO, it should error when you try to use it with an unsupported format.

Overall, you can't use kableExtra::kable_classic_2() feature for a Docx output.
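As a rough sketch of the alternatives mentioned above: plain `kable()` can emit a Markdown pipe table that Pandoc converts to a Word table, and flextable can build a styled table for Docx. The flextable part is guarded because the package may not be installed, and the column name `model` is only illustrative:

```r
library(knitr)

# Plain kable() output in Pandoc's pipe-table syntax. This is Markdown,
# so Pandoc can convert it to a native Word table.
md_table <- kable(head(mtcars), format = "pipe")
print(md_table)

# Optional: a styled table through flextable, only if it is installed.
if (requireNamespace("flextable", quietly = TRUE)) {
  # Move the row names into a regular column so they appear in the table.
  cars <- cbind(model = rownames(mtcars), mtcars)
  ft <- flextable::flextable(head(cars))
  ft <- flextable::theme_booktabs(ft)
  ft <- flextable::autofit(ft)
  # In a document chunk, end with `ft` so the table is emitted.
}
```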

Regarding the prefer-html option

> but prefer-html: true doesn't actually appear to have any effect on knitr.

prefer-html was introduced in Quarto for our Hugo format (which supports HTML output in a .md file). It was linked to the always_allow_html feature in R Markdown, but without any further adjustment (except the option name).

We may need to adapt more in some specific cases to clarify it. Ultimately, this is the same issue in R Markdown, but if we want to be more clever in Quarto, we could adapt rmarkdown or Quarto R code itself.

@cscheid Happy to discuss it live with you maybe, that will be easier.

This was long, but I hope this helps clarify.

cscheid commented 9 months ago

I'll definitely want to discuss this live 😭

cderv commented 9 months ago

I was betting on it... 😅 That is why I suggested it. Let's discuss the next steps then! 😉

Mkranj commented 9 months ago

Thank you for the detailed writeup. Not using kableExtra isn't the outcome I was hoping for, but now I have a better understanding of what's going on. I'll check out other packages you mentioned.
Just as a throwaway idea, could a LaTeX table be rendered properly in docx? So that somehow making kableExtra output that format would make it visible in the output document?

cderv commented 9 months ago

> Just as a throwaway idea, could a LaTeX table be rendered properly in docx?

Quarto relies on Pandoc's features and extension mechanism for most of its conversion features. Markdown input is parsed into an abstract form (the AST), and this normalized representation is then converted to an output format using Pandoc's writers.

No LaTeX table parser is available, and it would probably be hard to make one, considering all the different forms of LaTeX tables. And even if that were possible, I do think the layout and style you would get in LaTeX (e.g. using kableExtra::kable_classic_2()) would be lost anyhow, as Word documents have different features.

Markdown is the main syntax for cross-format output in Pandoc, and there are several table syntax variations that can all be converted to DOCX output.

Quarto adds support for parsing HTML tables to represent them internally by a table (and not just raw HTML). It enables output as a Table in any format. The specific styling may be lost in some formats though (as HTML could be more featureful than what Pandoc's writer can do for some formats), but this is a way to have HTML table output be supported cross-format. This is not working here because knitr needs to be aware of this Quarto feature. This is something we'll discuss and improve.

> So that somehow making kableExtra output that format would make it visible in the output document?

This is not a direct Quarto problem. When using an R package to create a table for use with a document generation tool (e.g., R Markdown, Quarto, knitr, and others that may exist), the output produced by the function needs to be considered. Outputting a raw LaTeX table has limitations when you don't target PDF only. kableExtra features are made to extend knitr::kable() for HTML and PDF output. They can't be used for other formats (e.g., docx, typst, asciidoc, ...).

R has several table packages (we list some at https://bookdown.org/yihui/rmarkdown-cookbook/tables.html) that have different features targeted to different types of output (e.g., interactive table for HTML outputs, LaTeX-only table for PDF, and openxml output for Docx document).

Usually, those tools have features targeted toward the output format (gt or flextable will have a nice styling feature that will work in docx output as they write raw openxml directly).

Tools like gt will allow you to have one R code that works for different outputs (as it supports HTML, LaTeX, and DOCX). Otherwise, using conditional evaluation ([R Markdown example to adapt](https://bookdown.org/yihui/rmarkdown-cookbook/latex-html.html)) or conditional insertion of content (a Quarto feature) will allow you to have different chunks for different outputs in the same document.
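A minimal sketch of the conditional approach, assuming the knitr engine reports the target format through `knitr::is_html_output()` and `knitr::is_latex_output()` (the function name `render_table` is hypothetical). kableExtra styling is applied only where it is supported, and every other format falls back to a Markdown table:

```r
library(knitr)

# Hypothetical helper: choose the table backend from the output format.
render_table <- function(df) {
  if (is_html_output()) {
    # HTML targets: kableExtra styling is supported.
    kableExtra::kable_classic_2(kable(df, format = "html"))
  } else if (is_latex_output()) {
    # PDF targets: kableExtra can style the LaTeX table.
    kableExtra::kable_classic_2(kable(df, format = "latex", booktabs = TRUE))
  } else {
    # docx, typst, and other formats: plain Markdown table, no kableExtra.
    kable(df, format = "pipe")
  }
}

render_table(head(mtcars))
```

Run outside of a render, neither check is true, so the function returns the Markdown table.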

Hope it helps

Mkranj commented 9 months ago

Thank you once more!

mcanouil commented 9 months ago

Side (advanced) note: Word document uses OpenXML. It's possible to use raw OpenXML in Quarto, i.e., {=openxml}.
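As a small illustration with the knitr engine, `knitr::raw_block()` can wrap a string in a raw `{=openxml}` block. The snippet below is a sketch that uses a page-break fragment as the example payload; the variable name `page_break` is only illustrative:

```r
library(knitr)

# OpenXML fragment for a page break in a Word document.
xml <- '<w:p><w:r><w:br w:type="page"/></w:r></w:p>'

# Wrap it in a raw block so Pandoc passes it through untouched to docx.
page_break <- raw_block(xml, type = "openxml")

# Show the Markdown that would be written into the document.
cat(page_break)
```

Other output formats ignore raw `openxml` blocks, so this content only shows up in Word output.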