quarto-dev / quarto-cli

Open-source scientific and technical publishing system built on Pandoc.
https://quarto.org
Other
3.93k stars 325 forks source link

ggplot output in "asis" R code blocks is not included in Powerpoint output #3683

Closed Xitian9 closed 1 year ago

Xitian9 commented 1 year ago

Bug description

Steps to reproduce:

  1. Render the following file.
    
    ---
    title: "MWE"
    format: pptx
    ---
library(ggplot2)

print(ggplot(mtcars, aes(x = wt)) + geom_histogram())


Expected output: A Powerpoint document with a title slide and a histogram on the second slide.
Observed output: A Powerpoint document with only a title slide and no histogram.

Note that we get the correct result if we remove `results="asis"`, or if we change format to `html`.

Running on RStudio 2022.07.2 Build 576 within Windows 10.

### Checklist

- [X] Please include a minimal, fully reproducible example in a single .qmd file? Please provide the whole file rather than the snippet you believe is causing the issue.
- [X] Please [format your issue](https://quarto.org/bug-reports.html#formatting-make-githubs-markdown-work-for-us) so it is easier for us to read the bug report.
- [X] Please document the RStudio IDE version you're running (if applicable), by providing the value displayed in the "About RStudio" main menu dialog?
- [X] Please document the operating system you're running. If on Linux, please provide the specific distribution.
cderv commented 1 year ago

@Xitian9 Why are you trying to use results = "asis" here ? What is the purpose ?

I would like to understand your usage, as I believe this is not needed here. Let me explain.

Note that we get the correct result if we remove results="asis"

This is what you use if you want to have the image inserted in PPTX file. You don't need the print() also.

---
title: "MWE"
format: pptx
---

```{r}
library(ggplot2)

ggplot(mtcars, aes(x = wt)) + geom_histogram()

`asis` is used to include a result string as raw markdown in the output usually. See example in the [R Markdown Cookbook][https://bookdown.org/yihui/rmarkdown-cookbook/results-asis.html)

With plots, it will already be markdown that will be output and you don't need to mark it `asis`. Something like 


**knitr** will also deal with the special printing required for being included as output in the the intermediary markdown document and so that is why you don't need `print()`

I agree that using `results = "asis"` should probably be ignored here in case of a plot output, and it should probably have worked. I'll look into that. But still, using `results = "asis"` on plot chunk is not needed usually. Unless I am missing some usage. 
That is why I am asking for details. 
cderv commented 1 year ago

@cscheid we got a different result here because when results = 'asis' is used, we get this intermediate markdown file

::: {.cell}
::: {.cell-output-display}
![](test2_files/figure-pptx/unnamed-chunk-1-1.png)
:::
:::

We are missing the .cell div (probably because of results = "asis" which seems expected. However, for HTML this does not seem to have any consequences, but for PPTX output, it does. I am trying to look what processing we do for those divs exactly that would impact PPTX.

Should we handle that better ? It feels we should, even if as explained above, it should not really be used that way. But it could also be Pandoc behavior directly - I'll try to confirm.

I'll keep looking anyway but feel free to share idea if you have one.

Xitian9 commented 1 year ago

Yes, as you point out results="asis" is not needed here, nor is the print statement; that's because I pared down my use case to the minimal (non-)working example. In my real usage I'm generating slides in a loop with different titles, in which case both are needed.

---
title: "MWE-2"
format: pptx
---

```{r results="asis"}
library(ggplot2)
library(purrr)

displaySlide <- function(var) {
  cat("##", var, "\n\n")
  print(ggplot(mtcars, aes(x = !!sym(var))) + geom_histogram())
  cat("\n\n")
}

walk(c("wt", "cyl"), displaySlide)
Xitian9 commented 1 year ago

I don't think it's an issue with pandoc, as the following code works with straight Rmarkdown, not using quarto.

---
title: "MWE-3"
output: powerpoint_presentation
---

```{r results="asis"}
library(ggplot2)
library(purrr)

displaySlide <- function(var) {
  cat("##", var, "\n\n")
  print(ggplot(mtcars, aes(x = !!sym(var))) + geom_histogram())
  cat("\n\n")
}

walk(c("wt", "cyl"), displaySlide)
cscheid commented 1 year ago
::: {.cell-output-display}
![](test2_files/figure-pptx/unnamed-chunk-1-1.png)
:::

We are missing the .cell div (probably because of results = "asis" which seems expected. However, for HTML this does not seem to have any consequences, but for PPTX output, it does. I am trying to look what processing we do for those divs exactly that would impact PPTX.

I know very very little about how we emit PPTX, but I have seen other cases where the lack of a .cell div hurt our processing.

One of the things that @dragonstyle and I discussed this week is that once the AST work is merged (which should happen later tonight 🤞) we'll be able to start annotating the quarto document more precisely. One of the things I would like to do is to create a CellOutput node, which would then be treated uniformly (so we stop worrying about surrounding divs, etc).

cderv commented 1 year ago

I don't think it's an issue with pandoc, as the following code works with straight Rmarkdown, not using quarto.

@Xitian9, as I tried to explain in my previous answer, it works in rmarkdown because the print() and results = 'asis' are kind of ignored. Quarto does use a special markdown syntax which rmarkdown does not (the one shown above in previous answer), and this is what creates the issue with Pandoc not processing the div as in HTML.

In my real usage I'm generating slides in a loop with different titles, in which case both are needed.

When doing loop to create mixed content, especially with plots or HTML widgets for example, you should rely on knitr::knit_child(). There is several examples on issues if you search for this. We added an example for Quarto in https://examples.quarto.pub/create-tabsets-panel-from-r-code/ with gt tables and tabsets.

Using knit_child() will insure that markdown text and evaluated R code in the loop are correctly knitted together. This is because knitr does not rely on print() but on knit_print() methods. Relying on cat() and asis is not safe with all content - it works well with string content generated programatically. With plots, it can have side effects depending on the package used.

The R Markdown Cookbook shows examples too

I believe that should avoid the issue seen here in Quarto.

Hope it helps

One of the things I would like to do is to create a CellOutput node, which would then be treated uniformly (so we stop worrying about surrounding divs, etc).

@cscheid Awesome ! A big topic I need to catch up on.

cderv commented 1 year ago

Should we handle that better ? It feels we should, even if as explained above, it should not really be used that way. But it could also be Pandoc behavior directly - I'll try to confirm.

It seems this is a Pandoc issue somehow. Adding a div around an image will ignore the whole content in output

---
title: "MWE"
---

## Image is removed  

::: {.dummy}
![](https://pandoc.org/pandoc-cartoon.svgz)
:::

---

## Also when something else

::: {.dummy}
**strong** content

![](https://pandoc.org/pandoc-cartoon.svgz)
:::

image

So it seems like an issue with Pandoc itself.

We can probably come around this in Quarto for the specific case of ::: {.cell-output-display} but could not be as simple as this. I'll look closer

cderv commented 1 year ago

It is a known issue and we already do a processing in https://github.com/quarto-dev/quarto-cli/blob/main/src/resources/filters/quarto-pre/outputs.lua

I'll make some adaptation