rstudio / rmarkdown

Dynamic Documents for R
https://rmarkdown.rstudio.com
GNU General Public License v3.0

df_print and cached chunks #2310

Open netique opened 2 years ago

netique commented 2 years ago

Hi,

this gave me a few head-scratching moments. When an html_document has already been knitted with cached results (I mean knitr::opts_chunk$set(cache = TRUE)) and I then decide to show paged tables using

output:
  html_document:
    df_print: paged

in the YAML header, the output keeps rendering as verbatim text (forgive the {shiny} lingo).

In hindsight this seems obvious, but this is already the second time I have had to work it out. I suspect it is hard for {knitr} and {rmarkdown} to apply df_print to cached output, since the print methods and classes involved are baked into the cached output itself, but maybe it is worth documenting this behavior or raising a friendly warning. What do you think?

Session info

```
R version 4.1.2 (2021-11-01)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 22000), RStudio 2021.9.1.372

Locale:
  LC_COLLATE=Czech_Czechia.1250  LC_CTYPE=Czech_Czechia.1250  LC_MONETARY=Czech_Czechia.1250
  LC_NUMERIC=C  LC_TIME=Czech_Czechia.1250

Package version:
  base64enc_0.1.3 digest_0.6.29   evaluate_0.14   fastmap_1.1.0   glue_1.6.0
  graphics_4.1.2  grDevices_4.1.2 highr_0.9       htmltools_0.5.2 jquerylib_0.1.4
  jsonlite_1.7.2  knitr_1.37      magrittr_2.0.1  methods_4.1.2   rlang_0.4.12
  rmarkdown_2.11  stats_4.1.2     stringi_1.7.6   stringr_1.4.0   tinytex_0.36
  tools_4.1.2     utils_4.1.2     xfun_0.29       yaml_2.2.1

Pandoc version: 2.14.0.3
```

cderv commented 2 years ago

Thanks for the suggestion.

We have some documentation and general advice in the R Markdown Cookbook: https://bookdown.org/yihui/rmarkdown-cookbook/cache.html

Among them:

The most appropriate use case of caching is to save and reload R objects that take too long to compute in a code chunk, and the code does not have any side effects, such as changing global R options via options() (such changes will not be cached). If a code chunk has side effects, we recommend that you do not cache it.

We do not recommend that you set the chunk option cache = TRUE globally in a document. Caching can be fairly tricky. Instead, we recommend that you enable caching only on individual code chunks that are surely time-consuming and do not have side effects.

Following this documentation, an Rmd that processes data and prints a table should look like this:

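````markdown
---
title: "test"
output:
  html_document:
    df_print: paged
---

```{r, message=FALSE, warning=FALSE}
library(dplyr)
```

Let's get the droids' names and their homeworlds

```{r, cache=TRUE}
# Computation only: this is the part that is safe to cache
droids <- starwars %>% filter(species == "Droid") %>% select(name, homeworld) %>% distinct()
```

```{r}
# Printing stays in an uncached chunk so df_print is applied on every render
droids
```
````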

This means the table rendering / printing should not happen in a cached chunk. That way the print method (which is effectively a side effect) is applied correctly.

We could document this specifically for `df_print`, but the same applies to any external configuration (here, the `df_print` YAML field) that should affect the output of a cached chunk. Caching means the chunk is not re-evaluated and its stored result is loaded instead. Changing an external setting won't invalidate the cache unless that setting is explicitly passed to the `cache.extra` chunk option.
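
As a rough illustration of the `cache.extra` route, here is a minimal sketch rather than a tested recipe. It assumes the YAML uses the nested `output: html_document: df_print:` form shown in this thread and that `rmarkdown::metadata` is populated when the setup chunk runs. `df_print_setting` is just an illustrative name.

```r
# Place this in an uncached setup chunk near the top of the Rmd.
# Assumption: the YAML header uses the nested form
#   output:
#     html_document:
#       df_print: paged
# If the field is missing or `output` has another shape, fall back to NULL.
df_print_setting <- tryCatch(
  rmarkdown::metadata$output$html_document$df_print,
  error = function(e) NULL
)

# Make the df_print value part of the chunk options, so that changing it
# in the YAML should change the cache key of cached chunks.
knitr::opts_chunk$set(cache.extra = df_print_setting)
```

With this in place, switching `df_print` in the YAML should cause cached chunks to be re-evaluated on the next render. For a one-off refresh, setting the chunk option `cache.rebuild = TRUE` on the affected chunk is another way to force re-evaluation.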

Anyway, I just wanted to clarify. I'll mark this as a doc improvement. Thanks for the suggestion!