quarto-dev / quarto-r

R interface to quarto-cli
https://quarto-dev.github.io/quarto-r/

quarto_render changes the class of vectors containing NA values #168

Open debdagybra opened 8 months ago

debdagybra commented 8 months ago

quarto_render() changes NA of numeric vectors to character ".na.real"

Similar to https://github.com/quarto-dev/quarto-r/issues/124


library(quarto)
quarto_render("test.qmd",
              execute_params = list(
                test_vec = c(1, NA, 2.5, -6.33, NaN, Inf)
              ))

Content of "test.qmd"

---
title: "test"
format: html
params:
  test_vec: "test_vec"
---

### With params:  
class: `r class(params$test_vec)`  
values: `r params$test_vec`  
is.na: `r is.na(params$test_vec)`  

```{r setup, include = FALSE}
test_vec <- c(1, NA, 2.5, -6.33, NaN, Inf)
```

### Created within qmd file:  
class: `r class(test_vec)`  
values: `r test_vec`  
is.na: `r is.na(test_vec)`  


Returns:

### With params:

class: character  
values: 1, .na.real, 2.5, -6.33, NA, NA  
is.na: FALSE, FALSE, FALSE, FALSE, TRUE, TRUE  

### Created within qmd file:  

class: numeric  
values: 1, NA, 2.5, -6.33, NaN,  
is.na: FALSE, TRUE, FALSE, FALSE, TRUE, FALSE  

This behaviour also occurs with columns of data frames and tibbles.

Windows 11
rstudio version: 2023.12.1+402
Quarto 1.5.26
[>] Checking versions of quarto binary dependencies...
      Pandoc version 3.1.11: OK
      Dart Sass version 1.70.0: OK
      Deno version 1.41.0: OK
[>] Checking versions of quarto dependencies......OK
[>] Checking Quarto installation......OK
      Version: 1.5.26
      Path: C:\Users\xxxxxx\AppData\Local\Programs\Quarto\bin
      CodePage: 1252

[>] Checking tools....................OK
      TinyTeX: (not installed)
      Chromium: (not installed)

[>] Checking LaTeX....................OK
      Tex:  (not detected)

[>] Checking basic markdown render....OK

[>] Checking Python 3 installation....(None)
      Unable to locate an installed version of Python 3.
      Install Python 3 from https://www.python.org/downloads/

[>] Checking R installation...........OK
      Version: 4.3.3
      Path: C:/PROGRA~1/R/R-43~1.3
      LibPaths:
        - C:/Users/xxxxxx/AppData/Local/R/win-library/4.3
        - C:/Program Files/R/R-4.3.3/library
      knitr: 1.45
      rmarkdown: 2.26

[>] Checking Knitr engine render......OK
debdagybra commented 8 months ago

Same with

cderv commented 8 months ago

Indeed. Thanks for the report.

We use the yaml R package to write the R objects, and it uses those special values because the YAML spec has nothing for NA: https://github.com/vubiostat/r-yaml/blob/81f8903232bf125853901f62cdff3934b96eb1a5/inst/CHANGELOG#L112-L124

What would you expect an R NA value to be in YAML?

I don't think an R NA can be represented in YAML without losing information.

We could add a handler that converts NA to NULL, but forcing that coercion would probably cause problems: NULL and NA are not the same in R.

To be conservative, we could raise an error when we detect any NA during the conversion and ask the user to check the execute_params object. With the --execute-params CLI flag of quarto render, you would not be able to add NA either.

Curious about your thoughts on this.
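
The conservative option above could be sketched as follows. This is only an illustration in base R: `check_execute_params()` is a hypothetical helper, not a function of the quarto package, and the error wording is made up.

```r
# Hypothetical helper (not part of the quarto package): refuse to continue
# when any element of an execute_params-style list contains NA or NaN.
check_execute_params <- function(params) {
  # anyNA(recursive = TRUE) also looks inside nested lists.
  has_na <- vapply(
    params,
    function(p) anyNA(p, recursive = TRUE),
    logical(1)
  )
  if (any(has_na)) {
    stop(
      "execute_params contains NA values in: ",
      paste(names(params)[has_na], collapse = ", "),
      call. = FALSE
    )
  }
  invisible(params)
}

# A vector without missing values passes through unchanged.
ok <- check_execute_params(list(test_vec = c(1, 2.5, -6.33)))

# The vector from the report is rejected; the message names the parameter.
err <- tryCatch(
  check_execute_params(list(test_vec = c(1, NA, 2.5, -6.33, NaN, Inf))),
  error = function(e) conditionMessage(e)
)
print(err)
```

Note that `anyNA()` flags NaN as well as NA but not Inf, so a check like this would still let infinite values through.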

debdagybra commented 8 months ago

I don't have enough knowledge about YAML to answer your question about NAs.

I don't think you should return an error when any value in execute_params is NA. It's quite restrictive, and since base R allows vectors containing NA, it should be possible to use them with quarto.

But the original object class and values should be preserved when accessed with params$, shouldn't they? It's odd to put a numeric vector in execute_params and get a character vector in the .qmd file.

With rmarkdown, the vector is preserved correctly.

library(rmarkdown)
rmarkdown::render("test.qmd",
                  params = list(
                    test_vec = c(1, NA, 2.5, -6.33, NaN, Inf)
                  ))

returns:

With params:

class: numeric  
values: 1, NA, 2.5, -6.33, NaN,  
is.na: FALSE, TRUE, FALSE, FALSE, TRUE, FALSE  

Created within qmd file:

class: numeric  
values: 1, NA, 2.5, -6.33, NaN,  
is.na: FALSE, TRUE, FALSE, FALSE, TRUE, FALSE  


And in quarto itself, for logical vectors, the class is preserved correctly.

library(quarto)
quarto_render("test.qmd",
              execute_params = list(
                test_vec = c(TRUE, NA, FALSE)
              ))

returns:

With params:

class: logical  
values: TRUE, NA, FALSE  
is.na: FALSE, TRUE, FALSE  

Created within qmd file:

class: logical  
values: TRUE, NA, FALSE  
is.na: FALSE, TRUE, FALSE  

cderv commented 8 months ago

Let me reintroduce the context here.

quarto_render() is a wrapper around quarto render, one of whose flags, --execute-params, takes a YAML file that defines the parameters. The documentation is here: https://quarto.org/docs/computations/parameters.html#rendering

quarto render document.qmd --execute-params params.yml

This means that, for Quarto, the only way to pass parameters is YAML syntax, and YAML syntax does not know about R objects.

Now, as I said, the quarto_render() R function is a wrapper: instead of asking for a YAML file as an argument, it accepts an R list of objects and takes care of writing the YAML that is passed to Quarto.
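
As a rough picture of that wrapping step (not the package's actual implementation), the sketch below writes a parameter file by hand and builds the matching CLI arguments. The parameter names `alpha` and `year` are hypothetical, the hand-written YAML only covers simple numeric scalars, and the real function relies on the yaml package for this step.

```r
# Hypothetical illustration, not the quarto package's implementation.
params <- list(alpha = 0.1, year = 2024)

# Write simple numeric scalars as `name: value` YAML lines by hand.
params_file <- tempfile(fileext = ".yml")
yaml_lines <- sprintf(
  "%s: %s",
  names(params),
  vapply(params, format, character(1))
)
writeLines(yaml_lines, params_file)

# Arguments equivalent to:
#   quarto render document.qmd --execute-params <params_file>
args <- c("render", "document.qmd", "--execute-params", params_file)

# Actually running it needs the Quarto CLI on the PATH and an existing
# document.qmd, so the call is left commented out:
# system2("quarto", args)

print(readLines(params_file))
```

Everything Quarto sees is the text in that file, which is why only values with a YAML representation can make the trip.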

This is the big difference from rmarkdown::render(), where parameters are processed directly in R because rmarkdown runs R itself. Quarto does not.

So, for example, you can't pass a data frame or any other R-specific object directly to quarto render, because you would not be able to provide it as a YAML value. You cannot do so in quarto_render() either, because the YAML spec has no conversion for such objects.

NA and its family are among those objects: there is no one-to-one representation in YAML. So when I asked "what value would you expect?", I meant:

If you were to use the CLI with quarto render rather than calling from R, how would you set up your params? You would not be able to pass some values.

That is why I am thinking of raising an error if unsupported values are passed to execute_params: they are simply not supported in Quarto. Unfortunately, this is a limitation, and you can't pass R-specific objects.

For more examples, this has also been discussed at

I won't close this issue, though, because we do need to do something (prevent rendering, or force coercion to NULL?) to avoid this .na.real problem.

Maybe in the future we'll find a solution in Quarto: an API for YAML params that handles the specifics of each computation language.

debdagybra commented 8 months ago

Thanks for the explanation and the new epic :)

I still don't understand why it's working with logical vectors.

According to the link you sent in your first message, if I understand correctly, the vector c(TRUE, NA, FALSE) should be converted to the character vector c("TRUE", ".na", "FALSE")? But it isn't; instead we get c(TRUE, NA, FALSE).

Can't we do the same with numeric and character vectors?

> So, for example, you can't pass a data frame or any other R-specific object directly to quarto render, because you would not be able to provide it as a YAML value. You cannot do so in quarto_render() either, because the YAML spec has no conversion for such objects.

When I pass a data frame or a tibble, I also get a data frame or tibble with params$, so I guess that quarto_render has done some magic to pass the class and/or attributes to the YAML?

Maybe I'm naive, but can we also pass the class of vectors in order to convert them back later?

cderv commented 8 months ago

> According to the link you sent in your first message, if I understand correctly, the vector c(TRUE, NA, FALSE) should be converted to the character vector c("TRUE", ".na", "FALSE")? But it isn't; instead we get c(TRUE, NA, FALSE).

Oh that is interesting ! Thanks for pointing this out!

This is an issue from trying to solve #124 with https://github.com/quarto-dev/quarto-r/commit/5207b6c1bfe2e3dca12d45644227945c5312abf3 https://github.com/quarto-dev/quarto-r/blob/ba8485a53fac80256d3301519455d3411c2be7a2/R/utils.R#L6-L16

The handler for logical values does not handle NA specifically, so when it encounters a logical NA, it writes NA verbatim instead of the .na that yaml::as.yaml() would have output.

It does not seem to cause issues for a .qmd file using engine: knitr, but it will for one using engine: jupyter.
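
To make the difference concrete, here is a sketch in base R. Both functions are assumptions written for illustration: the first only approximates the shape of the handler described above, and the second is a hypothetical NA-aware variant, not a proposed fix. The "verbatim" class is what the yaml package uses to emit a string unquoted.

```r
# Assumed shape of the current handler: NA falls through ifelse() untouched.
logical_handler_current <- function(x) {
  structure(ifelse(x, "true", "false"), class = "verbatim")
}

# Hypothetical NA-aware variant that emits the yaml package's `.na` marker.
logical_handler_na <- function(x) {
  value <- ifelse(x, "true", "false")
  value[is.na(x)] <- ".na"
  structure(value, class = "verbatim")
}

x <- c(TRUE, NA, FALSE)

# The NA stays a missing string here, so it is later written out as NA.
print(unclass(logical_handler_current(x)))

# The variant replaces it with an explicit marker instead.
print(unclass(logical_handler_na(x)))
```

Such a function would be supplied through the `handlers` argument of yaml::as.yaml(). Whether a `.na` marker is then understood on the Quarto side is a separate question that this sketch does not answer.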

> When I pass a data frame or a tibble, I also get a data frame or tibble with params$, so I guess that quarto_render has done some magic to pass the class and/or attributes to the YAML?

Can you share an example of this, please?

> Maybe I'm naive, but can we also pass the class of vectors in order to convert them back later?

This is not that simple right now. Quarto is a tool that works with any computation engine, so anything built in must work for R, Python, Julia, and maybe others in the future. Hence the epic, as the parameter feature is not yet at that level.

Here we are:

So c(TRUE, NA, FALSE) really became [ true, 'NA', false ] internally in Quarto, which is wrong, but it happens to work (by chance) with the knitr engine because the value is read with jsonlite::parse_json(..., simplifyVector = TRUE), which coerces the string "NA" to a logical NA.

str(jsonlite::parse_json('[true, "NA", false]', simplifyVector = TRUE))
#>  logi [1:3] TRUE NA FALSE

This is R-specific. Python does not have an NA equivalent, I think.
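
The numeric case from the original report can be reasoned about the same way. The sketch below is base R only and rests on an assumption: that after the round trip the `.na.real` marker arrives as plain text while NaN and Inf arrive as missing values. The list is written by hand to mimic that; it is not captured from Quarto.

```r
# Assumed values as they might arrive in R after the YAML/JSON round trip:
# `.na.real` survives only as text, while NaN and Inf come back as missing.
round_tripped <- list(1, ".na.real", 2.5, -6.33, NA, NA)

# Flattening a mix of numbers and strings coerces everything to character.
test_vec <- unlist(round_tripped)

print(class(test_vec))
print(test_vec)
print(is.na(test_vec))
```

Under that assumption, a single string in the vector is enough to turn the whole parameter into a character vector, which matches the class and is.na values shown in the report.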

Take this .qmd file:

---
title: "test"
format: html
---

```{python}
#| tags: [parameters]
#| echo: true

test_vec = "test_vec"
test_vec
```

If you render with your example:

```r
library(quarto)
quarto_render("index.qmd",
              execute_params = list(
                test_vec = c(TRUE, NA, FALSE)
              ))
```

The NA is a string.

I went into some detail, but I hope this illustrates why this is not that simple.

In R Markdown, rmarkdown::render(params = ) runs in R and passes the params as is, without conversion, to the knitting process.

So this explains the current limitation and why it requires more design work (https://github.com/quarto-dev/quarto-cli/issues/9197) if we want to support a parameter API that allows engine-specific considerations.

debdagybra commented 8 months ago

> When I pass a data frame or a tibble, I also get a data frame or tibble with params$, so I guess that quarto_render has done some magic to pass the class and/or attributes to the YAML?

> Can you share an example of this, please?

I was wrong, the dataframes and tibbles are converted to lists.
I thought they were preserved because the functions from dplyr were still working.

debdagybra commented 8 months ago

By the way, the workaround you suggested here https://forum.posit.co/t/param-converted-from-data-frame-to-list/155556/8 with RDS files works very well! Thanks!
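
For readers who have not followed the link, one way an RDS-based workaround can look is sketched below. It is an illustration rather than a copy of the forum answer: the parameter name `rds_path` is hypothetical, and the render call is commented out because it needs the quarto package and a matching test.qmd.

```r
test_vec <- c(1, NA, 2.5, -6.33, NaN, Inf)

# Save the object to disk; only the file path travels as a parameter.
rds_path <- tempfile(fileext = ".rds")
saveRDS(test_vec, rds_path)

# The render call would then pass a plain string, which YAML can represent
# (`rds_path` is a hypothetical parameter name declared in test.qmd):
# quarto::quarto_render("test.qmd", execute_params = list(rds_path = rds_path))

# Inside the document, the object would be restored from that path:
# restored <- readRDS(params$rds_path)
restored <- readRDS(rds_path)

print(class(restored))
print(identical(restored, test_vec))
```

Because the object never passes through YAML, its class and its NA, NaN and Inf values are read back exactly as they were saved.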

debdagybra commented 8 months ago

Until it's resolved, maybe you can add a warning in quarto_render() to notify users that some data will be modified (NA values, data frames, ...) and guide them to the workaround with RDS files. A warning could save them a lot of time.

cderv commented 8 months ago

Thanks for the feedback. I'll make it more apparent in the doc, and I'll probably throw an error for those specific R values that can't be translated. IMO, they shouldn't be used in execute_params at all.