yihui / knitr

A general-purpose tool for dynamic report generation in R
https://yihui.org/knitr/

unicode \u2139 from dplyr causes spin output to fail with latex #2231

Closed ggrothendieck closed 1 year ago

ggrothendieck commented 1 year ago

Suppose we have file a.R. If we paste it into R it does result in a dplyr warning saying to use all_of(nms) instead of just nms but it runs and gives correct output.

library(dplyr)
nms <- names(BOD)
BOD %>% mutate(across(nms, scale))

Now suppose we run:

knitr::spin("a.R")
rmarkdown::render("a.md", "pdf_document")

This results in the following error

! LaTeX Error: Unicode character ℹ (U+2139)
               not set up for use with LaTeX.

Error: LaTeX failed to compile a.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See a.log for more info.

The problem is that the a.R source code shown above causes dplyr to issue a warning and that warning message contains unicode \u2139 . MiKTeX, tinytex and texlive all gave the error shown above on Windows 10. The bottom line is that one cannot spin dplyr code that has such warnings and I think all dplyr warnings contain that character.

Note that there was no \u2139 in the a.R source code so this was pretty mysterious until I realized what was going on.

DavisVaughan commented 1 year ago

I actually saw a similar thing when running revdep checks on dplyr with the flexsurv package. It notably uses Rnw based vignettes, and when the dplyr warning is thrown when rendering the vignette it seems to use the unicode based i and the LaTeX output doesn't like that.

My personal notes about this were:

Maybe it is because the cli helper cli::is_latex_output() returns FALSE here but should really be returning TRUE? Which I think is because knitr::is_latex_output() is accidentally returning FALSE?

> knitr:::is_latex_output
function () 
{
    out_format("latex") || pandoc_to(c("latex", "beamer"))
}

knitr::is_latex_output() says it works for Rnw but maybe there is a bug

Here is the check output from when I ran that awhile back

Package: flexsurv
Check: re-building of vignette outputs
New result: WARNING
  Error(s) in re-building vignettes:
    ...
  --- re-building 'standsurv.Rmd' using rmarkdown
  --- finished re-building 'standsurv.Rmd'

  --- re-building 'flexsurv.Rnw' using knitr
  --- finished re-building 'flexsurv.Rnw'

  --- re-building 'multistate.Rnw' using knitr
  Error: processing vignette 'multistate.Rnw' failed with diagnostics:
  Running 'texi2dvi' on 'multistate.tex' failed.
  LaTeX errors:
  ! LaTeX Error: Unicode character ℹ (U+2139)
                 not set up for use with LaTeX.

  See the LaTeX manual or LaTeX Companion for explanation.
  Type  H <return>  for immediate help.
  ! Emergency stop.
   ...                                              

  l.1097 ℹ Please use `reframe()` instead.

  !  ==> Fatal error occurred, no output PDF file produced!
  --- failed re-building 'multistate.Rnw'

  --- re-building 'distributions.Rnw' using Sweave
  --- finished re-building 'distributions.Rnw'

  --- re-building 'flexsurv-examples.Rnw' using Sweave
  Loading required package: survival
  Forming integrated rmst function...
  Forming integrated mean function...
  Loading required package: TH.data
  Loading required package: MASS

  Attaching package: 'TH.data'

  The following object is masked from 'package:MASS':

      geyser

  --- finished re-building 'flexsurv-examples.Rnw'

  SUMMARY: processing the following file failed:
    'multistate.Rnw'

  Error: Vignette re-building failed.
  Execution halted

ggrothendieck commented 1 year ago

Apparently both MiKTeX and TeX Live include the xetex engine, which supports Unicode. I don't know about tinytex. Is there some way to modify this R code to force the use of xetex?

knitr::spin("a.R")
rmarkdown::render("a.md", "pdf_document")

cderv commented 1 year ago

@ggrothendieck yes, a Unicode-aware engine such as xelatex is required to support this character.

You can pass arguments to the pdf_document() format in two ways.

Either use a complete format object, which overrides any format set in the document's YAML field:

rmarkdown::render("a.md", rmarkdown::pdf_document(latex_engine = "xelatex")) 

Or pass output_options, which override or add to the options of whatever format is specified in the Rmd document:

rmarkdown::render("a.md", "pdf_document", output_options = list(latex_engine = "xelatex"))

Also note that you don't need to call spin on its own. If you call render() on a .R file, it will do spinning for you.

rmarkdown::render("a.R", "pdf_document", output_options = list(latex_engine = "xelatex"))

@DavisVaughan

Which I think is because knitr::is_latex_output() is accidentally returning FALSE

Did you observe that, or do you think there could be a problem in the knitr function? I re-read the code, and we set the internal option so that out_format("latex") is TRUE when Rnw is used.

l.1097 ℹ Please use reframe() instead.

I see this in your log. This seems to be the same issue reported in https://github.com/yihui/knitr/issues/2234 which I believe is cli still outputting some ANSI characters in knitr output. It may be related to https://github.com/r-lib/cli/issues/581

Is there still an issue with flexsurv that we can run to reproduce this, to see whether it is something other than what I mentioned just above?

ggrothendieck commented 1 year ago

@cderv, Thanks! I tried it with TeXLive on Windows and it worked great. Also the tip about giving the .R file straight to render is really handy.

DavisVaughan commented 1 year ago

@cderv I don't think I reproduced it locally, that was from the revdepcheck result.

I imagine that you can probably reproduce it locally by forking it locally with:

usethis::create_from_github("chjackson/flexsurv-dev", "~/Desktop/r/playground/packages/")

and then running this git command to checkout the commit before the flexsurv author made the necessary changes to fix it

git checkout d369ce5bb41308384046bb20f45b4ff7a2f89ebe -b "testing"

and then running a devtools::check() with dplyr 1.1.0 installed.

I tried but got other errors like "! LaTeX Error: File `xcolor.sty' not found.", so I couldn't render the whole thing, but maybe you know how to get past that.

That commit corresponds to https://github.com/chjackson/flexsurv-dev/commit/d369ce5bb41308384046bb20f45b4ff7a2f89ebe which was right before these 2 commits which look to be targeted at fixing the UTF-8 issues:

cderv commented 1 year ago

Great thank you I'll have a look

cderv commented 1 year ago

@ggrothendieck can you try with dev cli package as it could have solved this issue also ( maybe with https://github.com/r-lib/cli/issues/581) ? Thank you !

ggrothendieck commented 1 year ago

@cderv, It is not clear to me what you are suggesting. How do I modify the code in my first post in this thread?

cderv commented 1 year ago

@ggrothendieck you just need to install the development version of cli (pak::pak("r-lib/cli") or remotes::install_github("r-lib/cli")).

Then dplyr should use this new version of cli in any context, including inside an R Markdown document. No need to change anything in your code.

ggrothendieck commented 1 year ago

install_github failed with non-zero exit status. Will try it once it is released to CRAN.

yihui commented 1 year ago

@ggrothendieck Perhaps try install.packages("cli", repos = "https://r-lib.r-universe.dev")? r-universe.dev provides binaries for dev versions of packages.

ggrothendieck commented 1 year ago

The installation from r-universe worked but rendering the code did not.

> knitr::spin("a.R")

processing file: a.Rmd

  |                                                          
  |                                                    |   0%
  |                                                          
  |.................                                   |  33%                  
  |                                                          
  |...................................                 |  67% (unnamed-chunk-1)
  |                                                          
  |....................................................| 100%                  

output file: a.md

> rmarkdown::render("a.md", "pdf_document")
"C:/PROGRA~3/CHOCOL~1/bin/pandoc" +RTS -K512m -RTS a.md --to latex --from markdown+autolink_bare_uris+tex_math_single_backslash --output a.tex --lua-filter "C:\Users\Louis\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\pagebreak.lua" --lua-filter "C:\Users\Louis\AppData\Local\R\win-library\4.2\rmarkdown\rmarkdown\lua\latex-div.lua" --embed-resources --standalone --highlight-style tango --pdf-engine pdflatex --variable graphics --variable "geometry:margin=1in" 
! LaTeX Error: Unicode character ℹ (U+2139)
               not set up for use with LaTeX.

Error: LaTeX failed to compile a.tex. See https://yihui.org/tinytex/r/#debugging for debugging tips. See a.log for more info.

As previously in this thread I am using TeXLive on Windows and the following does work.

rmarkdown::render("a.md", "pdf_document", output_options = list(latex_engine = "xelatex"))

cderv commented 1 year ago

Thanks, I'll have a closer look at the difference between spin() and render().

In the current situation, dplyr puts Unicode in its message, which requires xelatex or lualatex for PDF output. That is not so much a bug as a configuration matter. We could try to set an option based on latex_engine.

I need to understand what the tidyverse stack is already doing.

Thanks

cderv commented 1 year ago

So, following what Davis shared above, cli should not use Unicode when is_latex_output() is TRUE. https://github.com/r-lib/cli/blob/79119446955972eaadb07397764c8c039ef6e0c5/R/utf8.R#L13-L20

is_utf8_output <- function() {
  opt <- getOption("cli.unicode", NULL)
  if (! is.null(opt)) {
    isTRUE(opt)
  } else {
    l10n_info()$`UTF-8` && !is_latex_output()
  }
}

Unicode should not be used when LaTeX output is detected.
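To make the three cases in this thread concrete, here is a minimal sketch of the same decision logic. `uses_unicode()` is a hypothetical stand-in, not cli's function: it takes the option, the locale flag and the LaTeX flag as explicit arguments instead of reading them from the session.

```r
# Hypothetical stand-in for the cli check quoted above. The three inputs are
# passed in explicitly so the logic can be tried without knitr or cli.
uses_unicode <- function(cli_unicode = NULL, utf8_locale = TRUE, latex_output = FALSE) {
  if (!is.null(cli_unicode)) {
    isTRUE(cli_unicode)            # an explicit option always wins
  } else {
    utf8_locale && !latex_output   # otherwise: UTF-8 locale and not LaTeX
  }
}

uses_unicode(latex_output = TRUE)   # like render("a.R", "pdf_document"): FALSE
uses_unicode(latex_output = FALSE)  # like spin("a.R") writing a.md: TRUE
uses_unicode(cli_unicode = FALSE)   # option set explicitly: FALSE
```

The second call is the spin() case: nothing at that point says LaTeX is the eventual target, so the Unicode symbol is written into the .md.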

This explains why this works

rmarkdown::render("a.R", "pdf_document")

because the output format is indeed LaTeX when the R code from a.R is evaluated in the internal spin()

But when this is run first

knitr::spin("a.R")

the output is .md. Following the cli detection above, on a UTF-8 platform Unicode will be used (as this is not LaTeX output).

So if you run the spin() operation on its own, you need to set the cli.unicode option to FALSE so that ASCII is used.

withr::with_options(
    list(cli.unicode = FALSE),
    knitr::spin("a.R")
)
rmarkdown::render("a.md", "pdf_document")

I don't think knitr can do much more than that. The .md resulting from spin() can contain Unicode; it is what is done with the file afterwards (compiling with pdflatex) that does not support it. Moreover, render("a.R", "pdf_document") works as expected.
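If you do keep the two-step workflow, one way to catch this before LaTeX complains is to scan the intermediate file for non-ASCII characters. This is only a sketch using base R: `non_ascii_lines()` is a hypothetical helper, and the temporary file stands in for a.md.

```r
# Return the line numbers of `path` that contain non-ASCII characters.
# iconv() returns NA for any element it cannot convert to ASCII.
non_ascii_lines <- function(path) {
  lines <- readLines(path, encoding = "UTF-8", warn = FALSE)
  which(is.na(iconv(lines, from = "UTF-8", to = "ASCII")))
}

# Demo on a temporary file standing in for a.md
demo <- tempfile(fileext = ".md")
writeLines(enc2utf8(c("plain line", "\u2139 Please use `all_of()`")),
           demo, useBytes = TRUE)
non_ascii_lines(demo)   # line 2 holds the U+2139 symbol
```

An empty result means the file should be safe for pdflatex as far as character encoding is concerned. Otherwise, either rerun spin() with cli.unicode = FALSE or render with xelatex or lualatex.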

DavisVaughan commented 1 year ago

@cderv, I guess something similar must happen when the flexsurv Rnw vignette is rendered? Like, it probably converts to some intermediate md first? So it looks like unicode is available? And then that is further converted to LaTeX, but the unicode is already in there by that point.

I do feel that since knitr seems to control that whole process of Rnw->md->LaTeX (assuming that is right), then knitr could still be in charge of ensuring that the intermediate result doesn't have unicode in it (since it knows it is going to be converted to LaTeX eventually)

cderv commented 1 year ago

@DavisVaughan yes, for Rnw to LaTeX I agree, if this is indeed mixing formats. That is a different issue from the one here, which is spin() + render(), though.

I'll look into this. Thanks for the input !

cderv commented 1 year ago

So the issue is specific to flexsurv vignette multistate.Rnw

In the setup chunk, they are using render_sweave()

This has a side effect of modifying the out.format value in knitr https://github.com/yihui/knitr/blob/db4eafb3e05939c6d8e558cf66665a4669ee0bbc/R/hooks-latex.R#L346

which means is_latex_output() will return FALSE because it only detects latex https://github.com/yihui/knitr/blob/db4eafb3e05939c6d8e558cf66665a4669ee0bbc/R/utils.R#L373-L375

I don't know if render_sweave() is supposed to be used in .Rnw vignettes, or if this is quite specific to flexsurv.

@yihui should we add sweave out.format inside is_latex_output() ? Or is it not there for some reason ? Don't know much about Sweave, that is why I am asking.

knitr::is_latex_output() says it works for Rnw but maybe there is a bug

I can confirm to you @DavisVaughan that knitting .Rnw will correctly set the out.format to be latex and is_latex_output() will be TRUE. That happens in https://github.com/yihui/knitr/blob/db4eafb3e05939c6d8e558cf66665a4669ee0bbc/R/output.R#L223-L226

So dplyr message should output ok in usual Rnw file

yihui commented 1 year ago

@yihui should we add sweave out.format inside is_latex_output() ?

Yes, and done. Thanks!
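For readers following along, the effect of that change can be sketched in isolation. This is not knitr's code: `is_latex_like()` is a hypothetical helper, `fmt` stands in for knitr's internal out.format value, and the assumption (based on the render_sweave() link above) is that the value becomes "sweave" after render_sweave() is called.

```r
# Hypothetical stand-alone version of the check; `fmt` plays the role of
# knitr's out.format value, `latex_formats` the set treated as LaTeX.
is_latex_like <- function(fmt, latex_formats = c("latex", "sweave")) {
  fmt %in% latex_formats
}

is_latex_like("latex")             # plain .Rnw: TRUE
is_latex_like("sweave")            # .Rnw after render_sweave(): TRUE
is_latex_like("sweave", "latex")   # behaviour before the change: FALSE
is_latex_like("markdown")          # spin() output: FALSE
```

Inside a chunk, knitr::opts_knit$get("out.format") shows the value knitr is currently using, which is a quick way to see which of these cases a document falls into.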

cderv commented 1 year ago

Awesome !

Thanks @DavisVaughan and @ggrothendieck for the report about all this specific behavior
