aalexandersson closed this issue 7 years ago.
Fixed typo in last sentence: Changed wickham to hadley.
Thanks. Could you please post the .Rmd file you use for testing on Windows, just to make sure we're on the same page?
knitr::opts_chunk$set(echo = TRUE)
This is some system information.
sessionInfo()
library(readr)
rmarkdown::pandoc_version()
This creates some tibble output.
read_csv("auto.txt")
This is a test of tibble using pandoc. How do I run pandoc from R? In Stata's Rcall command, this is automated. To reproduce the error in R, I copied and pasted the above output into Notepad, which defaults to ANSI encoding, and saved the file as "output.txt". Then, from the command prompt where Pandoc is installed, I typed:
pandoc Markdown.txt -o Word.docx
I saved the error message as "error_message.png".
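Notepad's ANSI default is the step that stores the multiplication sign as a single non-UTF-8 byte. Below is a minimal sketch of running the same test from R while keeping the file in UTF-8. The two-line `tibble_output` vector is a hypothetical stand-in for the copied console output. The pandoc call uses only base R (`Sys.which()`, `system2()`) and runs only if a `pandoc` executable is on the PATH.

```r
# Hypothetical stand-in for the console output that was copied into Notepad;
# "\u00d7" is the multiplication sign that tibble prints in its header.
tibble_output <- c("# A tibble: 32 \u00d7 11", "This is a test of tibble using pandoc.")

# Encode every line as UTF-8 and write the raw bytes, so the file does not
# depend on the session locale or on an editor's "ANSI" default.
utf8_bytes <- unlist(lapply(tibble_output, function(line) {
  c(charToRaw(enc2utf8(line)), charToRaw("\n"))
}))
md_file <- file.path(tempdir(), "output.txt")
writeBin(utf8_bytes, md_file)

# Run pandoc on the UTF-8 file from R, but only if pandoc is on the PATH.
docx_file <- file.path(tempdir(), "Word.docx")
pandoc_status <- NA_integer_
if (nzchar(Sys.which("pandoc"))) {
  pandoc_status <- system2("pandoc", c(shQuote(md_file), "-o", shQuote(docx_file)))
}
```

rmarkdown also provides `rmarkdown::pandoc_convert()` as a wrapper around the same executable, which may be more convenient if rmarkdown is already installed.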
Another user also reported the same problem on Stack Overflow: http://stackoverflow.com/questions/26492750/using-imported-utf-8-character-in-knitr-with-r
Here is the error message:
I am not allowed to post HTML output, so I have attached the PDF output to show you my R output: test_tibble.pdf
I can confirm this bug; the '×' is the culprit. Here is a short Rnw file with a reproducible example:
\documentclass{article}
\usepackage[utf8]{inputenc}
\begin{document}
This will generate an error when compiling the \texttt{tex}.
<<test>>=
library(tibble)
as_tibble(cars)
@
The error that I get on Linux (and that some colleagues get on Mac) is:
\begin{verbatim}
> knit2pdf("test.Rnw")
processing file: test.Rnw
|...................... | 33%
ordinary text without R code
|........................................... | 67%
label: test
|.................................................................| 100%
ordinary text without R code
output file: test.tex
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'test.tex' failed.
LaTeX errors:
! Package inputenc Error: Unicode char \u8:× not set up for use with LaTeX.
See the inputenc package documentation for explanation.
Type H <return> for immediate help.
...
\end{verbatim}
\end{document}
On Windows 10 (R 3.3.3, rmarkdown 1.4, tibble 1.3.0.9000), I am unable to reproduce this with either Rmd or Rnw. However, if I use rmarkdown::render("file", clean = FALSE) and run pandoc on the non-UTF-8 .md file of the two that are generated, pandoc produces the error indicated. There doesn't, however, seem to be anything wrong as such with the code in tibble.
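One way to tell the two generated Markdown files apart without opening them in an editor is to check their raw bytes with base R's `validUTF8()`. This is a minimal sketch. `is_utf8_file()` is a hypothetical helper, and the two small demo files stand in for the files that rmarkdown leaves behind with `clean = FALSE`.

```r
# Hypothetical helper: TRUE if the file's bytes form valid UTF-8.
is_utf8_file <- function(path) {
  bytes <- readBin(path, what = "raw", n = file.size(path))
  validUTF8(rawToChar(bytes))
}

# Two demo files containing "32 × 11": one in UTF-8 (× as bytes C3 97) and one
# in the Windows "ANSI"/Latin-1 form (× as the single byte D7).
utf8_md <- tempfile(fileext = ".md")
ansi_md <- tempfile(fileext = ".md")
writeBin(as.raw(c(0x33, 0x32, 0x20, 0xc3, 0x97, 0x20, 0x31, 0x31, 0x0a)), utf8_md)
writeBin(as.raw(c(0x33, 0x32, 0x20, 0xd7, 0x20, 0x31, 0x31, 0x0a)), ansi_md)

checks <- c(utf8 = is_utf8_file(utf8_md), ansi = is_utf8_file(ansi_md))
print(checks)
```

The file for which the check returns FALSE is the one that pandoc cannot decode.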
@yihui: Is there a way to determine the expected encoding for console output for a knitr or rmarkdown run? Or do we just assume UTF-8?
tibble prints a multiplication sign, which requires Unicode and seems to break knitr documents in some cases.
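On the question of which encoding to expect, base R reports whether the current locale is UTF-8 through `l10n_info()`. The sketch below assumes that the session locale is an acceptable proxy for what knitr or rmarkdown will accept. That may not hold, for example when a document is knitted with an explicit `encoding`. It chooses between × and a plain ASCII x when formatting the header. `format_dim_header()` is hypothetical and is not tibble's actual print method.

```r
# Hypothetical helper, not tibble's implementation: choose the separator for
# "n × m" based on whether the current locale uses UTF-8.
format_dim_header <- function(n_row, n_col, utf8 = isTRUE(l10n_info()[["UTF-8"]])) {
  sep <- if (utf8) "\u00d7" else "x"
  paste0("# A tibble: ", n_row, " ", sep, " ", n_col)
}

format_dim_header(32, 11)                # uses × only in a UTF-8 locale
format_dim_header(32, 11, utf8 = FALSE)  # plain ASCII header
```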
The weird thing is that my system uses UTF-8, and other non-ASCII characters work just fine. In the example provided, the encoding is declared by loading the inputenc package in the LaTeX preamble (\usepackage[utf8]{inputenc}).
I recently received a similar report about the multiplication sign (https://github.com/yihui/knitr/issues/1389), but I could not reproduce it on Windows.
I guess @thibautjombart's problem is that he didn't tell knitr the encoding was supposed to be UTF-8 (which is the default on *nix but not on Windows): knit2pdf("test.Rnw", encoding = "UTF-8").
I'd recommend that you just use the letter x instead of the fancy Unicode character. Character encoding problems on Windows are forever a pain.
@hadley: Okay to revert to plain ASCII x?
@yihui nope, my native encoding is UTF-8 (I'm on Linux). Adding the option didn't change the error. I can also reproduce the error with the current rocker/verse Docker image:
File toto.Rnw saved
root@0aee4758d237:~# R
R version 3.3.3 (2017-03-06) -- "Another Canoe"
Copyright (C) 2017 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)
R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.
R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.
Type 'demo()' for some demos, 'help()' for on-line help, or
'help.start()' for an HTML browser interface to help.
Type 'q()' to quit R.
> knitr::knit2pdf("toto.Rnw")
processing file: toto.Rnw
|...................... | 33%
ordinary text without R code
|........................................... | 67%
label: test
|.................................................................| 100%
ordinary text without R code
output file: toto.tex
Error in texi2dvi(file = file, pdf = TRUE, clean = clean, quiet = quiet, :
Running 'texi2dvi' on 'toto.tex' failed.
LaTeX errors:
! Package inputenc Error: Unicode char \u8:× not set up for use with LaTeX.
See the inputenc package documentation for explanation.
Type H <return> for immediate help.
...
>
Also note that this character is used in the print method for tibble objects; I am not using it anywhere else.
For what it's worth, this is what Emacs thinks of this character:
position: 1 of 2 (0%), column: 0
character: × (displayed as ×) (codepoint 215, #o327, #xd7)
preferred charset: unicode (Unicode (ISO10646))
code point in charset: 0xD7
script: latin
syntax: _ which means: symbol
category: .:Base, c:Chinese, h:Korean, j:Japanese, l:Latin
to input: type "C-x 8 RET HEX-CODEPOINT" or "C-x 8 RET NAME"
buffer code: #xC3 #x97
file code: #xC3 #x97 (encoded by coding system utf-8-unix)
display: by this font (glyph code)
xft:-PfEd-DejaVu Sans Mono-normal-normal-normal-*-19-*-*-*-m-0-iso10646-1 (#x99)
It seems like a valid UTF-8 character to my (naive) eye.
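The same check can be done from R using only base functions. Code point 215 (U+00D7) is encoded in UTF-8 as the two bytes C3 97, which matches the buffer code Emacs reports. Latin-1, and the Windows "ANSI" code page 1252, store the same character as the single byte D7. That lone byte is what pandoc reports it cannot decode. A minimal sketch:

```r
times <- "\u00d7"  # MULTIPLICATION SIGN, as printed in the tibble header

code_point   <- utf8ToInt(times)                                               # 215
utf8_bytes   <- charToRaw(enc2utf8(times))                                     # c3 97
latin1_bytes <- iconv(times, from = "UTF-8", to = "latin1", toRaw = TRUE)[[1]] # d7

print(code_point)
print(utf8_bytes)
print(latin1_bytes)
```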
@krlmlr yeah, it's not worth the hassle.
The tibble multiplication sign is an invalid UTF-8 character. Here is a typical example output from http://readr.tidyverse.org/reference/read_delim.html :
> # A tibble: 32 × 11
The multiplication sign in read_csv output such as the above is extended ASCII, but it should be either plain ASCII or Unicode (UTF-8). In Unicode, the character is U+00D7 (xD7), but pandoc gives the error message:
"Cannot decode byte '\xd7': Data.Text.Internal.Encoding.Fusion.streamUtf8: Invalid UTF-8 stream"
This is a problem for pandoc on Windows only; I tried pandoc versions 1.13.1 and 1.18. I mentioned the problem on Statalist and asked whether it was a problem with MarkDoc, the user-written Stata program that is Stata's equivalent of R Markdown. The author of MarkDoc concluded that read_csv should have avoided the invalid UTF-8 character, and I agree. The Statalist URL is http://www.statalist.org/forums/forum/general-stata-discussion/general/1355554-markdoc-manual-gui?p=1362612#post1362612
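If a file has already been saved in the Windows "ANSI" encoding, one workaround is to convert it to UTF-8 before passing it to pandoc. This is a minimal sketch. It assumes the file is Latin-1; Windows code page 1252 agrees with Latin-1 for byte D7. `ansi_to_utf8()` is a hypothetical helper, and the demo file is a placeholder.

```r
# Hypothetical helper: read a Latin-1 ("ANSI") file as raw bytes, convert it
# to UTF-8 with iconv(), write the result, and return the output path.
ansi_to_utf8 <- function(in_path, out_path, from = "latin1") {
  bytes <- readBin(in_path, what = "raw", n = file.size(in_path))
  utf8 <- iconv(list(bytes), from = from, to = "UTF-8", toRaw = TRUE)[[1]]
  if (is.null(utf8)) stop("input could not be converted from ", from)
  writeBin(utf8, out_path)
  out_path
}

# Demo: "32 × 11" saved the way Notepad's ANSI default stores it (× as byte D7).
ansi_file <- tempfile(fileext = ".txt")
writeBin(as.raw(c(0x33, 0x32, 0x20, 0xd7, 0x20, 0x31, 0x31, 0x0a)), ansi_file)

utf8_file <- ansi_to_utf8(ansi_file, tempfile(fileext = ".txt"))
converted <- readBin(utf8_file, what = "raw", n = file.size(utf8_file))
print(converted)
```

The converted file can then be given to pandoc in place of the ANSI original.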
What is the rationale for using extended ASCII instead of plain ASCII or UTF-8 for the tibble multiplication sign? Given (1) the compatibility problems with pandoc on Windows and with dependent programs such as Stata's MarkDoc, (2) the lack of any need for extended ASCII, and (3) the obvious, easy fix, I assume this issue was simply overlooked. The problem occurs with R's read_csv(), but in tidyverse/readr#547 hadley closed the bug and suggested that this is a tibble problem instead.