hadley / r4ds

R for data science: a book
http://r4ds.hadley.nz

'14.6.1 Encoding' example code returns errors in R 4.3.2 #1599

Closed: TheOtherWJ closed this issue 9 months ago

TheOtherWJ commented 10 months ago

In 14.6.1, which covers the encoding of non-English characters, the example code does not return the expected results.

The example code is as follows:

x1 <- "text\nEl Ni\xf1o was particularly bad this year"
read_csv(x1)$text
#> [1] "El Ni\xf1o was particularly bad this year"

x2 <- "text\n\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"
read_csv(x2)$text
#> [1] "\x82\xb1\x82\xf1\x82ɂ\xbf\x82\xcd"

read_csv(x1, locale = locale(encoding = "Latin1"))$text
#> [1] "El Niño was particularly bad this year"

read_csv(x2, locale = locale(encoding = "Shift-JIS"))$text
#> [1] "こんにちは"
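For comparison, the decoding that `locale(encoding = ...)` is expected to perform can be reproduced with base R's `iconv()`, without going through readr at all. This is a minimal sketch, not the book's code, and it assumes the platform's iconv recognises the encoding names "latin1" and "SHIFT_JIS" (accepted names vary slightly between systems).

```r
# Sketch: decode the same byte strings with base R's iconv(), without readr.
# Assumption: the local iconv implementation knows "latin1" and "SHIFT_JIS".
x1 <- "El Ni\xf1o was particularly bad this year"  # Latin-1 encoded bytes
x2 <- "\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"   # Shift-JIS encoded bytes

# Interpret the bytes in their original encoding and convert them to UTF-8
y1 <- iconv(x1, from = "latin1", to = "UTF-8")
y2 <- iconv(x2, from = "SHIFT_JIS", to = "UTF-8")

y1
y2
```

If either name is unknown to the local iconv, `iconv()` signals an error rather than returning garbled text, which makes the encoding assumption easy to spot.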

When I run the same code with R 4.3.2, I get the following warnings and errors instead:

library(tidyverse)

x1 <- "text\nEl Ni\xf1o was particularly bad this year"
read_csv(x1)$text
#> Warning in grepl("\n", path): unable to translate 'text
#> El Ni<f1>o was particularly bad this year' to a wide string
#> Warning in grepl("\n", path): input string 1 is invalid
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): unable to translate 'text
#> El Ni<f1>o was particularly bad this year' to a wide string
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): input string 1 is invalid
#> Error in basename(path): file name conversion problem -- name too long?

x2 <- "text\n\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"
read_csv(x2)$text
#> Warning in grepl("\n", path): unable to translate 'text
#> <82><b1><82><f1><82>ɂ<bf><82><cd>' to a wide string
#> Warning in grepl("\n", path): input string 1 is invalid
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): unable to translate 'text
#> <82><b1><82><f1><82>ɂ<bf><82><cd>' to a wide string
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): input string 1 is invalid
#> Error in basename(path): file name conversion problem -- name too long?

read_csv(x1, locale = locale(encoding = "Latin1"))$text
#> Warning in grepl("\n", path): unable to translate 'text
#> El Ni<f1>o was particularly bad this year' to a wide string
#> Warning in grepl("\n", path): input string 1 is invalid
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): unable to translate 'text
#> El Ni<f1>o was particularly bad this year' to a wide string
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): input string 1 is invalid
#> Error in basename(path): file name conversion problem -- name too long?

read_csv(x2, locale = locale(encoding = "Shift-JIS"))$text
#> Warning in grepl("\n", path): unable to translate 'text
#> <82><b1><82><f1><82>ɂ<bf><82><cd>' to a wide string
#> Warning in grepl("\n", path): input string 1 is invalid
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): unable to translate 'text
#> <82><b1><82><f1><82>ɂ<bf><82><cd>' to a wide string
#> Warning in grepl("^((http|ftp)s?|sftp)://", path): input string 1 is invalid

Created on 2023-11-15 with reprex v2.0.2
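Every warning above comes from a `grepl()` call on the `path` argument, apparently while the input string is being inspected to decide whether it is literal data, a URL, or a file path. A quick base-R check (a sketch, not part of the original report) confirms that neither input is a valid UTF-8 byte sequence, which is consistent with the "unable to translate ... to a wide string" and "input string 1 is invalid" messages.

```r
# Sketch: check whether the literal inputs are valid UTF-8 byte sequences.
x1 <- "text\nEl Ni\xf1o was particularly bad this year"
x2 <- "text\n\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"

# validUTF8() inspects the bytes themselves, ignoring any declared encoding
validUTF8(c(x1, x2))

# The same text written with a Unicode escape is valid UTF-8
validUTF8("El Ni\u00f1o was particularly bad this year")
```

The byte `0xf1` in `x1` starts a multi-byte UTF-8 sequence that the following `o` cannot complete, and `0x82` at the start of the Shift-JIS data in `x2` is a UTF-8 continuation byte with nothing before it.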

jonathannathanaus commented 9 months ago

This has been addressed in the readr GitHub repository: https://github.com/tidyverse/readr/issues/1521

TheOtherWJ commented 9 months ago

Thanks for letting me know.
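One possible workaround, not proposed in this thread or in tidyverse/readr#1521, is to hand readr the bytes as a raw vector instead of a character string. The `read_csv()` documentation lists a raw vector as a form of literal data, so the input should bypass the string-based path checks that trigger the warnings above. This is a sketch that assumes readr 2.x; `show_col_types = FALSE` only silences the column-specification message.

```r
# Sketch of a workaround: pass the bytes to read_csv() as a raw vector.
# Assumption: readr 2.x, whose read_csv() accepts raw vectors as literal data.
library(readr)

x1 <- "text\nEl Ni\xf1o was particularly bad this year"
x2 <- "text\n\x82\xb1\x82\xf1\x82\xc9\x82\xbf\x82\xcd"

# charToRaw() exposes the underlying bytes; locale() says how to decode them
latin <- read_csv(charToRaw(x1), locale = locale(encoding = "Latin1"),
                  show_col_types = FALSE)$text
sjis <- read_csv(charToRaw(x2), locale = locale(encoding = "Shift-JIS"),
                 show_col_types = FALSE)$text

latin
sjis
```

Keeping the data as bytes until `locale()` decodes it avoids ever asking R to treat the undecoded text as a string in the session's native encoding.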