ropensci / git2rdata

An R package for storing and retrieving data.frames in git repositories.
https://ropensci.github.io/git2rdata/
GNU General Public License v3.0

Problem with special characters when using write_vc() and read_vc() #34

Closed ToonHub closed 5 years ago

ToonHub commented 5 years ago

When writing a data frame that contains a factor with special characters (such as ë or ï), write_vc() crashes R. See the example below.

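library(git2rdata)

# Plain ASCII factor levels: writing and reading both work.
# stringsAsFactors = TRUE is spelled out because it is the default only in R < 4.0.
thisworks <- data.frame(Nr = 1:10,
                        Name = "Geeiod",
                        stringsAsFactors = TRUE)
write_vc(thisworks, "testfile1")
thisworks_read <- read_vc("testfile1")

# Factor levels with non-ASCII characters: write_vc() crashes R.
thisdoesnotwork <- data.frame(Nr = 1:10,
                              Name = "Geëiö",
                              stringsAsFactors = TRUE)
write_vc(thisdoesnotwork, "testfile4")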

When the factor is converted to character, write_vc() works, but read_vc() does not read the special characters back correctly. See the example below.

library(dplyr)  # provides %>% and mutate()

thisworks <- data.frame(Nr = 1:10,
                        Naam = "Geëiö") %>%
  mutate(Naam = as.character(Naam))
write_vc(thisworks, "testfile20")
test_lees <- read_vc("testfile20")
test_lees
#    Nr        Naam
# 1   1 Ge<eb>i<f6>
# 2   2 Ge<eb>i<f6>
# 3   3 Ge<eb>i<f6>
# 4   4 Ge<eb>i<f6>
# 5   5 Ge<eb>i<f6>
# 6   6 Ge<eb>i<f6>
# 7   7 Ge<eb>i<f6>
# 8   8 Ge<eb>i<f6>
# 9   9 Ge<eb>i<f6>
# 10 10 Ge<eb>i<f6>

Apparently this problem does not occur on Linux.
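The `<eb>` and `<f6>` in the output above are the Windows-1252 (latin1) byte values of ë and ö, which fits the `Dutch_Belgium.1252` locale reported in the session info. The base R sketch below only illustrates how the same text has different bytes in latin1 and UTF-8. Treating this as the cause of the problem is an assumption, and the snippet is not the fix applied in git2rdata.

```r
# Build the string from Unicode escapes so the example does not depend on
# the encoding of the script file itself.
x_utf8 <- "Ge\u00ebi\u00f6"                                # "Geëiö", marked UTF-8
x_latin1 <- iconv(x_utf8, from = "UTF-8", to = "latin1")  # same text, latin1 bytes

# Declared encoding of each string.
Encoding(x_utf8)
Encoding(x_latin1)

# Same characters, different bytes: eb and f6 appear only in the latin1 form.
charToRaw(x_utf8)
charToRaw(x_latin1)

# Converting explicitly to UTF-8 gives one byte representation
# regardless of the native locale.
identical(charToRaw(enc2utf8(x_latin1)), charToRaw(x_utf8))
```

If text is written in one of these encodings and read back under the assumption of the other, the non-ASCII bytes can no longer be displayed as characters, which would look like the output above.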

Session Info

```r
> devtools::session_info()
- Session info -------------------------------------------------------------
 setting  value
 version  R version 3.5.2 (2018-12-20)
 os       Windows 7 x64 SP 1
 system   x86_64, mingw32
 ui       RStudio
 language (EN)
 collate  Dutch_Belgium.1252
 ctype    Dutch_Belgium.1252
 tz       Europe/Paris
 date     2019-05-08

- Packages -----------------------------------------------------------------
 package     * version date       lib source
 assertthat    0.2.1   2019-03-21 [1] CRAN (R 3.5.3)
 backports     1.1.4   2019-04-10 [1] CRAN (R 3.5.3)
 bookdown      0.9     2018-12-21 [1] CRAN (R 3.5.3)
 callr         3.2.0   2019-03-15 [1] CRAN (R 3.5.3)
 cli           1.1.0   2019-03-19 [1] CRAN (R 3.5.3)
 crayon        1.3.4   2017-09-16 [1] CRAN (R 3.5.1)
 desc          1.2.0   2018-05-01 [1] CRAN (R 3.5.1)
 devtools      2.0.2   2019-04-08 [1] CRAN (R 3.5.3)
 digest        0.6.18  2018-10-10 [1] CRAN (R 3.5.3)
 drat          0.1.5   2019-03-28 [1] CRAN (R 3.5.3)
 evaluate      0.13    2019-02-12 [1] CRAN (R 3.5.3)
 fortunes      1.5-4   2016-12-29 [1] CRAN (R 3.5.0)
 fs            1.3.1   2019-05-06 [1] CRAN (R 3.5.3)
 git2r         0.25.2  2019-03-19 [1] CRAN (R 3.5.3)
 git2rdata   * 0.0.3   2019-04-29 [1] Github (inbo/git2rdata@4b1f2fd)
 glue          1.3.1   2019-03-12 [1] CRAN (R 3.5.3)
 htmltools     0.3.6   2017-04-28 [1] CRAN (R 3.5.3)
 knitr         1.22    2019-03-08 [1] CRAN (R 3.5.3)
 magrittr      1.5     2014-11-22 [1] CRAN (R 3.5.1)
 memoise       1.1.0   2017-04-21 [1] CRAN (R 3.5.1)
 pkgbuild      1.0.3   2019-03-20 [1] CRAN (R 3.5.3)
 pkgload       1.0.2   2018-10-29 [1] CRAN (R 3.5.3)
 prettyunits   1.0.2   2015-07-13 [1] CRAN (R 3.5.1)
 processx      3.3.0   2019-03-10 [1] CRAN (R 3.5.3)
 ps            1.3.0   2018-12-21 [1] CRAN (R 3.5.3)
 R6            2.4.0   2019-02-14 [1] CRAN (R 3.5.3)
 Rcpp          1.0.1   2019-03-17 [1] CRAN (R 3.5.3)
 remotes       2.0.4   2019-04-10 [1] CRAN (R 3.5.3)
 rlang         0.3.4   2019-04-07 [1] CRAN (R 3.5.3)
 rmarkdown     1.12    2019-03-14 [1] CRAN (R 3.5.3)
 rprojroot     1.3-2   2018-01-03 [1] CRAN (R 3.5.1)
 rstudioapi    0.10    2019-03-19 [1] CRAN (R 3.5.3)
 sessioninfo   1.1.1   2018-11-05 [1] CRAN (R 3.5.3)
 testthat      2.1.1   2019-04-23 [1] CRAN (R 3.5.3)
 usethis       1.5.0   2019-04-07 [1] CRAN (R 3.5.3)
 withr         2.1.2   2018-03-15 [1] CRAN (R 3.5.1)
 xfun          0.6     2019-04-02 [1] CRAN (R 3.5.3)
 yaml          2.2.0   2018-07-25 [1] CRAN (R 3.5.1)
```

ThierryO commented 5 years ago

This should be solved in a95bd098. Please check out the review branch (#33) to test it.

ToonHub commented 5 years ago

It works fine with character columns but not with factors:

library(git2rdata)
library(tidyverse)

# OK: Name is converted to character before writing
test <- data.frame(Nr = 1:10,
                   Name = c(
                     rep("ZGeëiö",5),
                     rep("KlöïM", 5)
                   )) %>%
  mutate(Name = as.character(Name))

write_vc(test, "testfile11", sorting = c("Name", "Nr"))
test_read <- read_vc("testfile11")

# Not OK: Name is left as a factor
test <- data.frame(Nr = 1:10,
                   Name = c(
                     rep("ZGeëiö",5),
                     rep("KlöïM", 5)
                   ))

write_vc(test, "testfile11", sorting = c("Name", "Nr"))

# Error in as.yaml(m) : 
#   Emitter error: expected SCALAR, SEQUENCE-START, MAPPING-START, or ALIAS

ThierryO commented 5 years ago

I think I found and fixed the bug. Can you check the new version?

ToonHub commented 5 years ago

OK. It works fine now.