Open eliocamp opened 4 years ago
Please specify whether your issue is about:
This is shown in the readme, but is it intended behaviour for factor variables to be converted to text in a roundtrip?
``` r
library(csvy)
csvy::write_csvy(iris, "iris.csvy")
all.equal(iris, csvy::read_csvy("iris.csvy"))
#> [1] "Attributes: < Names: 1 string mismatch >"
#> [2] "Attributes: < Length mismatch: comparison on first 2 components >"
#> [3] "Attributes: < Component 2: Modes: numeric, character >"
#> [4] "Attributes: < Component 2: Lengths: 150, 1 >"
#> [5] "Attributes: < Component 2: target is numeric, current is character >"
#> [6] "Component \"Species\": 'current' is not a factor"

str(iris)
#> 'data.frame': 150 obs. of 5 variables:
#>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...

str(csvy::read_csvy("iris.csvy"))
#> 'data.frame': 150 obs. of 5 variables:
#>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr "setosa" "setosa" "setosa" "setosa" ...
#>   ..- attr(*, "levels")= chr "setosa" "versicolor" "virginica"
#>  - attr(*, "profile")= chr "tabular-data-package"
#>  - attr(*, "name")= chr "iris"
```
The relevant yaml section has this:
```
#- name: Species
#  type: string
#  levels:
#  - setosa
#  - versicolor
#  - virginica
```
Which is not really correct, as the `type` is actually `factor`. I'm not at all familiar with the csvy spec, so "factor" might not be a possible type. In any case, this applies not only to factors: it seems that `write_csvy()`/`read_csvy()` drops class attributes.
For example,
``` r
library(csvy)
iris$col <- 1
class(iris$col) <- c("custom_cass", "numeric")
csvy::write_csvy(iris, "iris.csvy")

str(iris)
#> 'data.frame': 150 obs. of 6 variables:
#>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ col         : 'custom_cass' num 1 1 1 1 1 1 1 1 1 1 ...

str(csvy::read_csvy("iris.csvy"))
#> 'data.frame': 150 obs. of 6 variables:
#>  $ Sepal.Length: num 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr "setosa" "setosa" "setosa" "setosa" ...
#>   ..- attr(*, "levels")= chr "setosa" "versicolor" "virginica"
#>  $ col         : chr "1" "1" "1" "1" ...
#>  - attr(*, "profile")= chr "tabular-data-package"
#>  - attr(*, "name")= chr "iris"
```
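In the meantime, a rough workaround sketch in base R, in case it helps anyone hitting the same thing. Both `class_changes()` and `restore_factors()` are hypothetical helpers, not part of csvy. The read-back data frame is simulated from the `str()` output above so the snippet runs without csvy installed. It assumes the `levels` attribute survives on the character column, as it does in the output shown.

``` r
# Hypothetical helpers, not part of csvy.

# Report columns whose class differs between the original and read-back data.
class_changes <- function(before, after) {
  cols <- intersect(names(before), names(after))
  changed <- Filter(
    function(nm) !identical(class(before[[nm]]), class(after[[nm]])),
    cols
  )
  collapse_class <- function(df) {
    vapply(changed, function(nm) paste(class(df[[nm]]), collapse = "/"),
           character(1), USE.NAMES = FALSE)
  }
  data.frame(column = changed, before = collapse_class(before),
             after = collapse_class(after), stringsAsFactors = FALSE)
}

# Rebuild factors from a "levels" attribute left on character columns.
restore_factors <- function(df) {
  for (nm in names(df)) {
    lv <- attr(df[[nm]], "levels")
    if (is.character(df[[nm]]) && !is.null(lv)) {
      df[[nm]] <- factor(as.vector(df[[nm]]), levels = lv)
    }
  }
  df
}

# Simulate the roundtrip result shown above (assumption: same shape as the
# read_csvy() output, i.e. character columns, levels kept as an attribute).
before <- iris
before$col <- 1
class(before$col) <- c("custom_cass", "numeric")

after <- before
after$Species <- as.character(before$Species)
attr(after$Species, "levels") <- levels(before$Species)
after$col <- rep("1", nrow(after))

changes <- class_changes(before, after)  # which columns changed class
fixed <- restore_factors(after)          # Species becomes a factor again
```

This only recovers factors, because the levels are the only class-related information that makes it back. The custom class on `col` cannot be restored this way, since nothing in the read-back data records it.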
Session info
``` r
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.2 (2019-12-12)
#>  os       elementary OS 5.1 Hera
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Argentina/Buenos_Aires
#>  date     2019-12-16
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)
#>  callr         3.3.2      2019-09-22 [1] CRAN (R 3.6.1)
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>  csvy        * 0.3.0      2019-12-16 [1] Github (leeper/csvy@af0aa8d)
#>  data.table    1.12.6     2019-10-18 [1] CRAN (R 3.6.1)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.2.0.9000 2019-09-17 [1] Github (r-lib/devtools@2765fbe)
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.1)
#>  glue          1.3.1.9000 2019-09-17 [1] Github (tidyverse/glue@71eeddf)
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)
#>  jsonlite      1.6        2018-12-07 [1] CRAN (R 3.6.0)
#>  knitr         1.25       2019-09-18 [1] CRAN (R 3.6.1)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.1)
#>  rlang         0.4.1.9000 2019-11-12 [1] Github (r-lib/rlang@5a0b80a)
#>  rmarkdown     1.16       2019-10-01 [1] CRAN (R 3.6.1)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.3.0      2019-11-05 [1] CRAN (R 3.6.1)
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.1)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.1)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /home/elio/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
```