leeper / csvy

Import and Export CSV Data With a YAML Metadata Header

Dropped factors? #25

Open eliocamp opened 4 years ago

eliocamp commented 4 years ago


This is shown in the readme, but is it intended behaviour for factor variables to be converted to text in a roundtrip?

library(csvy)
csvy::write_csvy(iris, "iris.csvy")
all.equal(iris, csvy::read_csvy("iris.csvy"))
#> [1] "Attributes: < Names: 1 string mismatch >"                            
#> [2] "Attributes: < Length mismatch: comparison on first 2 components >"   
#> [3] "Attributes: < Component 2: Modes: numeric, character >"              
#> [4] "Attributes: < Component 2: Lengths: 150, 1 >"                        
#> [5] "Attributes: < Component 2: target is numeric, current is character >"
#> [6] "Component \"Species\": 'current' is not a factor"

str(iris)
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
str(csvy::read_csvy("iris.csvy"))
#> 'data.frame':    150 obs. of  5 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
#>   ..- attr(*, "levels")= chr  "setosa" "versicolor" "virginica"
#>  - attr(*, "profile")= chr "tabular-data-package"
#>  - attr(*, "name")= chr "iris"
Session info

``` r
devtools::session_info()
#> ─ Session info ──────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 3.6.2 (2019-12-12)
#>  os       elementary OS 5.1 Hera
#>  system   x86_64, linux-gnu
#>  ui       X11
#>  language en_US
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       America/Argentina/Buenos_Aires
#>  date     2019-12-16
#>
#> ─ Packages ──────────────────────────────────────────────────────────────
#>  package     * version    date       lib source
#>  assertthat    0.2.1      2019-03-21 [1] CRAN (R 3.6.0)
#>  backports     1.1.5      2019-10-02 [1] CRAN (R 3.6.1)
#>  callr         3.3.2      2019-09-22 [1] CRAN (R 3.6.1)
#>  cli           1.1.0      2019-03-19 [1] CRAN (R 3.6.0)
#>  crayon        1.3.4      2017-09-16 [1] CRAN (R 3.6.0)
#>  csvy        * 0.3.0      2019-12-16 [1] Github (leeper/csvy@af0aa8d)
#>  data.table    1.12.6     2019-10-18 [1] CRAN (R 3.6.1)
#>  desc          1.2.0      2018-05-01 [1] CRAN (R 3.6.0)
#>  devtools      2.2.0.9000 2019-09-17 [1] Github (r-lib/devtools@2765fbe)
#>  digest        0.6.23     2019-11-23 [1] CRAN (R 3.6.1)
#>  ellipsis      0.3.0      2019-09-20 [1] CRAN (R 3.6.1)
#>  evaluate      0.14       2019-05-28 [1] CRAN (R 3.6.0)
#>  fs            1.3.1      2019-05-06 [1] CRAN (R 3.6.1)
#>  glue          1.3.1.9000 2019-09-17 [1] Github (tidyverse/glue@71eeddf)
#>  highr         0.8        2019-03-20 [1] CRAN (R 3.6.0)
#>  htmltools     0.4.0      2019-10-04 [1] CRAN (R 3.6.1)
#>  jsonlite      1.6        2018-12-07 [1] CRAN (R 3.6.0)
#>  knitr         1.25       2019-09-18 [1] CRAN (R 3.6.1)
#>  magrittr      1.5        2014-11-22 [1] CRAN (R 3.6.0)
#>  memoise       1.1.0      2017-04-21 [1] CRAN (R 3.6.0)
#>  pkgbuild      1.0.6      2019-10-09 [1] CRAN (R 3.6.1)
#>  pkgload       1.0.2      2018-10-29 [1] CRAN (R 3.6.0)
#>  prettyunits   1.0.2      2015-07-13 [1] CRAN (R 3.6.0)
#>  processx      3.4.1      2019-07-18 [1] CRAN (R 3.6.1)
#>  ps            1.3.0      2018-12-21 [1] CRAN (R 3.6.0)
#>  R6            2.4.1      2019-11-12 [1] CRAN (R 3.6.1)
#>  Rcpp          1.0.3      2019-11-08 [1] CRAN (R 3.6.1)
#>  remotes       2.1.0      2019-06-24 [1] CRAN (R 3.6.1)
#>  rlang         0.4.1.9000 2019-11-12 [1] Github (r-lib/rlang@5a0b80a)
#>  rmarkdown     1.16       2019-10-01 [1] CRAN (R 3.6.1)
#>  rprojroot     1.3-2      2018-01-03 [1] CRAN (R 3.6.0)
#>  sessioninfo   1.1.1      2018-11-05 [1] CRAN (R 3.6.0)
#>  stringi       1.4.3      2019-03-12 [1] CRAN (R 3.6.0)
#>  stringr       1.4.0      2019-02-10 [1] CRAN (R 3.6.0)
#>  testthat      2.3.0      2019-11-05 [1] CRAN (R 3.6.1)
#>  usethis       1.5.1      2019-07-04 [1] CRAN (R 3.6.1)
#>  withr         2.1.2      2018-03-15 [1] CRAN (R 3.6.0)
#>  xfun          0.11       2019-11-12 [1] CRAN (R 3.6.1)
#>  yaml          2.2.0      2018-07-25 [1] CRAN (R 3.6.0)
#>
#> [1] /home/elio/R/x86_64-pc-linux-gnu-library/3.6
#> [2] /usr/local/lib/R/site-library
#> [3] /usr/lib/R/site-library
#> [4] /usr/lib/R/library
```

The relevant section of the YAML header contains this:

#- name: Species
#  type: string
#  levels:
#  - setosa
#  - versicolor
#  - virginica

This is not really correct, as the type is actually factor. I'm not at all familiar with the csvy spec, so "factor" might not be a possible type. In any case, this applies not only to factors: it seems that write_csvy/read_csvy drops class attributes.

For example,

library(csvy)

iris$col <- 1
class(iris$col) <- c("custom_cass", "numeric")
csvy::write_csvy(iris, "iris.csvy")
str(iris)
#> 'data.frame':    150 obs. of  6 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
#>  $ col         : 'custom_cass' num  1 1 1 1 1 1 1 1 1 1 ...
str(csvy::read_csvy("iris.csvy"))
#> 'data.frame':    150 obs. of  6 variables:
#>  $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
#>  $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
#>  $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
#>  $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
#>  $ Species     : chr  "setosa" "setosa" "setosa" "setosa" ...
#>   ..- attr(*, "levels")= chr  "setosa" "versicolor" "virginica"
#>  $ col         : chr  "1" "1" "1" "1" ...
#>  - attr(*, "profile")= chr "tabular-data-package"
#>  - attr(*, "name")= chr "iris"
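In the meantime, since the levels do survive the roundtrip as an attribute on the character column, a possible workaround for the factor case is to rebuild the factor after reading. This is only a minimal sketch: `restore_factors()` is a hypothetical helper, not part of csvy, and the data frame below merely simulates the `read_csvy()` result shown above so the example runs without the package. It does nothing for other classes such as the custom one, since nothing about them appears in the output above.

```r
# Hypothetical helper (not part of csvy): turn character columns that still
# carry a "levels" attribute back into factors.
restore_factors <- function(df) {
  for (nm in names(df)) {
    col <- df[[nm]]
    lv <- attr(col, "levels", exact = TRUE)
    if (is.character(col) && !is.null(lv)) {
      # as.vector() drops the stray attribute before rebuilding the factor
      df[[nm]] <- factor(as.vector(col), levels = lv)
    }
  }
  df
}

# Simulate what read_csvy() returned above: Species as character,
# with the original levels kept only as an attribute.
roundtrip <- iris
roundtrip$Species <- as.character(iris$Species)
attr(roundtrip$Species, "levels") <- levels(iris$Species)

fixed <- restore_factors(roundtrip)
str(fixed$Species)
```

With the real package this would presumably be used as `restore_factors(csvy::read_csvy("iris.csvy"))`, assuming the returned columns look like the `str()` output above.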