vubiostat / r-yaml

R package for converting objects to and from YAML
http://biostat.app.vumc.org/wiki/Main/YamlR

Unexpected interaction between `column.major` and Date `handlers` in `as.yaml()` #141

Open cynthiahqy opened 6 months ago

cynthiahqy commented 6 months ago

In as.yaml(), setting column.major=FALSE appears to prevent the application of Date handlers. For a while, I thought I was specifying my handler functions incorrectly, but I don't believe that is the case. I used these references for the handlers:

I've included a minimal example showing the difference in date handling when column.major=TRUE (default) vs. when column.major=FALSE. This interaction doesn't appear when using the verbatim_logical handler, making the Date behaviour entirely unexpected.

library(yaml)
library(purrr)

## Working date handling
date_handler <- list(Date = function(x) as.character(x))
my_list <- list(date = as.Date(c("2012-10-10", "2014-03-28")))
yaml::as.yaml(my_list, handlers = date_handler)
#> [1] "date:\n- '2012-10-10'\n- '2014-03-28'\n"
## instead of a list, use tibble and df (with column.major = TRUE by default)
my_tbl <- tibble::tibble(date = as.Date(c("2012-10-10", "2014-03-28")))
yaml::as.yaml(my_tbl, handlers = date_handler) |> cat()
#> date:
#> - '2012-10-10'
#> - '2014-03-28'
## but the date_handler doesn't apply with column.major = FALSE?
yaml::as.yaml(my_tbl, handlers = date_handler, column.major = FALSE) |> cat()
#> - date: 15623.0
#> - date: 16157.0

## Same issue with data.frames
my_df <- as.data.frame(my_tbl)
yaml::as.yaml(my_df, handlers = date_handler)
#> [1] "date:\n- '2012-10-10'\n- '2014-03-28'\n"
yaml::as.yaml(my_df, handlers = date_handler, column.major = FALSE)
#> [1] "- date: 15623.0\n- date: 16157.0\n"

## What if we convert the tibble to a list first?
# purrr::transpose also does the weird number conversion
purrr::transpose(my_tbl)
#> [[1]]
#> [[1]]$date
#> [1] 15623
#> 
#> 
#> [[2]]
#> [[2]]$date
#> [1] 16157
# but not if you do as.list() first
as.list(my_tbl) |>
    purrr::list_transpose(simplify = FALSE)
#> [[1]]
#> [[1]]$date
#> [1] "2012-10-10"
#> 
#> 
#> [[2]]
#> [[2]]$date
#> [1] "2014-03-28"

# what about logical handlers?
my_tbl$bool <- c(TRUE, FALSE)
verbatim_logical <- function(x) {
    result <- tolower(as.logical(x))
    class(result) <- "verbatim"
    return(result)
}
yml_handlers <- c(date_handler, list(logical = verbatim_logical))
# Logical handlers work fine either way, but dates depend on column.major
my_tbl |> yaml::as.yaml(handlers = yml_handlers, column.major = FALSE)
#> [1] "- date: 15623.0\n  bool: true\n- date: 16157.0\n  bool: false\n"
my_tbl |> yaml::as.yaml(handlers = yml_handlers, column.major = TRUE)
#> [1] "date:\n- '2012-10-10'\n- '2014-03-28'\nbool:\n- true\n- false\n"

Created on 2024-03-18 with reprex v2.0.2

Session info

``` r
sessioninfo::session_info()
#> ─ Session info ───────────────────────────────────────────────────────────────
#>  setting  value
#>  version  R version 4.3.1 (2023-06-16)
#>  os       macOS Ventura 13.4.1
#>  system   aarch64, darwin20
#>  ui       X11
#>  language (EN)
#>  collate  en_US.UTF-8
#>  ctype    en_US.UTF-8
#>  tz       Australia/Melbourne
#>  date     2024-03-18
#>  pandoc   3.1.7 @ /opt/homebrew/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#>  package     * version date (UTC) lib source
#>  cli           3.6.1   2023-03-23 [1] CRAN (R 4.3.0)
#>  digest        0.6.33  2023-07-07 [1] CRAN (R 4.3.0)
#>  evaluate      0.21    2023-05-05 [1] CRAN (R 4.3.0)
#>  fansi         1.0.4   2023-01-22 [1] CRAN (R 4.3.0)
#>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.3.0)
#>  fs            1.6.3   2023-07-20 [1] CRAN (R 4.3.0)
#>  glue          1.6.2   2022-02-24 [1] CRAN (R 4.3.0)
#>  htmltools     0.5.6   2023-08-10 [1] CRAN (R 4.3.0)
#>  knitr         1.43    2023-05-25 [1] CRAN (R 4.3.0)
#>  lifecycle     1.0.3   2022-10-07 [1] CRAN (R 4.3.0)
#>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.3.0)
#>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.3.0)
#>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.3.0)
#>  purrr       * 1.0.2   2023-08-10 [1] CRAN (R 4.3.0)
#>  R.cache       0.16.0  2022-07-21 [1] CRAN (R 4.3.0)
#>  R.methodsS3   1.8.2   2022-06-13 [1] CRAN (R 4.3.0)
#>  R.oo          1.25.0  2022-06-12 [1] CRAN (R 4.3.0)
#>  R.utils       2.12.2  2022-11-11 [1] CRAN (R 4.3.0)
#>  reprex        2.0.2   2022-08-17 [1] CRAN (R 4.3.0)
#>  rlang         1.1.1   2023-04-28 [1] CRAN (R 4.3.0)
#>  rmarkdown     2.24    2023-08-14 [1] CRAN (R 4.3.0)
#>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.3.0)
#>  styler        1.10.1  2023-06-05 [1] CRAN (R 4.3.0)
#>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.3.0)
#>  utf8          1.2.3   2023-01-31 [1] CRAN (R 4.3.0)
#>  vctrs         0.6.3   2023-06-14 [1] CRAN (R 4.3.0)
#>  withr         2.5.0   2022-03-03 [1] CRAN (R 4.3.0)
#>  xfun          0.39    2023-04-20 [1] CRAN (R 4.3.0)
#>  yaml        * 2.3.7   2023-01-23 [1] CRAN (R 4.3.0)
#>
#>  [1] /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
```

dupontct commented 5 months ago

I have replicated the issue.

When running with column.major = FALSE, each row is encoded as a list built by picking that row's element from each column. The problem is that an element picked out this way loses its class information, so it is no longer a Date but a plain numeric (the underlying type of Date). That is why the Date handler has no effect when column.major = FALSE. Logical values are unaffected because logical is an intrinsic data type.
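
The class loss described above can be reproduced in base R without the yaml package. This is only a rough sketch of the mechanism, not the package's actual code: `apply_handler()` is a hypothetical helper that imitates class-keyed handler lookup, and `unclass()` stands in for whatever low-level element access strips the attributes.

```r
dates <- as.Date(c("2012-10-10", "2014-03-28"))
date_handler <- list(Date = function(x) as.character(x))

# Hypothetical stand-in for class-keyed handler dispatch:
# call the first handler whose name matches one of the object's classes.
apply_handler <- function(x, handlers) {
  for (cls in class(x)) {
    if (!is.null(handlers[[cls]])) {
      return(handlers[[cls]](x))
    }
  }
  x  # no matching handler: value passes through unchanged
}

# Class-aware element access keeps the Date class, so the handler matches.
with_class <- dates[[1]]
class(with_class)
apply_handler(with_class, date_handler)

# Once attributes are stripped, only the underlying double is left,
# and a handler named "Date" can no longer match.
without_class <- unclass(dates)[[1]]
class(without_class)
apply_handler(without_class, date_handler)
```

The same loss shows up in a plain `for (d in dates)` loop, where each `d` is a bare numeric, so it is not specific to this package.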

Now working on a fix.
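
Until a fix is available, one possible workaround is to convert Date columns to character before calling as.yaml(), so that no Date handler is needed in row-major mode. The sketch below is untested against the package internals; `stringify_dates()` is a hypothetical helper, not part of yaml, and it assumes character columns are emitted as-is when column.major = FALSE.

```r
# Hypothetical helper: replace every Date column with its character form.
stringify_dates <- function(df) {
  is_date <- vapply(df, inherits, logical(1), what = "Date")
  df[is_date] <- lapply(df[is_date], as.character)
  df
}

my_df <- data.frame(
  date = as.Date(c("2012-10-10", "2014-03-28")),
  bool = c(TRUE, FALSE)
)
converted <- stringify_dates(my_df)
str(converted)

# Only attempt the YAML conversion if the yaml package is installed.
if (requireNamespace("yaml", quietly = TRUE)) {
  cat(yaml::as.yaml(converted, column.major = FALSE))
}
```

This trades the handler for an explicit pre-processing step, so any custom date formatting would have to move into `stringify_dates()` (for example by swapping `as.character` for a `format()` call).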