tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org

`read_xpt()` fails on misconstructed dates #747

Open DanChaltiel opened 10 months ago

DanChaltiel commented 10 months ago

Hi,

In some XPT files generated by third-party software, dates may be malformed, with the number of days encoded as a string.

This doesn't cause any trouble when using SAS or other software (e.g. https://stattransfer.com/), but it causes instant failure when using {haven}.

Currently, the whole read fails with an error if a single column is corrupt, whereas IMHO it should only issue a warning for that column and deliver it raw. That way, one could try to fix the issue manually.
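To illustrate the "warn and deliver raw" behaviour proposed here, the following is a minimal base R sketch. The helper `deliver_raw_if_corrupt()` is hypothetical and is not part of haven; it assumes the columns are already available as a named list, which is not how haven's internal parser necessarily works.

```r
# Hypothetical helper (not a haven function): for each column, if it claims
# to be a Date but is not stored as a number, warn and strip the class so
# the raw values are returned instead of aborting the whole read.
deliver_raw_if_corrupt <- function(cols) {
  for (nm in names(cols)) {
    col <- cols[[nm]]
    if (inherits(col, "Date") && !is.numeric(unclass(col))) {
      warning(
        "Column `", nm, "` is a corrupt Date (type ", typeof(col),
        "); returning raw values.",
        call. = FALSE
      )
      cols[[nm]] <- unclass(col)
    }
  }
  cols
}

# Example input: one valid column and one malformed Date column.
cols <- list(
  id   = 1:2,
  date = structure(c("20424", "20487"), class = "Date")
)
out <- deliver_raw_if_corrupt(cols)  # emits one warning for `date`
```

With this approach only the offending column loses its Date class; the other columns are left untouched.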

Here is a reprex:

x = structure(c("20424", "20487"), label = "Date", class = "Date")
a = data.frame(id=1:2, date=x) #would not work with tibble()
a
#>   id       date
#> 1  1 2025-12-02
#> 2  2 2026-02-03

haven::write_xpt(a, "test.xpt")
haven::read_xpt("test.xpt")
#> Error in `date_validate()`:
#> ! Corrupt `Date` with unknown type character.
#> ℹ In file 'type-date-time.c' at line 344.
#> ℹ This is an internal error that was detected in the vctrs package.
#>   Please report it at <https://github.com/r-lib/vctrs/issues> with a reprex (<https://tidyverse.org/help/>) and the full backtrace.
#> Backtrace:
#>      ▆
#>   1. ├─haven::read_xpt("test.xpt")
#>   2. │ └─haven:::df_parse_xpt_file(spec, cols_skip, n_max, skip, name_repair = .name_repair)
#>   3. ├─tibble (local) `<fn>`(`<named list>`, .rows = 2L, .name_repair = "unique")
#>   4. ├─tibble:::as_tibble.list(`<named list>`, .rows = 2L, .name_repair = "unique")
#>   5. │ └─tibble:::lst_to_tibble(x, .rows, .name_repair, col_lengths(x))
#>   6. │   └─tibble:::check_valid_cols(x, call = call)
#>   7. │     ├─base::which(!map_lgl(x, is_valid_col))
#>   8. │     └─tibble:::map_lgl(x, is_valid_col)
#>   9. │       └─tibble:::map_mold(.x, .f, logical(1), ...)
#>  10. │         └─base::vapply(.x, .f, .mold, ..., USE.NAMES = FALSE)
#>  11. │           └─tibble (local) FUN(X[[i]], ...)
#>  12. │             └─vctrs::vec_is(x)
#>  13. │               └─vctrs::obj_is_vector(x)
#>  14. │                 └─vctrs (local) `<fn>`()
#>  15. │                   └─vctrs::vec_proxy(x = x)
#>  16. │                     └─vctrs:::date_validate(x)
#>  17. └─rlang:::stop_internal_c_lib(...)
#>  18.   └─rlang::abort(message, call = call, .internal = TRUE, .frame = frame)

Created on 2023-12-18 with reprex v2.0.2

This issue is related to https://github.com/tidyverse/haven/issues/536, but with a reprex this time :-)

botsp commented 10 months ago

Hi Dan, it seems that there is an issue with the "test.xpt" file that was created using the `write_xpt()` function.

Since "$Date." is not a recognized format for a character variable in SAS/XPT, it should be "Date.". Maybe this affects the file creation?


DanChaltiel commented 10 months ago

Hi Kevin, yes, the "test.xpt" file has the same problem as the output of a third-party tool that generates flawed XPT files in some settings. Here, my object x is malformed: it holds a character vector with class Date, whereas Dates should always be numeric.
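For an object that is already in R, the malformed state described above can be detected and repaired with base R alone. This is only a sketch: `repair_date()` is a hypothetical name, and it assumes the strings are day counts since 1970-01-01, as in the reprex. It does not help with reading the flawed XPT file itself.

```r
# Hypothetical helper: rebuild a proper Date from a character vector that
# wrongly carries class "Date". Assumes the strings are days since 1970-01-01.
repair_date <- function(x) {
  if (inherits(x, "Date") && typeof(x) == "character") {
    label <- attr(x, "label")
    days  <- as.numeric(unclass(x))           # as.numeric() drops attributes
    x     <- as.Date(days, origin = "1970-01-01")
    attr(x, "label") <- label                 # keep the variable label, if any
  }
  x
}

x     <- structure(c("20424", "20487"), label = "Date", class = "Date")
fixed <- repair_date(x)
typeof(fixed)
#> [1] "double"
format(fixed)
#> [1] "2025-12-02" "2026-02-03"
```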

This issue is about error handling in read_xpt(), so that one can work around such flawed XPT files. Most R users cannot correct XPT files, so if haven does not let us read them, we are unfortunately helpless. I'm not sure write_xpt() should be changed for that matter, as this flawed R object x should never occur naturally.

botsp commented 10 months ago

Agreed, it seems the current conversion tool doesn't work very well.