tidyverse / haven

Read SPSS, Stata and SAS files from R
https://haven.tidyverse.org
Other
423 stars 115 forks source link

haven_labelled: breaks a common operation with `drop` #698

Closed DanielEWeeks closed 1 year ago

DanielEWeeks commented 1 year ago

haven_labelled vectors should ignore commonly used drop argument instead of erroring out.

For details, see https://support.bioconductor.org/p/9148139/ where in part the discussion states:

the right thing to do is to fix [on haven_labelled objects in the haven package itself. Any attempt to fix this elsewhere is not satisfactory and won't prevent those objects from causing problems. We must be able to use these objects anywhere double vectors are used (haven_labelled inherits from double). Right now we can't because they break on a common operation (x[i, drop=TRUE] or x[i, drop=FALSE]) that works just fine on double objects. It's impossible to know exactly how many places in the vast Bioconductor + CRAN ecosystem use x[i, drop=TRUE] or x[i, drop=FALSE] on double objects, but there are probably many many of them. This means that haven_labelled objects will cause problems in many many places, unless they can also handle x[i, drop=TRUE] and x[i, drop=FALSE].

Brief description of the problem

A haven_labelled double vector x generates a cryptic error Error in proxy[, ..., drop = FALSE] : incorrect number of dimensions when using x[i, drop=TRUE] or x[i, drop=FALSE], while such operations work just fine on double objects. This also occurs with haven_labelled character vectors as well as with haven_labelled integer vectors.

Looks like haven_labelled vectors are sent to the [.vctrs_vctr operator with the drop = TRUE, which doesn't like the drop = TRUE argument:

3.  vec_index(x, i, ...)
2.`[.vctrs_vctr`(dn, 1:3, drop = TRUE)
1. dn[1:3, drop = TRUE]

Minimal reproducible example

library(haven)
dn <- labelled(c(rep(1,7),NA,rep(2,7)),label="Biological sex", labels=c(`N/A` = -2, Missing=-1,Male=1,Female=2))
attr(dn,"format.spss") <- "F3.0"
dn[1:3]
zap_labels(dn)[1:3,drop=TRUE]
dn[1:3, drop=TRUE]
packageVersion("haven")
R.version$version.string
sessionInfo()

Output from the Minimal reproducible example

> library(haven)
> dn <- labelled(c(rep(1,7),NA,rep(2,7)),label="Biological sex", labels=c(`N/A` = -2, Missing=-1,Male=1,Female=2))
> attr(dn,"format.spss") <- "F3.0"
> dn[1:3]
<labelled<double>[3]>: Biological sex
[1] 1 1 1

Labels:
> zap_labels(dn)[1:3,drop=TRUE]
[1] 1 1 1
> dn[1:3, drop=TRUE]
Error in proxy[, ..., drop = FALSE] : incorrect number of dimensions
> packageVersion("haven")
[1] ‘2.5.1’
> R.version$version.string
[1] "R version 4.2.2 (2022-10-31)"
> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Big Sur 11.6.7

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] haven_2.5.1

loaded via a namespace (and not attached):
 [1] fansi_1.0.3         utf8_1.2.2          lifecycle_1.0.3    
 [4] magrittr_2.0.3      pillar_1.8.1        rlang_1.0.6        
 [7] cli_3.4.1           rstudioapi_0.14     vctrs_0.5.1        
[10] ellipsis_0.3.2      forcats_0.5.2       tools_4.2.2        
[13] glue_1.6.2          hms_1.1.2           xfun_0.35          
[16] compiler_4.2.2      pkgconfig_2.0.3     BiocManager_1.30.19
[19] knitr_1.41          tibble_3.1.8   
gorcha commented 1 year ago

Hi @DanielEWeeks, thanks for the detailed bug report!

The haven_labelled and haven_labelled_spss classes are implemented using the vctrs package, which provides the subsetting functions and is the source of this error message. Any change will need to be made in vctrs, and this issue will also affect any other packages that use vctrs to implement their classes.

I've created an issue over at r-lib/vctrs#1751 - I'm closing this issue since there's nothing additional we can do on the haven side.