ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

Error when skimming `haven_labelled` variables #737

Open luizaandrade opened 1 year ago

luizaandrade commented 1 year ago

I'm getting the following error message when using skimr 2.1.5 on R 4.2.3:

Error in `dplyr::summarize()`:
ℹ In argument: `skimmed = purrr::map2(...)`.
Caused by error in `purrr::map2()`:
ℹ In index: 1.
ℹ With name: numeric.
Caused by error in `dplyr::summarize()`:
ℹ In argument: `dplyr::across(tidyselect::any_of(variable_names), mangled_skimmers$funs)`.
Caused by error in `across()`:
! Can't compute column `vlg2008_~!@#$%^&*()-+numeric.sd`.
Caused by error in `as.double()`:
! Can't convert `x` <haven_labelled> to <double>.
Backtrace:
  1. ... %>% ...
 40. skimr (local) `<rlng_lm_>`(vlg2008)
 41. stats::sd(., na.rm = TRUE)
 45. vctrs:::as.double.vctrs_vctr(x)

I have been able to use skim with the same data in the past without errors, but unfortunately I don't know exactly which versions of skimr and R I was using when it last worked. My best guess is that it was skimr 2.1.5 and R 4.2.1.

elinw commented 1 year ago

Thanks, we have worked on supporting haven_labelled variables before but maybe there has been a change somewhere in the pipeline. Do you have sample data that we can use to test?

elinw commented 1 year ago

Also, could you tell us your version(s) of the tidyverse packages? It looks like dplyr and purrr are both possibly involved.
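One way to collect those versions is sketched below. The package list is an assumption based on the packages named in the error message and backtrace; `packageVersion()` reads the installed package's metadata and should not load the package, so running this should not change whether haven is loaded in the session.

```r
# Packages that appear in the error message or backtrace (an assumed list).
pkgs <- c("skimr", "dplyr", "purrr", "vctrs", "haven")

# Look up each installed version; report NA for packages that are not installed.
versions <- vapply(
  pkgs,
  function(p) {
    tryCatch(
      as.character(utils::packageVersion(p)),
      error = function(e) NA_character_
    )
  },
  character(1)
)

print(versions)
```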

elinw commented 1 year ago

I tried this:

iris$Sepal.Length.L <- labelled(iris$Sepal.Length)
class(iris$Sepal.Length.L)

[1] "haven_labelled" "vctrs_vctr" "double"

skim(iris)

and it worked fine. I also tried adding labels but it still worked. So I'm wondering if you can share your tidyverse versions (vctrs, dplyr, purrr specifically). Also can you try to determine if it is a specific labelled variable that is causing the error? (use select() to cut out different columns). If it is not all labelled variables but just specific ones, please share some more information about those variables.
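A minimal sketch of that column-by-column check follows. `find_failing_columns()` is a hypothetical helper, not part of skimr; it applies a function to each column as a one-column data frame and collects the error messages, so it is an alternative to trimming columns by hand with select().

```r
# Hypothetical helper (not part of skimr): run `f` on each column separately
# and return the error messages, named by the columns that failed.
find_failing_columns <- function(df, f) {
  failures <- character(0)
  for (col in names(df)) {
    msg <- tryCatch(
      {
        f(df[col])        # df[col] keeps a one-column data frame
        NA_character_     # no error for this column
      },
      error = function(e) conditionMessage(e)
    )
    if (!is.na(msg)) {
      failures[col] <- msg
    }
  }
  failures
}

# Toy demonstration with a stand-in function that only accepts numeric columns.
toy <- data.frame(a = 1:3, b = c("x", "y", "z"), stringsAsFactors = FALSE)
failed <- find_failing_columns(toy, function(d) stopifnot(is.numeric(d[[1]])))
print(names(failed))

# With real data this would be called as (my_data is a placeholder name):
# find_failing_columns(my_data, skimr::skim)
```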

elinw commented 1 year ago

Any updates?

luizaandrade commented 1 year ago

Hi @elinw, sorry I missed this. And sorry I didn't share a reprex when I first opened the issue, or I would have realized that the problem was that, although I'd loaded tidyverse, haven was not loaded. When haven is loaded, it works just fine. Thanks for your patience with this and for maintaining this great resource!

elinw commented 1 year ago

Okay, it's funny that I just saw the same issue and your solution worked. But I do think that it is an issue: you shouldn't have to load haven to make it work.

elinw commented 1 year ago

@michaelquinn32 I think we need to either move haven from suggested to required, or, when we see haven_labelled numeric data, test whether haven is installed and then either produce an error or skip the column and warn. The issue is that numeric haven_labelled data needs the as.numeric.haven_labelled from haven.
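A rough sketch of the "test for it, then skip and warn" option is below. `check_labelled_column()` is a hypothetical name, not an existing skimr function, and where such a check would sit inside skimr is left open. It relies on `requireNamespace()`, which loads a package without attaching it; loading a namespace registers its S3 methods, which is presumably why loading haven makes the error go away.

```r
# Hypothetical helper (not part of skimr): returns TRUE if the column can be
# summarised, FALSE (with a warning) if it should be skipped.
check_labelled_column <- function(x, name = "column") {
  # Columns without the haven_labelled class need no special handling.
  if (!inherits(x, "haven_labelled")) {
    return(TRUE)
  }
  # Loads haven if it is installed (without attaching it); FALSE otherwise.
  if (requireNamespace("haven", quietly = TRUE)) {
    return(TRUE)
  }
  warning(
    "Skipping `", name, "`: it is <haven_labelled> but haven is not installed.",
    call. = FALSE
  )
  FALSE
}

# An ordinary numeric column always passes the check.
plain_ok <- check_labelled_column(c(1, 2, 3), "plain")
print(plain_ok)
```

Moving haven to a required dependency would make the check unnecessary; the sketch only illustrates the alternative.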