ropensci / skimr

A frictionless, pipeable approach to dealing with summary statistics
https://docs.ropensci.org/skimr

multiple types of labelling attributes may affect results. (skim) #606

Closed cloversleaves closed 3 years ago

cloversleaves commented 4 years ago

Hello, thank you for the package. I had reported an error I was getting in another package:

https://github.com/rubenarslan/codebook/issues/52

and now think this may be related to skimr.

Would you have any suggestions? I think multiple types of labelling attributes may affect results.

Error: Problem with summarise() input skimmed.
x Problem with summarise() input ~!@#$%^&()-+haven_labelled.sd.
x Can't convert to .
i Input ~!@#$%^&()-+haven_labelled.sd is (structure(function (..., .x = ..1, .y = ..2, . = ..1) ....
i Input skimmed is purrr::map2(...).
i The error occured in group 3: skim_type = "haven_labelled".

michaelquinn32 commented 4 years ago

Hi!

The fundamental issue is that skimr dispatches summary functions based on data type. haven_labelled isn't supported out of the box. You could either try using the skimrExtra package, which has some summary functions for that type: https://github.com/elinw/skimrextra

Or you can follow the guide here on how to add another data type: https://docs.ropensci.org/skimr/articles/Supporting_additional_objects.html
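To illustrate the dispatch point with plain S3, here is a toy sketch. It is not skimr's internal code: `describe_column` and its methods are hypothetical names, and the labelled vector is built by hand with the class vector reported later in this thread so that no packages are needed. It shows that a vector with an explicit class attribute only reaches methods named in that attribute, and otherwise lands on the default.

```r
# Toy generic; hypothetical, not part of skimr.
describe_column <- function(x) UseMethod("describe_column")
describe_column.numeric <- function(x) "numeric summary"
describe_column.default <- function(x) "fallback summary"

# A bare double has the implicit classes "double" and "numeric".
plain <- c(0, 1, 1)

# Same data with an explicit class attribute (assumed class vector, set by hand).
tagged <- structure(
  c(0, 1, 1),
  class = c("labelled", "haven_labelled", "vctrs_vctr", "double")
)

res_plain <- describe_column(plain)    # dispatches to describe_column.numeric
res_tagged <- describe_column(tagged)  # no matching method, so the default runs
```

Adding a method for one of the classes in the attribute (for example a hypothetical `describe_column.haven_labelled`) is the S3 analogue of adding support for another data type.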

Best wishes, Michael

cloversleaves commented 4 years ago

Hi Michael, thank you for getting back to me so quickly! One thing I noticed is that this is code I have run for some time with no problem, so I was wondering what could have changed in an update.

I'll look at the skimrextra package

michaelquinn32 commented 4 years ago

My guess, from looking at your original error message, is that this is a consequence of stricter behavior in the vctrs package. We recently had to update this package to better work with those changes, but certain behavior that was allowed before now throws errors. From what I can tell:

Like I mentioned above, the safest solution within skimr is to have a different summary type for haven labelled. The approach could be as simple as forking the numeric skimmers.

get_skimmers.haven_labelled <- function(column) {
  modify_default_skimmers("numeric", new_skim_type = "haven_labelled")
}

@elinw, is this something we should directly support? We would add haven to suggests and one more method?

cloversleaves commented 4 years ago

Thank you Michael, and for the function. That explanation you gave about the vctrs package makes sense to me! danny

rubenarslan commented 4 years ago

I actually added skimmers for haven_labelled in codebook and it works for me and my tests. @cloversleaves can you make a reproducible example, ideally by sharing a minimal part of your dataset that still produces the error?

cloversleaves commented 4 years ago

Hi - thank you Ruben! It's a bit odd but I actually use Hmisc and haven for labelling due to the code that the database generates, but below is an example.


library(dplyr)
library(haven)
library(Hmisc)
library(codebook)

cars <- mtcars %>% 
  select(vs:carb)

cars$vs <- labelled(cars$vs, c('unchecked' = 0, 'checked' = 1))
cars$am <- labelled(cars$am, c('unchecked' = 0, 'checked' = 1))
label(cars$vs) = "Somebinary"
label(cars$am) = "Somebinary2"

codebook_table(cars) %>%
  mutate(value_labels = gsub("\\.", " = ", value_labels)) %>% #change val_label format to  "0 = unchecked, 1 = checked"
  mutate(value_labels = gsub(",",", ", value_labels))

elinw commented 4 years ago

The interesting thing to me is that when skimming directly (as opposed to via codebook) the fall back to character works but generates this warning (twice):

1: Couldn't find skimmers for class: labelled, haven_labelled, vctrs_vctr, double, numeric; No user-defined sfl provided. Falling back to character.

But as you can see class(cars2$vs) for example would be

[1] "labelled" "haven_labelled" "vctrs_vctr" "double"

So I guess for skimr the question is whether when a variable has multiple classes we should try to search for a match to the sfls that currently exist.

rubenarslan commented 4 years ago

I think using Hmisc and haven in combination like that was not intended functionality. IIRC @hadley intentionally renamed the labelled class in haven to haven_labelled so as not to conflict with Hmisc's labelled class. @cloversleaves could just use labelled::var_label instead of Hmisc::label.

cloversleaves commented 4 years ago

Thank you! Once labeling is done with haven::labelled instead of Hmisc::label it works. I think I use Hmisc::label because REDCap exports data import code using Hmisc::label

library(dplyr)
library(haven)
library(Hmisc)
library(codebook)

cars <- mtcars %>% 
  select(vs:carb)

cars$vs <- labelled(cars$vs, c('unchecked' = 0, 'checked' = 1))
cars$am <- labelled(cars$am, c('unchecked' = 0, 'checked' = 1))

cars$am <- labelled(cars$am, label = "Somewow")
# 
# label(cars$vs) = "Somebinary"
# label(cars$am) = "Somebinary2"

codebook_table(cars) %>%
  mutate(value_labels = gsub("\\.", " = ", value_labels)) %>% #change val_label format to  "0 = unchecked, 1 = checked"
  mutate(value_labels = gsub(",",", ", value_labels))

# A tibble: 4 x 13
  name  label   data_type      value_labels                n_missing complete_rate min   median max    mean    sd n_value_labels hist    
  <chr> <chr>   <chr>          <chr>                           <int>         <dbl> <chr> <chr>  <chr> <dbl> <dbl>          <int> <chr>   
1 vs    NA      haven_labelled "0. unchecked,\n1. checked"         0             1 0     0      1     0.438 0.504              2 ▇▁▁▁▁▁▁▆
2 am    Somewow haven_labelled  NA                                 0             1 0     0      1     0.406 0.499              0 ▇▁▁▁▁▁▁▆

elinw commented 4 years ago

That's all good, but I'm really thinking about the bigger question @michaelquinn32 of why we aren't just treating it as numeric. (I do think it's useful to put it in its own subtable, but maybe we need to make our fallback smarter about checking %in% the classes and using the first match or something.)
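A minimal base R sketch of that "first match" idea, for illustration only. `first_supported_class` is a hypothetical helper and not skimr's API; the `builtin` vector is an assumed, abbreviated list of types with skimmers, and the labelled vector is built by hand with the class vector shown earlier in the thread.

```r
# Hypothetical helper: return the first class (or base mode) that has a skimmer,
# falling back to "character" when nothing matches.
first_supported_class <- function(x, supported, fallback = "character") {
  # class(x) lists explicit classes; mode(x) adds "numeric" for doubles/integers.
  candidates <- unique(c(class(x), mode(x)))
  hits <- candidates[candidates %in% supported]
  if (length(hits) > 0) hits[[1]] else fallback
}

x <- structure(
  c(0, 1, 1),
  class = c("labelled", "haven_labelled", "vctrs_vctr", "double")
)

# Assumed list of supported types, for illustration only.
builtin <- c("numeric", "character", "factor", "logical")

match_builtin <- first_supported_class(x, builtin)                       # "numeric"
match_custom <- first_supported_class(x, c(builtin, "haven_labelled"))   # "haven_labelled"
```

Because the candidates are checked in class order, a dedicated haven_labelled skimmer would win when present, and the numeric skimmers would be used before falling back to character.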

cloversleaves commented 4 years ago

PS: I ended up just doing a quick regex capture-group find and rearrange for the reams of Hmisc code I had, and used the base R attr function instead. All works now. I changed

label(cars$am) = "Somebinary2"

to

attr(cars$am,"label") <- "Somebinary2"
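A small base R check of why the attr approach is likely safer here. This is a sketch: the vector is built by hand with an assumed haven-style class vector so no packages are needed. Setting the "label" attribute directly leaves the class vector alone, whereas the class output shown earlier in the thread suggests that Hmisc's label assignment also prepends "labelled".

```r
# Hand-built stand-in for a haven-labelled column (assumed class vector).
x <- structure(c(0, 1, 1), class = c("haven_labelled", "vctrs_vctr", "double"))

class_before <- class(x)
attr(x, "label") <- "Somebinary2"   # sets only the variable label
class_after <- class(x)

class_unchanged <- identical(class_before, class_after)
stored_label <- attr(x, "label")
```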

elinw commented 3 years ago

I think I should close this and that the question I asked is really more of a separate issue from labelled variables.