cloversleaves closed this issue 3 years ago
Hi!
The fundamental issue is that skimr dispatches summary functions based on data type, and haven_labelled isn't supported out of the box. You could either try using the skimrExtra package, which has some summary functions for that type:
https://github.com/elinw/skimrextra
Or you can follow the guide here on how to add another data type: https://docs.ropensci.org/skimr/articles/Supporting_additional_objects.html
Best wishes, Michael
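To illustrate what "dispatches based on data type" means in practice, here is a minimal base R sketch of S3 dispatch. It is not skimr's implementation: pick_summary and its methods are hypothetical names, and labelled_like is a hand-built stand-in for a labelled column so that haven is not needed.

```r
# Toy generic standing in for a type-based skimmer lookup (hypothetical name).
pick_summary <- function(column) UseMethod("pick_summary")
pick_summary.numeric <- function(column) "numeric summaries"
pick_summary.default <- function(column) "no summaries for this type"

# Hand-built stand-in for a labelled column (assumption: same class vector
# as reported later in this thread, minus the Hmisc class).
labelled_like <- structure(
  c(0, 1, 1),
  class = c("haven_labelled", "vctrs_vctr", "double")
)

# A plain double has the implicit class "numeric", so the numeric method runs.
plain_result <- pick_summary(c(0, 1, 1))

# None of the explicit classes has a method, so the default runs.
before_result <- pick_summary(labelled_like)

# Registering a method for the new class changes the outcome.
pick_summary.haven_labelled <- function(column) "labelled summaries"
after_result <- pick_summary(labelled_like)
```

The point of the sketch is that an object with an explicit class attribute is matched only against those class names, which is why a dedicated method for the new type is needed.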
Hi Michael, thank you for getting back to me so quickly! One thing I noticed is that this is code I have run for some time with no problem, so I was wondering what in an update could have changed.
I'll look at the skimrextra package.
My guess, from looking at your original error message, is that this is a consequence of stricter behavior in the vctrs package. We recently had to update this package to better work with those changes, but certain behavior that was allowed before now throws errors. From what I can tell:
Like I mentioned above, the safest solution within skimr is to have a different summary type for haven labelled. The approach could be as simple as forking the numeric skimmers.
library(skimr)

# Reuse the default numeric skimmers under a new skim type.
get_skimmers.haven_labelled <- function(column) {
  modify_default_skimmers("numeric", new_skim_type = "haven_labelled")
}
@elinw, is this something we should directly support? We would add haven to suggests and one more method?
Thank you Michael, and for the function. That explanation you gave about the vctrs package makes sense to me! danny
I actually added skimmers for haven_labelled in codebook and it works for me and my tests. @cloversleaves can you make a reproducible example, ideally by sharing a minimal part of your dataset that still produces the error?
Hi - thank you Ruben! It's a bit odd but I actually use Hmisc and haven for labelling due to the code that the database generates, but below is an example.
library(dplyr)
library(haven)
library(Hmisc)
library(codebook)

cars <- mtcars %>%
  select(vs:carb)

cars$vs <- labelled(cars$vs, c('unchecked' = 0, 'checked' = 1))
cars$am <- labelled(cars$am, c('unchecked' = 0, 'checked' = 1))
label(cars$vs) = "Somebinary"
label(cars$am) = "Somebinary2"

codebook_table(cars) %>%
  mutate(value_labels = gsub("\\.", " = ", value_labels)) %>% # change val_label format to "0 = unchecked, 1 = checked"
  mutate(value_labels = gsub(",", ", ", value_labels))
The interesting thing to me is that when skimming directly (as opposed to via codebook) the fallback to character works but generates this warning (twice):
1: Couldn't find skimmers for class: labelled, haven_labelled, vctrs_vctr, double, numeric; No user-defined sfl provided. Falling back to character.
But as you can see, class(cars2$vs), for example, would be
[1] "labelled" "haven_labelled" "vctrs_vctr" "double"
So I guess for skimr the question is whether when a variable has multiple classes we should try to search for a match to the sfls that currently exist.
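A rough sketch of what such a search could look like, in base R only. This is not skimr's code: known_types and find_skim_type are hypothetical names, and the set of known types is an assumption made for illustration.

```r
# Hypothetical list of types that already have summary functions.
known_types <- c("numeric", "character", "factor", "logical")

# Return the first class that has known summaries, else a fallback type.
find_skim_type <- function(classes, known = known_types, fallback = "character") {
  hits <- classes[classes %in% known]
  if (length(hits) > 0) hits[[1]] else fallback
}

# Class list as reported in the warning above: "numeric" is found.
from_warning <- find_skim_type(
  c("labelled", "haven_labelled", "vctrs_vctr", "double", "numeric")
)

# Class list as returned by class(): nothing matches, so the fallback is used.
from_class <- find_skim_type(
  c("labelled", "haven_labelled", "vctrs_vctr", "double")
)
```

The two calls differ only in whether "numeric" is part of the searched list, which suggests that any smarter fallback would need to decide which class names to consider.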
I think using Hmisc and haven in combination like that was not intended functionality. IIRC @hadley intentionally renamed the labelled class in haven to haven_labelled so as not to conflict with Hmisc's labelled function. @cloversleaves could just use labelled::var_label instead of Hmisc::label.
Thank you! Once labeling is done with haven::labelled instead of Hmisc::label it works. I think I use Hmisc::label because REDCap exports data import code using Hmisc::label.
library(dplyr)
library(haven)
library(Hmisc)
library(codebook)

cars <- mtcars %>%
  select(vs:carb)

cars$vs <- labelled(cars$vs, c('unchecked' = 0, 'checked' = 1))
cars$am <- labelled(cars$am, c('unchecked' = 0, 'checked' = 1))
cars$am <- labelled(cars$am, label = "Somewow")
#
# label(cars$vs) = "Somebinary"
# label(cars$am) = "Somebinary2"

codebook_table(cars) %>%
  mutate(value_labels = gsub("\\.", " = ", value_labels)) %>% # change val_label format to "0 = unchecked, 1 = checked"
  mutate(value_labels = gsub(",", ", ", value_labels))
A tibble: 4 x 13
  name  label   data_type      value_labels                n_missing complete_rate min   median max    mean    sd n_value_labels hist
  <chr> <chr>   <chr>          <chr>                           <int>         <dbl> <chr> <chr>  <chr> <dbl> <dbl>          <int> <chr>
1 vs    NA      haven_labelled "0. unchecked,\n1. checked"         0             1 0     0      1     0.438 0.504              2 ▇▁▁▁▁▁▁▆
2 am    Somewow haven_labelled NA                                  0             1 0     0      1     0.406 0.499              0 ▇▁▁▁▁▁▁▆
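In the output above, am shows no value labels. That appears to be because the second labelled() call on cars$am passes only label, so the value labels from the previous line are not carried over. A minimal sketch that sets both in one call, assuming haven is installed (the label text is reused from the earlier example):

```r
library(haven)

# Set value labels and the variable label in a single call
# so that neither one is dropped.
am <- labelled(
  mtcars$am,
  labels = c(unchecked = 0, checked = 1),
  label = "Somebinary2"
)

attr(am, "labels")  # named vector of value labels
attr(am, "label")   # the variable label
```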
That's all good, but I'm really thinking about the bigger question @michaelquinn32 of why we aren't just treating it as numeric. (I do think it's useful to put it in its own subtable, but maybe we need to make our fallback smarter about checking %in% the classes and use the first match or something.)
ps. I ended up just doing a quick regex capture group find and rearrange for the reams of code I had in Hmisc and used the base R attr function instead. All works now. I changed
label(cars$am) = "Somebinary2"
to
attr(cars$am,"label") <- "Somebinary2"
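For anyone facing the same rewrite, one way to do the find and rearrange is with a capture-group substitution in R itself. This is only a sketch: the pattern assumes each statement sits on one line in exactly the label(x) = "text" form shown above, and old_lines is made-up sample input.

```r
# Sample input lines in the Hmisc::label style (made up for illustration).
old_lines <- c(
  'label(cars$vs) = "Somebinary"',
  'label(cars$am) = "Somebinary2"'
)

# Capture the object (group 1) and the label text (group 2), then rearrange.
new_lines <- sub(
  '^label\\((.+?)\\)\\s*=\\s*(.+)$',
  'attr(\\1,"label") <- \\2',
  old_lines,
  perl = TRUE
)

# The base R attribute can be set and read back without Hmisc or haven.
am <- mtcars$am
attr(am, "label") <- "Somebinary2"
am_label <- attr(am, "label")
```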
I think I should close this and that the question I asked is really more of a separate issue from labelled variables.
Hello, and thank you for the package. I had mentioned an error I was receiving in another package
https://github.com/rubenarslan/codebook/issues/52
and now think this may be related to skimr.
Would there be any suggestions? I think multiple types of labelling attributes may affect results.
Error: Problem with summarise() input skimmed.
x Problem with summarise() input ~!@#$%^&()-+haven_labelled.sd.
x Can't convert to .
i Input ~!@#$%^& ()-+haven_labelled.sd is (structure(function (..., .x = ..1, .y = ..2, . = ..1) ....
i Input skimmed is purrr::map2(...).
i The error occured in group 3: skim_type = "haven_labelled".