strengejacke / sjlabelled

Working with Labelled Data in R
https://strengejacke.github.io/sjlabelled

Label shift after converting incomplete labeled data to factor and afterwards to label #62

Open fretwurst opened 5 months ago

fretwurst commented 5 months ago

Thank you for your work! We work a lot with your packages.

I ran into a label shift after some label conversions that I needed for graphics. The smallest possible value did not occur in the data (originally a student survey with no one under 18). After converting first to factor and then to label, the labels shifted by one step. Here is a reproducible example:

data <- tidyr::tibble(Answer = c(2, 2, 3, 3, 3)) |> # one is not in the data
  sjlabelled::set_labels(Answer, labels = c('one' = 1, 'two' = 2, 'three' = 3)) 

data |> sjmisc::frq(Answer) # everything as expected

data <- data |>
  sjlabelled::as_factor(Answer)

data |> sjmisc::frq(Answer) # everything as expected

data |>
  sjlabelled::as_label(Answer) |>
  sjmisc::frq(Answer) # shifted label

Is it a bug or a feature? :-)

nathanj3 commented 1 month ago

Yes, I've run into the same issue when working with multiple datasets. It looks like this happens because, when the variable is converted to a factor by as_label(), the factor's levels are only the values actually present in the variable.

In your example, using factor(data$Answer) leads to the same issue, but factor(data$Answer, levels = attr(data$Answer, "labels")) has correct levels.

(And from there, factor(data$Answer, levels = attr(data$Answer, "labels"), labels = attr(attr(data$Answer, "labels"), "names")) returns the correct labels, though I'm sure a more elegant solution exists.)
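The workaround above can be sketched in base R, without sjlabelled. This is only an illustration: the vector `answer` is a hypothetical stand-in for data$Answer, and it assumes the value labels sit in a "labels" attribute as a named vector, as in the calls quoted above. It does not show what as_label() does internally.

```r
# Hypothetical stand-in for data$Answer: numeric codes plus a "labels" attribute.
# The labelled value 1 ("one") never occurs in the data.
answer <- c(2, 2, 3, 3, 3)
attr(answer, "labels") <- c(one = 1, two = 2, three = 3)

labs <- attr(answer, "labels")

# Default conversion: only the observed values become levels ("2" and "3").
f_observed <- factor(answer)
levels(f_observed)

# Workaround: pass every labelled value as a level and its name as the label,
# so the unused category "one" is kept and the labels stay aligned.
f_all <- factor(answer, levels = unname(labs), labels = names(labs))
levels(f_all)
table(f_all)
```

Because the levels are taken from the full label set and not from the observed values, the category "one" remains as an empty level and "two" and "three" stay attached to the values 2 and 3.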