kjhealy / gssr

General Social Survey (GSS) data files packaged for R
http://kjhealy.github.io/gssr/

Missing NA labels in panel data #11

Open JoeNoonan opened 1 month ago

JoeNoonan commented 1 month ago

Hi! Thanks so much for your work on this, it has saved me an insane amount of time.

I have a research question where I am looking at Don't Know (DK) responses as the dependent variable. In the panel datasets, however, DK responses are simply coded as NA, together with truly missing data and "Other" responses.

Is there a way to get the DK values?

```r
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
library(gssr)
#> Warning: package 'gssr' was built under R version 4.3.3
#> Package loaded. To attach the GSS data, type data(gss_all) at the console.
#> For the codebook, type data(gss_dict).
#> For the panel data and documentation, type e.g. data(gss_panel08_long) and data(gss_panel_doc).
#> For help on a specific GSS variable, type ?varname at the console.
library(labelled)

data("gss_panel06_long")
val_labels(gss_panel06_long$homosex)
#>              IAP     ALWAYS WRONG ALMST ALWAYS WRG  SOMETIMES WRONG 
#>                0                1                2                3 
#> NOT WRONG AT ALL            OTHER               DK               NA 
#>                4                5                8                9
count(gss_panel06_long,homosex)
#> # A tibble: 5 × 2
#>   homosex                   n
#>   <dbl+lbl>             <int>
#> 1  1 [ALWAYS WRONG]      1691
#> 2  2 [ALMST ALWAYS WRG]   157
#> 3  3 [SOMETIMES WRONG]    226
#> 4  4 [NOT WRONG AT ALL]  1048
#> 5 NA                     2878
```

Created on 2024-11-01 with reprex v2.1.1
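A single NA row in `count()` does not by itself say whether the underlying values carry any information about why they are missing. If a file stores its missing codes as haven tagged NAs (an assumption here; it is not confirmed for the panel files), `haven::na_tag()` can separate DK from the other missing codes. The sketch below uses a hypothetical toy vector, not the real `homosex` column, with the tag letters `"d"` (DK) and `"i"` (IAP) chosen purely for illustration.

```r
library(haven)

# Hypothetical toy column: substantive answers 1-4, plus missing codes stored
# as tagged NAs ("d" = DK, "i" = IAP) and one plain, untagged NA.
homosex_toy <- c(1, 4, tagged_na("d"), 2, tagged_na("i"), tagged_na("d"), NA)

# is.na() treats every kind of missing value the same way.
sum(is.na(homosex_toy))
#> 4

# na_tag() returns the tag letter for tagged NAs and NA for everything else,
# so the DK responses can be counted on their own.
tags <- na_tag(homosex_toy)
tag_counts <- table(tags, useNA = "ifany")
n_dk <- sum(tags == "d", na.rm = TRUE)
```

On the real data, the analogous check would be something like `table(haven::na_tag(gss_panel06_long$homosex), useNA = "ifany")`. If every entry comes back as NA, the column holds plain NAs and the DK responses cannot be recovered from it.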

kjhealy commented 2 weeks ago

Hi Joe, I'm in the process of updating the panel data because of another issue, and the missingness labels have been causing trouble there too. I'll try to see if there's a good solution. (Labeled values, not just the missing ones, make converting the data to long format significantly more error-prone and awkward.) The answer may be to convert all the NA codes to factor values alongside the regular responses.
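Converting the missing-data codes into ordinary factor levels could look roughly like the sketch below. It assumes, hypothetically, that the codes survive as haven tagged NAs; the toy vector, the tag letters, and the `missing_labels` mapping are illustrative inventions, not how gssr actually stores the panel data. Substantive answers and tagged missing codes end up as levels of the same factor, while an untagged NA stays a genuine `NA`.

```r
library(haven)

# Hypothetical toy column: "d" = DK, "i" = IAP; the final NA carries no tag.
raw <- c(1, 4, tagged_na("d"), 2, tagged_na("i"), tagged_na("d"), NA)

substantive <- c("ALWAYS WRONG", "ALMST ALWAYS WRG",
                 "SOMETIMES WRONG", "NOT WRONG AT ALL")
# Assumed mapping from tag letters to the response labels we want to keep.
missing_labels <- c(d = "DK", i = "IAP")

out <- rep(NA_character_, length(raw))
answered <- !is.na(raw)
# Codes 1-4 index directly into the substantive labels.
out[answered] <- substantive[raw[answered]]
# Tagged NAs become ordinary labels; untagged NAs map to NA and stay missing.
tags <- na_tag(raw)
out[!answered] <- unname(missing_labels[tags[!answered]])

homosex_f <- factor(out, levels = c(substantive, unname(missing_labels)))
table(homosex_f, useNA = "ifany")
```

With DK as a regular level, it can serve directly as an outcome category, and the factor reshapes to long format like any other character or factor column.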