rebeccajohnson88 / qss20_s21_proj

Repo for DOL Summer Data Challenge on equity in H-2A oversight
Creative Commons Zero v1.0 Universal

Running list of TRLA intake data questions #19

Closed rebeccajohnson88 closed 3 years ago

rebeccajohnson88 commented 3 years ago

Script here: https://github.com/rebeccajohnson88/qss20_s21_proj/blob/main/code/11_cleanTRLA_intake.Rmd

Category: adverse party/opponent:

I did the hierarchical coding discussed on Slack in the chunk of code below. "Missing all" means that a given row had none of the following fields: adverse party organization, lead case ap organization, adverse party name, lead case ap name.

About 47% are missing all four fields:

image

trla_orig = trla_orig %>%
  mutate(
    # Consolidated opponent: first non-missing field, in priority order
    derived_opponent_consolidated = case_when(
      !is.na(adverse_party_organization) ~ adverse_party_organization,
      is.na(adverse_party_organization) & !is.na(lead_case_ap_organization) ~ lead_case_ap_organization,
      is.na(adverse_party_organization) & is.na(lead_case_ap_organization) &
        !is.na(adverse_party_name) ~ adverse_party_name,
      is.na(adverse_party_organization) & is.na(lead_case_ap_organization) &
        is.na(adverse_party_name) & !is.na(lead_case_ap_name) ~ lead_case_ap_name,
      TRUE ~ NA_character_
    ),
    # Which of the four fields supplied the consolidated value
    derived_opponent_source = case_when(
      !is.na(adverse_party_organization) ~ "AP org",
      is.na(adverse_party_organization) & !is.na(lead_case_ap_organization) ~ "Lead AP org",
      is.na(adverse_party_organization) & is.na(lead_case_ap_organization) &
        !is.na(adverse_party_name) ~ "AP name",
      is.na(adverse_party_organization) & is.na(lead_case_ap_organization) &
        is.na(adverse_party_name) & !is.na(lead_case_ap_name) ~ "Lead AP name",
      TRUE ~ "Missing all"
    ),
    # TRUE when the consolidated opponent matches one of these agency names
    derived_is_notemp = case_when(
      grepl("Social Security Administration|Department of Labor|Workforce", derived_opponent_consolidated) ~ TRUE,
      TRUE ~ FALSE
    )
  )
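
For reference, the same priority ordering can probably be written more compactly with `dplyr::coalesce()`, since `case_when()` already evaluates its conditions in order. This is only a sketch on made-up toy rows (not TRLA data), and the `toy` and `source_shares` names are hypothetical; it also shows one way to get the share of rows falling in each source category, including "Missing all".

```r
library(dplyr)

# Toy rows, made up for illustration only (not TRLA data)
toy <- data.frame(
  adverse_party_organization = c("Farm A", NA, NA, NA, NA),
  lead_case_ap_organization  = c(NA, "Farm B", NA, NA, NA),
  adverse_party_name         = c("J. Doe", NA, "R. Roe", NA, NA),
  lead_case_ap_name          = c(NA, NA, NA, "S. Poe", NA),
  stringsAsFactors = FALSE
)

toy <- toy %>%
  mutate(
    # coalesce() returns the first non-missing value across its arguments
    derived_opponent_consolidated = coalesce(
      adverse_party_organization,
      lead_case_ap_organization,
      adverse_party_name,
      lead_case_ap_name
    ),
    # case_when() stops at the first TRUE condition, so the earlier
    # is.na() checks do not need to be repeated on later lines
    derived_opponent_source = case_when(
      !is.na(adverse_party_organization) ~ "AP org",
      !is.na(lead_case_ap_organization)  ~ "Lead AP org",
      !is.na(adverse_party_name)         ~ "AP name",
      !is.na(lead_case_ap_name)          ~ "Lead AP name",
      TRUE                               ~ "Missing all"
    )
  )

# Share of rows by source category
source_shares <- toy %>%
  count(derived_opponent_source) %>%
  mutate(share = n / sum(n))
```

The explicit version in the script is more verbose but spells out each condition; the two should give the same columns as long as all four fields are character.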

Intake date: did the pull cover all cases or only pre-2014 ones? A non-negligible number of intake dates fall in the 1990s, and there is possibly one data entry error that should perhaps be recoded to 1991.
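
A rough way to check the intake date range might look like the sketch below. It runs on made-up toy dates; the `intake_date` and `case_id` column names and the 1970 cutoff are assumptions for illustration, not taken from the script. It only flags suspect rows for manual review and does not recode anything.

```r
library(dplyr)

# Toy intake dates, made up for illustration; column names are hypothetical
toy_intake <- data.frame(
  case_id = 1:5,
  intake_date = as.Date(c("1994-03-01", "1998-07-15", "2012-01-10",
                          "2013-11-30", "1901-05-02"))
)

# Count cases per intake year to see how far back the pull reaches
year_counts <- toy_intake %>%
  mutate(intake_year = as.integer(format(intake_date, "%Y"))) %>%
  count(intake_year)

# Flag dates before an assumed earliest plausible year for manual review
earliest_plausible_year <- 1970  # assumption, adjust as needed
suspect <- toy_intake %>%
  filter(as.integer(format(intake_date, "%Y")) < earliest_plausible_year)
```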

image

rebeccajohnson88 commented 3 years ago

https://github.com/rebeccajohnson88/qss20_s21_proj/blob/main/code/11_cleanTRLA_intake.Rmd

Ideal output is a dataset with the following columns (ok for cases to be repeated across rows if there are multiple adverse parties):