ropensci / ozunconf18

repository for the rOpenSci ozunconference 2018

Create functions for coding free-text gender responses #22

Open ekothe opened 6 years ago

ekothe commented 6 years ago

Some researchers (myself included) are starting to use a free-text response option to obtain gender information from participants. This gives us richer gender data and avoids 'othering' participants, but it creates some annoying workload issues when coding those responses for analysis.

Specifically, while I don't want to collapse all gender responses into two categories, typographical errors, inconsistent capitalisation, etc. create more levels of the gender factor than are actually present in the data. Hand-recoding all gender responses in every dataset presents a barrier to entry for some researchers. It would be useful to create some example scripts/functions that recode common responses to reduce the workload associated with this task.

For reference, here is how I achieved this in a recent dataset:

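library(dplyr)

# Responses are assumed to have been upper-cased beforehand (e.g. with toupper()).
# Any response not listed below is left unmatched and becomes NA.
dat <- dat %>%
  mutate(
    gender = case_when(
      # Abbreviations and misspellings of "female"
      gender == "F" ~ "Female",
      gender == "FAMELA" ~ "Female",
      gender == "FEAMLE" ~ "Female",
      gender == "EMALE" ~ "Female",
      gender == "FEM" ~ "Female",
      gender == "FEMAIL" ~ "Female",
      gender == "FEMAL" ~ "Female",
      gender == "FEMALE" ~ "Female",
      gender == "WOMAN" ~ "Female",
      gender == "M" ~ "Male",
      gender == "MALE" ~ "Male",
      gender == "MASCULINO" ~ "Male",
      gender == "AGENDER (WOMAN)" ~ "Agender",
      gender == "NON BINARY" ~ "Non-binary",
      gender == "NONBINARY" ~ "Non-binary",
      gender == "NON-BINARY" ~ "Non-binary",
      # Non-responses: a true missing value is assumed to be the intent,
      # rather than the literal string "NA"
      gender == "40" ~ NA_character_,
      gender == "GENDER" ~ NA_character_
    )
  )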
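
A reusable version of the recoding above could be sketched roughly as follows. This is only a minimal illustration in base R, not an existing package function: `recode_gender()` and `gender_dictionary` are hypothetical names, and the dictionary entries are a subset of the mappings listed above that would need to be reviewed for each dataset.

```r
# Hypothetical helper: recode free-text gender responses via a lookup table.
recode_gender <- function(x, dictionary) {
  # Normalise case and whitespace so that "female", " F " and "FEMALE" can match
  key <- toupper(trimws(x))
  key <- gsub("\\s+", " ", key)
  # Look up each normalised response by name; unmatched responses become NA
  unname(dictionary[key])
}

# Illustrative dictionary: names are normalised responses, values are coded levels
gender_dictionary <- c(
  "F" = "Female", "FEMALE" = "Female", "FEMAIL" = "Female", "WOMAN" = "Female",
  "M" = "Male", "MALE" = "Male", "MASCULINO" = "Male",
  "NON BINARY" = "Non-binary", "NONBINARY" = "Non-binary", "NON-BINARY" = "Non-binary"
)

responses <- c("female", " F ", "Male", "non  binary", "40")
recoded <- recode_gender(responses, gender_dictionary)

# Responses that were not recognised and still need a human decision
unmatched <- unique(responses[is.na(recoded)])
```

Keeping the mappings in a named vector rather than in code means the dictionary can be extended or shared without touching the function, and the `unmatched` vector gives a short list of responses to check by hand instead of silently dropping them.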

danwwilson commented 6 years ago

Maybe creating a PR to the recoder or decoder packages could address this and other (e.g. age banding) use cases that researchers often encounter.
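
For the age banding case mentioned here, base R's `cut()` already covers the simple version. The ages, break points and labels below are made up for illustration only.

```r
# Illustrative ages and band boundaries (assumptions, not taken from any dataset)
age <- c(17, 18, 34, 35, 64, 65, 90)

age_band <- cut(
  age,
  breaks = c(0, 18, 35, 65, Inf),
  labels = c("Under 18", "18-34", "35-64", "65+"),
  right = FALSE  # intervals are [lower, upper), so an age of 18 falls in "18-34"
)
```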