ropensci / gendercoder

Creating R package to code free text gender responses
https://docs.ropensci.org/gendercoder/

Should "I am XXX" resolve to "XXX"? #34

Closed ellisp closed 3 years ago

ellisp commented 3 years ago

The example in the readme currently has "I am male" becoming an NA for narrow_gender and "I am male" for broad_gender. If this is a pattern people actually use, wouldn't it be better if "^I am " (and any similar patterns) were deleted before matching to the dictionary, so that "I am male" resolves to "male", "I am nb" resolves to "sex and gender diverse", etc.?
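To make the proposal concrete, here is a minimal sketch of stripping a leading "I am " before an exact dictionary lookup. It is written in Python purely for illustration and is not gendercoder's R API: `TOY_DICTIONARY`, `LEADING_I_AM` and `recode_with_prefix_strip` are hypothetical names, and the three dictionary entries are stand-ins for the package's much larger dictionaries.

```python
import re

# Hypothetical miniature dictionary (assumption); the real dictionaries are far larger.
TOY_DICTIONARY = {
    "male": "male",
    "female": "female",
    "nb": "sex and gender diverse",
}

# Matches "I am " (optionally followed by "a"/"an") only at the start of the response.
LEADING_I_AM = re.compile(r"^\s*i\s+am\s+(?:an?\s+)?", flags=re.IGNORECASE)


def recode_with_prefix_strip(response, dictionary=TOY_DICTIONARY):
    """Drop a leading 'I am ', then do an exact lookup; None means 'not coded'."""
    cleaned = LEADING_I_AM.sub("", response).strip().lower()
    return dictionary.get(cleaned)


print(recode_with_prefix_strip("I am male"))
print(recode_with_prefix_strip("I am nb"))
```

Because the pattern is anchored to the start of the string and the remainder must still match a dictionary key exactly, a longer sentence that merely begins with "I am" stays uncoded in this sketch instead of being guessed at.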

ekothe commented 3 years ago

This is a real response - so it is certainly something that we expect to observe. My concern is that some people provide responses in the format "I was assigned female at birth, I am male" (should be coded male) or "I am sexually male, but identify as female" (should be coded female).

This is similar to a question @Matherion asked on Twitter about why we don't use regex, so if people can see a "safe" way to code responses of this type I'd be very interested.

ellisp commented 3 years ago

Hmm, it would inevitably be a bit ad hoc but one could have a "fuzzy" option that searches first for "identify as XXX", then for "I am YYY". If I have any ideas I will do a PR (and won't be offended if it's rejected).
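One way such a prioritised "fuzzy" option could look is sketched below, in Python for illustration only; none of these names exist in gendercoder. `TOY_DICTIONARY`, `PRIORITISED_PATTERNS` and `fuzzy_recode` are hypothetical, and the assumed rule is: try an exact match first, then the word after "identify as", then the word after "I am", accepting a candidate only if it is already a dictionary key.

```python
import re

# Hypothetical miniature dictionary (assumption), standing in for the real one.
TOY_DICTIONARY = {"male": "male", "female": "female", "non-binary": "non-binary"}

# Patterns are tried in order, so "identify as XXX" outranks "I am YYY".
PRIORITISED_PATTERNS = [
    re.compile(r"\bidentify\s+as\s+(?:an?\s+)?([\w-]+)", re.IGNORECASE),
    re.compile(r"\bI\s+am\s+(?:an?\s+)?([\w-]+)", re.IGNORECASE),
]


def fuzzy_recode(response, dictionary=TOY_DICTIONARY):
    """Exact lookup first, then prioritised patterns; None means 'not coded'."""
    exact = dictionary.get(response.strip().lower())
    if exact is not None:
        return exact
    for pattern in PRIORITISED_PATTERNS:
        # findall returns the captured word for every occurrence of the pattern.
        for candidate in pattern.findall(response):
            coded = dictionary.get(candidate.lower())
            if coded is not None:
                return coded
    return None


print(fuzzy_recode("I am sexually male, but identify as female"))
print(fuzzy_recode("I was assigned female at birth, I am male"))
```

With this ordering, the two responses raised earlier in the thread come out as "female" and "male" respectively. The sketch is deliberately conservative: a response that fits neither pattern, or whose captured word is not in the dictionary, is left uncoded, so it stays ad hoc and will miss phrasings it has no pattern for.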

Lingtax commented 3 years ago

Do we have a corpus of examples of this type, Emily? If we can at least get a sample of the problem space it should make it easier to advance this.

ekothe commented 3 years ago

The included sample data set has some examples of this type - although I know we do get more when purposively sampling more gender diverse participants. I wonder if @debruine would have any additional responses in this format that would be useful for this problem. I know that Lisa has recently collected some data in this format.

Here are some of the responses from the sample data set that are more than one word long. Some of these are already correctly recoded (e.g. "Non binary" to "non-binary") or are already in the correct format (e.g. "trans man"). The responses here that I would worry about are "agender (woman)", which should be coded as "agender", and "female to non-binary", which should be coded as "non-binary". These exact examples would probably be fine, because it is likely we'd include agender and non-binary in any fuzzy matching prioritisation. However, I am concerned that a response in this format would be incorrectly coded if a participant responded with a gender identity such as "hijra", which is not currently in the dictionary (e.g. "male to hijra" or "hijra (man)").

Gender
agender (woman)
Non binary
Trans man
Transgender man
Female to non-binary
Gender is a social construct - I'm sexually female
transgender female
Non Binary
Apache Helicopter... Just kidding. There are only two. I am a Male.
Female (cisgender)
cis female
Male(Sex, Gender is a silly construct)
female (Cisgender)
Transsexual male (FTM)

ekothe commented 3 years ago

See further discussion in #37 - I'm closing this to reduce duplication.