ellisp closed this issue 3 years ago.
This is a real response, so it is certainly something we should expect to observe. My concern is that some people give responses in the form "I was assigned female at birth, I am male" (which should be coded male) or "I am sexually male, but identify as female" (which should be coded female).
This is similar to a question @Matherion asked on Twitter about why we don't use regex, so if people can see a "safe" way to code responses of this type I'd be very interested.
Hmm, it would inevitably be a bit ad hoc, but one could have a "fuzzy" option that searches first for "identify as XXX" and then for "I am YYY". If I have any ideas I will open a PR (and won't be offended if it's rejected).
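A minimal Python sketch of the prioritised "fuzzy" option, assuming a small hypothetical `DICTIONARY` and two hypothetical regex patterns (neither is the package's actual dictionary or API). Patterns are tried in order, so an explicit "identify as ..." statement takes precedence over an "I am ..." statement, and anything unmatched returns `None` so it can be coded by hand.

```python
import re
from typing import Optional

# Hypothetical dictionary for illustration only (not the package's real one).
DICTIONARY = {
    "male": "male",
    "man": "male",
    "female": "female",
    "woman": "female",
    "non-binary": "non-binary",
    "agender": "agender",
}

# Patterns tried in priority order: an explicit "identify as ..." statement
# wins over a plain "I am ...". Both patterns are assumptions about phrasing.
PRIORITY_PATTERNS = [
    re.compile(r"\bidentify as (?:an? )?([a-z-]+)"),
    re.compile(r"\bi am (?:an? )?([a-z-]+)"),
]


def fuzzy_code(response: str) -> Optional[str]:
    """Return the code for the first prioritised pattern whose captured word
    is in DICTIONARY, or None so the response can be coded by hand."""
    text = response.lower()
    for pattern in PRIORITY_PATTERNS:
        for match in pattern.finditer(text):
            code = DICTIONARY.get(match.group(1))
            if code is not None:
                return code
    return None
```

With the two examples from the top of the thread, the first falls through to the "I am" pattern and the second is caught by "identify as", so both get the intended code. A naive scan for the first dictionary word would instead return "female" for the first and "male" for the second.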
Do we have a corpus of examples of this type, Emily? If we can at least get a sample of the problem space it should make it easier to advance this.
The included sample data set has some examples of this type, although I know we get more when purposively sampling more gender-diverse participants. I wonder if @debruine has any additional responses in this format that would be useful for this problem; I know that Lisa has recently collected some data in this format.
Here are some of the responses from the sample data set that are more than one word long. Some of these are already recoded correctly (e.g. "Non binary" to "non-binary") or are already in the correct format (e.g. "trans man"). The responses here that I would worry about are "agender (woman)", which should be coded as "agender", and "female to non-binary", which should be coded as "non-binary". These exact examples would probably be fine, because we would likely include agender and non-binary in any fuzzy-matching prioritisation. However, I am concerned that a response in this format would be coded incorrectly if a participant gave a gender identity that is not currently in the dictionary, such as "hijra" (e.g. "male to hijra" or "hijra (man)").
| Gender |
|---|
| agender (woman) |
| Non binary |
| Trans man |
| Transgender man |
| Female to non-binary |
| Gender is a social construct - I'm sexually female |
| transgender female |
| Non Binary |
| Apache Helicopter... Just kidding. There are only two. I am a Male. |
| Female (cisgender) |
| cis female |
| Male(Sex, Gender is a silly construct) |
| female (Cisgender) |
| Transsexual male (FTM) |
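To make the concern about unlisted identities concrete, here is a hypothetical Python sketch (the term list, filler set and function names are illustrative, not the package's). `priority_code` checks more specific identities before binary terms, which handles "agender (woman)" and "Female to non-binary", but silently codes "male to hijra" and "hijra (man)" as male. `cautious_code` is one possible safeguard: it returns `None` whenever the response contains a word that is neither a known term nor assumed filler.

```python
import re
from typing import Optional

# Hypothetical priority list (NOT the package's dictionary): more specific
# identities are checked before binary terms.
PRIORITY_TERMS = [
    ("non-binary", "non-binary"),
    ("agender", "agender"),
    ("female", "female"),
    ("woman", "female"),
    ("male", "male"),
    ("man", "male"),
]

# Assumption: connective words that carry no identity information.
FILLER = {"to"}

# Words are runs of letters, optionally joined by hyphens ("non-binary").
TOKEN = re.compile(r"[a-z]+(?:-[a-z]+)*")


def priority_code(response: str) -> Optional[str]:
    """Return the code of the highest-priority term found as a whole word."""
    text = response.lower()
    for term, code in PRIORITY_TERMS:
        if re.search(rf"\b{re.escape(term)}\b", text):
            return code
    return None


def cautious_code(response: str) -> Optional[str]:
    """Like priority_code, but return None if any word is neither a
    dictionary term nor known filler (it may be an unlisted identity)."""
    known = {term for term, _ in PRIORITY_TERMS} | FILLER
    tokens = TOKEN.findall(response.lower())
    if any(token not in known for token in tokens):
        return None  # leave for manual coding
    return priority_code(response)


for r in ["agender (woman)", "Female to non-binary", "male to hijra", "hijra (man)"]:
    print(f"{r!r}: priority={priority_code(r)!r}, cautious={cautious_code(r)!r}")
```

The trade-off is that `cautious_code` sends more responses to manual review; under this toy vocabulary "Transsexual male (FTM)" would also return `None`.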
See further discussion in #37 - I'm closing this to reduce duplication.
The example in the README currently has "I am male" becoming NA for `narrow_gender` and "I am male" for `broad_gender`. If this is a pattern people actually use, wouldn't it be better to delete `"^I am "` (and any similar patterns) before matching to the dictionary, so that this resolves to "male", "I am nb" resolves to "sex and gender diverse", etc.?
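A minimal Python sketch of that preprocessing step, assuming hypothetical `NARROW` and `BROAD` lookup tables in place of the real dictionaries. The `PREFIX` pattern removes a leading "I am" (anchored with `^`, as proposed); also stripping "I'm" and an optional article is an extra assumption.

```python
import re
from typing import Dict, Optional

# Hypothetical stand-ins for a narrow and a broad dictionary.
NARROW: Dict[str, str] = {"male": "male", "female": "female"}
BROAD: Dict[str, str] = {"male": "male", "female": "female", "nb": "sex and gender diverse"}

# Leading "I am" as proposed; "I'm" and an optional article are assumptions.
PREFIX = re.compile(r"^\s*(?:i am|i'm)\s+(?:an?\s+)?", re.IGNORECASE)


def normalise(response: str) -> str:
    """Strip a leading "I am"-style prefix, then trim and lower-case."""
    return PREFIX.sub("", response).strip().lower()


def recode(response: str, dictionary: Dict[str, str]) -> Optional[str]:
    """Exact dictionary lookup after normalisation; None if unmatched."""
    return dictionary.get(normalise(response))


print(recode("I am male", NARROW))
print(recode("I am nb", BROAD))
```

Only exact dictionary matches after stripping are coded, so a longer response such as "I am sexually male, but identify as female" still returns `None` and would need a separate rule.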