Replace Ethnicity NaNs with Non-Hispanic

janhurst / unisa-tbi

Decision Support Tool for suspected Traumatic Brain Injuries

https://unisa-tbi.azurewebsites.net


Replace Ethnicity NaNs with Non-Hispanic #19

Closed janhurst closed 4 years ago

janhurst commented 4 years ago

A very large number of records are missing a value for Ethnicity. We have two choices:

- drop the column
- set the NaNs in the column to Non-Hispanic
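For reference, a minimal pandas sketch of the two choices. The toy DataFrame and its values are made up for illustration and are not the PECARN data; only the `Ethnicity` and `Race` column names and the Hispanic/Non-Hispanic labels come from this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for the real dataset.
df = pd.DataFrame({
    "Ethnicity": ["Hispanic", np.nan, "Non-Hispanic", np.nan, "Non-Hispanic"],
    "Race": ["White", "Black", np.nan, "Asian", "White"],
})

# Choice 1: drop the column entirely.
dropped = df.drop(columns=["Ethnicity"])

# Choice 2: keep the column and set the NaNs to Non-Hispanic.
filled = df.assign(Ethnicity=df["Ethnicity"].fillna("Non-Hispanic"))
```

Both operations return new frames, so `df` itself is left unchanged and the two results can be compared side by side.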

janhurst commented 4 years ago

I recommend we set the NaNs to Non-Hispanic.

This is on the basis that Non-Hispanic is significantly more frequent than Hispanic in the dataset (22k vs 5.2k).

You can see some exploration of this at https://github.com/janhurst/capstone/blob/jan/jan/00-ethnicity-race-gender.ipynb
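A rough sketch of how this kind of frequency check could be done in pandas. This is not taken from the linked notebook; the Series below is a small made-up sample, so the counts will not match the 22k vs 5.2k figures.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of the Ethnicity column, not the real data.
ethnicity = pd.Series(
    ["Non-Hispanic", "Hispanic", np.nan, "Non-Hispanic", np.nan, "Non-Hispanic"],
    name="Ethnicity",
)

# Count each value, including the missing ones.
counts = ethnicity.value_counts(dropna=False)

# Most frequent non-missing value: the candidate for filling the NaNs.
most_frequent = ethnicity.mode(dropna=True).iloc[0]

# Fraction of records with no Ethnicity recorded.
missing_share = ethnicity.isna().mean()
```

Looking at `missing_share` alongside `counts` shows how much of the column would be imputed rather than observed if the NaNs are filled with the most frequent value.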

doughnuted commented 4 years ago

I would keep ethnicity if possible. We know that ML models often have ethnic/racial bias, linked with the a priori probability of certain groups being included in or excluded from these studies. An important part of the cultural, ethical, social and legal implications of a state-wide machine-learning model would be any systematic bias against a group that already suffers health inequities. It's an indirect discussion, but we would want to describe how this model does or doesn't address inequities for health services (related to identifying ciTBI) in Indigenous Australians.

janhurst commented 4 years ago

> I would keep ethnicity if possible

In the PECARN data Ethnicity only encodes Hispanic and Non-Hispanic. There is also a Race variable, which encodes White, Black, Asian, Native American, Alaskan etc.

I assume we will start to populate new categories into both Ethnicity and Race in the future, e.g. Aboriginal, Torres Strait Islander and so on. (We might need some help in describing or defining these sensitively and appropriately.)

I think we can handle this a little better with a "Not stated" or "Unknown" category for both Ethnicity and Race
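A minimal sketch of what an explicit "Unknown" category could look like in pandas. The toy data and the `UNKNOWN` constant are assumptions for illustration; the label could equally be "Not stated".

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame, not the PECARN data.
df = pd.DataFrame({
    "Ethnicity": ["Hispanic", np.nan, "Non-Hispanic"],
    "Race": ["White", np.nan, "Asian"],
})

UNKNOWN = "Unknown"  # assumed label for missing values

for col in ["Ethnicity", "Race"]:
    # Replace NaNs with the explicit label, then store as a categorical
    # so new categories can be added later with .cat.add_categories().
    df[col] = df[col].fillna(UNKNOWN).astype("category")
```

This keeps the missing records distinguishable from true Non-Hispanic records instead of folding them into the majority class.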