Closed by janhurst 4 years ago
We can consider 'AgeTwoPlus', as it has only two categories and we know this project is only for children. If we think age is a critical factor and the user enters it on the front-end screen, we can use AgeInYears instead.
AgeInYears has a more even distribution of records than AgeTwoPlus.
Do we face any ethical issues if we include Age in our analysis?
Do we face any ethical issues if we include Age in our analysis?
No, I don't think so.
Do you lose fidelity in your model? The easiest thing to do is going to be to accept age from a user experience perspective. Is it possible to create two separate models with these as comparative features and choose the best performing?
Do you lose fidelity in your model?
We don't know yet. The models are still a bit rough while some of the other data cleaning is in progress, so it is hard to be sure.
Is it possible to create two separate models with these as comparative features and choose the best performing?
Absolutely, and I'm inclined to let a tree classifier use one of the numeric age variables to see where it splits.
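A minimal sketch of that idea, using only the standard library: a depth-one tree (a decision stump) that picks the age threshold minimising weighted Gini impurity. The function names and the toy data are made up for illustration; they are not the project's code or records, and a real run would more likely use a library tree classifier on the cleaned dataset.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels (0.0 for an empty or pure list)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def best_split(ages, labels):
    """Return (threshold, weighted_impurity) for the best 'age < threshold' split.

    If no split improves on the unsplit sample, threshold is None.
    """
    values = sorted(set(ages))
    best = (None, gini(labels))  # baseline: impurity with no split at all
    # Candidate thresholds are midpoints between consecutive distinct ages.
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for a, y in zip(ages, labels) if a < t]
        right = [y for a, y in zip(ages, labels) if a >= t]
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if score < best[1]:
            best = (t, score)
    return best


# Made-up toy data, for illustration only (not from the project).
toy_ages = [0.5, 1.0, 1.5, 1.8, 2.5, 3.0, 6.0, 10.0]
toy_labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold, impurity = best_split(toy_ages, toy_labels)
print(threshold, impurity)
```

If the threshold chosen on the real data lands near 2 years, that would suggest AgeTwoPlus already captures most of the signal; a very different threshold would be an argument for keeping the numeric variable.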
I just wanted to sanity check that the age being split at 2 years wasn't particularly significant from a clinical standpoint?
No, it's a good sanity check. I think there's a good argument for doing it: we treat children in a number of different age groups, with <2/>2 being a good split.
All of the data is essentially categorical with the exception of Age.
We could choose to drop Age in favour of the original AgeTwoPlus categorical variable, or otherwise turn Age into a multilevel categorical variable.
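As a rough illustration of the two options, the sketch below derives both a two-level flag and a multilevel band from a numeric age. Only the 2-year boundary comes from this thread; the other cut points, the band labels, and the function names are hypothetical placeholders that would need clinical input. It also assumes that a child aged exactly 2 counts as "two plus".

```python
from bisect import bisect_right

# Hypothetical cut points in years; only the 2-year boundary is from the discussion.
AGE_EDGES = [2, 5, 12]
AGE_LABELS = ["<2", "2-4", "5-11", "12+"]


def age_two_plus(age_in_years):
    """Two-level variable: True when the child is 2 or older (assumed inclusive)."""
    return age_in_years >= 2


def age_band(age_in_years):
    """Multilevel variable: map a numeric age to one of the AGE_LABELS bands."""
    # bisect_right counts how many edges are <= age, which is the band index.
    return AGE_LABELS[bisect_right(AGE_EDGES, age_in_years)]


for age in (0.5, 2, 4.9, 7, 12):
    print(age, age_two_plus(age), age_band(age))
```

Either derived column keeps the dataset fully categorical, and both can be computed from an age entered in the front end.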