ld-archer / E_FEM

This is the repository for the English version of the Future Elderly Model, originally developed at the Leonard D. Schaeffer Center for Health Policy and Microsimulation.
MIT License

Impute missing educ information in GlobalPreInitialization module #12

Closed ld-archer closed 3 years ago

ld-archer commented 3 years ago

Use a regression model to impute missing educ information in GlobalPreInitialization module.

Include information on spouse and parents education (if available) as predictors.
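A minimal sketch of regression-based imputation, for illustration only. The thread does not state the model form actually used, so this assumes a plain least-squares fit on a single hypothetical predictor (`spouse_educ`); the function names and toy data are likewise made up, and a categorical model may suit educ better.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def impute_educ(records):
    """Fill missing educ (None) from spouse_educ where the predictor is available."""
    complete = [r for r in records
                if r["educ"] is not None and r["spouse_educ"] is not None]
    intercept, slope = fit_line([r["spouse_educ"] for r in complete],
                                [r["educ"] for r in complete])
    imputed = []
    for r in records:
        r = dict(r)  # copy so the input records are left untouched
        if r["educ"] is None and r["spouse_educ"] is not None:
            r["educ"] = intercept + slope * r["spouse_educ"]
        imputed.append(r)
    return imputed


# Toy data (hypothetical): the last two respondents have missing educ.
people = [
    {"educ": 1.0, "spouse_educ": 1.0},
    {"educ": 2.0, "spouse_educ": 2.0},
    {"educ": 3.0, "spouse_educ": 3.0},
    {"educ": None, "spouse_educ": 2.0},
    {"educ": None, "spouse_educ": None},
]
imputed = impute_educ(people)
```

A respondent with no usable predictor stays missing in this sketch, which is why the "if available" caveat matters.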

ld-archer commented 3 years ago

After imputing educ, need to run the summary_stats script again to get the updated stats for each input population.

ld-archer commented 3 years ago

Update

The model has been estimated and, as far as I can tell, is being loaded correctly by GlobalPreInitializationModule. However, missing education values are STILL not being imputed. I need to talk to Bryan about what I'm missing; it is too complicated to explain in an email.

ld-archer commented 3 years ago

The imputation problem is caused by a few variables. Firstly, educl is missing for over 40,000 of the 86,701 observations, so close to half of the sample could be dropped because of this one variable. The majority of this missing data (~22,000 observations) is due to the respondent being unmarried at the time of interview. To solve that, we can create an interaction variable, married * educl, which takes the value 0 if the respondent is not married and the value of educl if married.
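The interaction variable described above can be sketched as follows. This is an illustration under assumptions, not the code used in the repository: the function name is hypothetical, and missing values are represented as `None`.

```python
from typing import Optional


def married_educl_interaction(married: bool, educl: Optional[float]) -> Optional[float]:
    """Return married * educl, defined as 0 for unmarried respondents."""
    if not married:
        # Unmarried: educl is missing by construction, so use 0 instead.
        return 0.0
    # Married: pass educl through; it stays missing if it was missing.
    return educl


# (married, educl) pairs: married with data, unmarried, married with missing educl.
rows = [(True, 2.0), (False, None), (True, None)]
interaction = [married_educl_interaction(m, e) for m, e in rows]
```

Only the unmarried cases are recovered this way; a married respondent with missing educl still has a missing interaction term.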

Still not working!!!

ld-archer commented 3 years ago

SOLVED

Problem: All the missing values in educ that we wanted to impute were stored as special (extended) missing values (e.g. .m, .d). These codes are only recognised by Stata, so the values showed up as missing in Stata but were not treated as missing in the .cpp files. As a result, when the GlobalPreInit module tried to impute educ, none of the values were recognised as missing.

Solution: Replace all special (extended) missing values for educ with the simple missing value (.) at the end of reshape_long.do.
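A small sketch of the problem and the fix, under stated assumptions. The thread does not say how the .cpp code reads the data, so the text-token encoding and the "naive" check below are hypothetical; the only Stata fact relied on is that `.` is plain missing and `.a` to `.z` are the special missing codes.

```python
import string

# Hypothetical text encoding of Stata missing values: "." is plain missing,
# ".a" to ".z" are the special (extended) missing codes.
EXTENDED_MISSING = {"." + letter for letter in string.ascii_lowercase}


def is_missing_naive(token: str) -> bool:
    """Recognise only the plain missing value, as a non-Stata reader might."""
    return token == "."


def to_plain_missing(token: str) -> str:
    """Collapse any special missing code to plain missing; keep other values."""
    return "." if token in EXTENDED_MISSING else token


educ_raw = ["2", ".m", ".d", ".", "3"]

# Before the fix: special codes slip past the naive check.
missed_before = [t for t in educ_raw
                 if t in EXTENDED_MISSING and not is_missing_naive(t)]

# After the fix: every missing value is plain missing and gets flagged.
educ_clean = [to_plain_missing(t) for t in educ_raw]
flagged_after = [is_missing_naive(t) for t in educ_clean]
```

Recoding in one place, at the end of the data preparation step, means downstream code only has to recognise a single missing value.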