Closed ld-archer closed 3 years ago
After imputing educ, need to run the summary_stats script again to get the updated stats for each input population.
Update
Model has been estimated, and is being loaded by GlobalPreInitializationModule correctly (I think). However, missing education values are STILL not being imputed correctly. Need to talk to Bryan about what I'm missing; it's too complicated to explain in an email.
This is caused by a few variables. Firstly, educl is missing for over 40,000 of the 86,701 data points, so roughly half the sample could be dropped because of this one variable. The majority of the missing data here (~22,000) is due to the respondent being unmarried at the time of interview. To solve that, we can create an interaction variable for married * educl. The interaction variable will have value = 0 if not married, and value = educl if married.
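A minimal sketch of the interaction rule described above, written in Python purely for illustration (the project itself would build this variable in its own data-preparation scripts). The function name `married_educl_interaction` is hypothetical, and it assumes `married` is a known true/false flag and that a missing educl is represented as `None`.

```python
def married_educl_interaction(married, educl):
    """Return the married * educl interaction value.

    Unmarried respondents get 0, so a missing educl no longer removes
    them from the estimation sample. Married respondents keep their
    educl value, which may still be None if it is genuinely missing.
    """
    if not married:
        return 0
    return educl


# Hypothetical rows: (married, educl)
rows = [(False, None), (True, 3), (True, None), (False, 2)]
interaction = [married_educl_interaction(m, e) for m, e in rows]
```

Only married respondents with no recorded educl remain missing after this step, which is the smaller part of the missing data according to the counts above.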
Still not working!!!
SOLVED
Problem: All the missing values in educ that we wanted to impute were set to special missing characters (e.g. .m, .d). These special missing characters are only recognised in Stata, so they showed up as missing in Stata but were not treated as missing in the .cpp files. Therefore, when trying to impute educ in the GlobalPreInit module, none of the values were recognised as missing.
Solution: Replace all special missing characters for educ with simple missing (.) at the end of reshape_long.do.
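The recode can be sketched as follows, in Python and for illustration only; the actual fix lives in reshape_long.do. The sketch assumes the values have been exported as text, so that Stata's special missing codes appear literally as ".a" through ".z". The helper name `normalise_missing` is hypothetical.

```python
import string

# Stata's special missing codes: .a, .b, ..., .z
SPECIAL_MISSING = {"." + letter for letter in string.ascii_lowercase}


def normalise_missing(value):
    """Collapse every Stata missing code to one plain missing value.

    '.' and '.a'..'.z' all become None; any other value is returned
    unchanged, so downstream code only has to test for one marker.
    """
    if value == "." or value in SPECIAL_MISSING:
        return None
    return value


# Hypothetical educ column as exported text
educ_raw = ["2", ".m", "3", ".d", "."]
educ_clean = [normalise_missing(v) for v in educ_raw]
```

After this step, every value that should be imputed carries the same marker, so a single missing check is enough to find them.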
Use a regression model to impute missing educ information in GlobalPreInitialization module.
Include information on spouse's and parents' education (if available) as predictors.
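As a rough illustration of regression-based imputation, the sketch below fits a one-predictor least-squares line on complete cases and fills in missing educ from spouse education. This is a simplified stand-in, not the model estimated for this issue: the real model, its predictors, and its coefficients are not shown here, and the function name `impute_educ` and the toy data are hypothetical.

```python
def impute_educ(educ, spouse_educ):
    """Fill missing educ (None) using a least-squares fit on spouse_educ.

    The line is fitted only on rows where both values are present.
    Rows with missing educ and no spouse information stay None.
    """
    complete = [(s, e) for s, e in zip(spouse_educ, educ)
                if s is not None and e is not None]
    n = len(complete)
    mean_s = sum(s for s, _ in complete) / n
    mean_e = sum(e for _, e in complete) / n
    # Ordinary least squares for a single predictor
    sxx = sum((s - mean_s) ** 2 for s, _ in complete)
    sxy = sum((s - mean_s) * (e - mean_e) for s, e in complete)
    slope = sxy / sxx
    intercept = mean_e - slope * mean_s
    return [
        intercept + slope * s if e is None and s is not None else e
        for e, s in zip(educ, spouse_educ)
    ]


# Toy data: the last two respondents have missing educ
educ = [10, 12, 14, None, None]
spouse_educ = [9, 11, 13, 15, None]
imputed = impute_educ(educ, spouse_educ)
```

Note that the imputation only triggers on values represented as `None`, which is why the missing codes need to be in a form the imputing code recognises.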