Copied from the discussion post:
-investigate how person weight should be used or should be excluded:
Keeping PERWT because it makes the analysis representative of the broader population. Weighting by it helps correct for sampling bias and gives more accurate estimates. It may be used in conjunction with HHWT to help normalize the data for the broader population.
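As a rough illustration of why the weight matters, here is a minimal sketch comparing an unweighted and a PERWT-weighted mean. The AGE column and all values are made up, and `weighted_mean` is a hypothetical helper, not something from our notebook.

```python
import pandas as pd

# Toy data: three sampled persons. PERWT is the number of people in the
# population that each sampled person represents. Values are made up.
df = pd.DataFrame({
    "AGE": [30, 50, 70],
    "PERWT": [100, 50, 50],
})


def weighted_mean(frame, column, weight="PERWT"):
    """Return the mean of `column`, weighting each row by `weight`."""
    return (frame[column] * frame[weight]).sum() / frame[weight].sum()


unweighted = df["AGE"].mean()        # treats every row equally
weighted = weighted_mean(df, "AGE")  # rows with larger PERWT count more
print(unweighted, weighted)
```

The two numbers differ because the first row stands for twice as many people as each of the others.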
-exclude the general health insurance coverage variable (covered vs. not) in favor of the more detailed insurance variable:
Decided to keep both the general and the detailed insurance variables. It might be interesting to see whether the different types of health coverage affect the model.
-how should we represent missing values:
Based on the group's data exploration and some of my own, I couldn't find any missing data. I think we are good here unless anyone finds anything.
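For anyone who wants to double-check, a minimal sketch of the kind of check this involves is below. It counts true nulls and also counts numeric placeholder codes, since survey extracts sometimes encode "not applicable" as a number instead of leaving the cell empty. The column names, the values, and the sentinel code in `SENTINELS` are assumptions for illustration and should be confirmed against the codebook.

```python
import pandas as pd

# Toy extract; column names and values are assumptions for illustration.
df = pd.DataFrame({
    "INCTOT": [42000, 9999999, 18000],
    "HCOVANY": [2, 1, 2],
})

# Hypothetical placeholder codes per column; confirm against the codebook.
SENTINELS = {"INCTOT": [9999999]}

# True missing values (NaN/None) per column.
null_counts = df.isna().sum()

# Rows holding a placeholder code instead of a real value.
sentinel_counts = {
    col: int(df[col].isin(codes).sum())
    for col, codes in SENTINELS.items()
}

print(null_counts)
print(sentinel_counts)
```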
-use detailed employment status but recode:
Added a new column holding the detailed employment status as a string. Also performed additional data exploration.
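A minimal sketch of this kind of recode is below. The column name EMPSTATD, the new column name EMPSTATD_LABEL, and the code-to-label mapping are all assumptions for illustration; the real labels should come from the codebook.

```python
import pandas as pd

# Toy data; EMPSTATD is assumed to be the detailed employment status column.
df = pd.DataFrame({"EMPSTATD": [10, 20, 30, 10]})

# Illustrative mapping only; verify codes and labels against the codebook.
EMPSTATD_LABELS = {
    10: "At work",
    20: "Unemployed",
    30: "Not in labor force",
}

# Map numeric codes to strings; codes missing from the mapping get a
# catch-all label so they are easy to spot later.
df["EMPSTATD_LABEL"] = df["EMPSTATD"].map(EMPSTATD_LABELS).fillna("Other/unknown")
print(df)
```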
-exclude technical variables (besides year):
Removed the following variables: ('SAMPLE', 'SERIAL', 'CBSERIAL', 'HHWT', 'PERNUM', 'CBPERNUM', 'CLUSTER', 'CPI99', 'STRATA', 'EMPSTAT', 'CLASSWKR')
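Dropping these in pandas can be sketched as follows. The toy DataFrame and its values are made up, and `errors="ignore"` is a choice made here so the call does not fail when a listed column is absent from the extract.

```python
import pandas as pd

TECHNICAL_COLUMNS = [
    "SAMPLE", "SERIAL", "CBSERIAL", "HHWT", "PERNUM", "CBPERNUM",
    "CLUSTER", "CPI99", "STRATA", "EMPSTAT", "CLASSWKR",
]

# Toy extract with only some of the columns present; values are made up.
df = pd.DataFrame({
    "YEAR": [2021],
    "SAMPLE": [202101],
    "SERIAL": [1],
    "HHWT": [55.0],
    "PERWT": [60.0],
    "EMPSTAT": [1],
})

# errors="ignore" skips listed names that are not in the DataFrame.
cleaned = df.drop(columns=TECHNICAL_COLUMNS, errors="ignore")
print(list(cleaned.columns))
```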