twitte01 / 232R_GroupProject

UCSD Spring 2024 232R Big Data Analytics Using Spark Group Project

Variables Removal & Modification - Household #60

Closed: twitte01 closed this issue 6 months ago

PraveenManimaran commented 6 months ago

Remove variables with many null values.

Variables to be removed because of a large null count or N/A values: 'MET2023', 'COUPLETYPE', 'CINETHH', 'TAXINCL', 'INSINCL'

Variables that are not relevant: 'SAMPLE', 'SERIAL', 'CBSERIAL', 'HHWT', 'CLUSTER', 'CPI99', 'STRATA'
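A minimal sketch of this step in plain Python, assuming each household record is a dict keyed by variable name (the project itself works in Spark, where dropping the listed columns would be `df.drop(*cols)` on a DataFrame). The helper names `null_fraction`, `high_null_columns`, and `drop_columns` and the 0.5 threshold are hypothetical, chosen only for illustration; only the column lists come from this issue.

```python
from typing import Any

# Column lists taken from this issue.
HIGH_NULL_COLS = ["MET2023", "COUPLETYPE", "CINETHH", "TAXINCL", "INSINCL"]
IRRELEVANT_COLS = ["SAMPLE", "SERIAL", "CBSERIAL", "HHWT", "CLUSTER", "CPI99", "STRATA"]


def null_fraction(rows: list[dict[str, Any]], col: str) -> float:
    """Share of rows where `col` is absent or None (hypothetical helper)."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(col) is None)
    return missing / len(rows)


def high_null_columns(rows: list[dict[str, Any]], threshold: float = 0.5) -> list[str]:
    """Columns whose null share exceeds `threshold` (assumed cutoff), sorted by name."""
    cols = sorted({k for r in rows for k in r})
    return [c for c in cols if null_fraction(rows, c) > threshold]


def drop_columns(rows: list[dict[str, Any]], cols: list[str]) -> list[dict[str, Any]]:
    """Return copies of the rows without the given columns."""
    drop = set(cols)
    return [{k: v for k, v in r.items() if k not in drop} for r in rows]
```

Computing the null share first and then dropping by name keeps the decision auditable: the threshold check flags candidates, while the explicit lists above record which columns the group actually agreed to remove.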

Things to consider: 'ARENTGRS' and 'AMOBLHOME' contain many 0s, which indicate N/A, so we should consider removing them in the future.
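Before deciding to remove these two columns, one option is to measure how often 0 appears and recode 0 as a missing value so it counts toward the null total. The sketch below assumes, as this issue states, that 0 means N/A for 'ARENTGRS' and 'AMOBLHOME', and again represents records as plain Python dicts; `zero_fraction` and `zeros_to_none` are hypothetical helpers. In PySpark the recode would typically be expressed with `F.when(F.col(c) == 0, None).otherwise(F.col(c))`.

```python
from typing import Any

# Columns where, per this issue, a value of 0 is treated as N/A.
ZERO_AS_NA_COLS = ["ARENTGRS", "AMOBLHOME"]


def zero_fraction(rows: list[dict[str, Any]], col: str) -> float:
    """Share of rows where `col` equals 0 (hypothetical helper)."""
    if not rows:
        return 0.0
    zeros = sum(1 for r in rows if r.get(col) == 0)
    return zeros / len(rows)


def zeros_to_none(rows: list[dict[str, Any]], cols: list[str]) -> list[dict[str, Any]]:
    """Return copies of the rows with 0 replaced by None in the given columns only."""
    targets = set(cols)
    return [
        {k: (None if k in targets and v == 0 else v) for k, v in r.items()}
        for r in rows
    ]
```

Recoding only the named columns matters: in other variables, such as a room count, 0 can be a genuine value and should not be turned into a null.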