jmbejara / comp-econ-sp19

Main Course Repository for Computational Methods in Economics (Econ 21410, Spring 2019)

HW3: accounting for missing data #24

Closed jonhelium closed 5 years ago

jonhelium commented 5 years ago

I was wondering how we should handle missing values in a dataset before running a regression. Should we simply use .dropna() to drop the rows that contain NaNs, or should we replace each NaN with 0 and add a dummy variable indicating that the value was missing? Or is there another method we should follow?
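To make the two options in the question concrete, here is a minimal sketch in pandas. The toy DataFrame and the column names `y`, `x`, and `x_missing` are hypothetical and are not taken from the homework data.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
df = pd.DataFrame({
    "y": [1.0, 2.0, 3.0, 4.0],
    "x": [0.5, np.nan, 1.5, np.nan],
})

# Option 1: drop every row that has a NaN in any column.
dropped = df.dropna()

# Option 2: keep all rows, record where x was missing in a dummy column,
# then replace the missing x values with 0.
flagged = df.copy()
flagged["x_missing"] = flagged["x"].isna().astype(int)
flagged["x"] = flagged["x"].fillna(0)
```

Option 1 shrinks the sample to the complete rows, while option 2 keeps every row and adds one extra regressor.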

jmbejara commented 5 years ago

Hi @jonhelium. It really depends on context. On rare occasions, a missing value really does mean zero, and you should replace it with a zero. In most cases, though, people drop the observations (rows) that have a missing value in any of the columns of interest. Just call .dropna() after you have chosen the subset of columns that you are interested in. That way, a row is dropped only when it is missing a value you actually need, which minimizes the number of rows that you end up dropping.
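A small sketch of why the order matters, assuming a hypothetical DataFrame in which only `wage` and `educ` enter the regression and `notes` is an unrelated, mostly empty column:

```python
import numpy as np
import pandas as pd

# Hypothetical data; "notes" is not needed for the regression.
df = pd.DataFrame({
    "wage":  [10.0, 12.0, np.nan, 15.0, 11.0],
    "educ":  [12.0, np.nan, 16.0, 14.0, 12.0],
    "notes": [np.nan, np.nan, np.nan, "ok", np.nan],
})

cols = ["wage", "educ"]  # columns of interest

# Dropping on the full frame also discards rows that are only missing "notes".
drop_all = df.dropna()

# Selecting the columns first keeps every row that is complete in wage and educ.
drop_subset = df[cols].dropna()

# Equivalent row selection that keeps the other columns in the result.
drop_keep = df.dropna(subset=cols)

print(len(drop_all), len(drop_subset), len(drop_keep))
```

Here dropping on the full frame leaves a single row, while restricting to the columns of interest leaves three. The `subset` argument of `dropna` gives the same rows without having to discard the remaining columns.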