pedromlsreis / paranormal_distributions

Group N project for Data Mining course 2019-2020

Dealing with missing values #1

Closed pedromlsreis closed 4 years ago

pedromlsreis commented 4 years ago

The df has many columns containing missing values/NaNs.

[In 53]:
df.isnull().sum()
[Out 53]:
First_Policy          30
Birthday              18
Education             17
Salary                36
Area                   1
Children              21
CMV                    0
Claims                 0
Motor                 34
Household              0
Health                43
Life                 104
Work_Compensation     86
dtype: int64

TODO: Figure out how to handle missing data.

Might be good to treat it individually, column by column.
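A minimal sketch of what column-by-column treatment could look like, assuming pandas. The `STRATEGIES` mapping, the `impute_by_column` helper, and the toy values are hypothetical and only illustrate the idea; which strategy suits which column is an assumption, not something decided in this thread.

```python
import numpy as np
import pandas as pd

# Hypothetical per-column strategies (illustrative choices, not the project's).
STRATEGIES = {
    "Salary": "median",    # numeric and possibly skewed: median
    "Education": "mode",   # assumed categorical: most frequent value
    "Children": "mode",    # assumed to be a 0/1 flag: most frequent value
    "Life": "zero",        # assumes a missing premium means no policy
}

def impute_by_column(df, strategies):
    """Return a copy of df with NaNs filled using one strategy per column."""
    out = df.copy()
    for col, how in strategies.items():
        if col not in out.columns:
            continue  # skip columns that are absent from this frame
        if how == "median":
            fill = out[col].median()
        elif how == "mode":
            modes = out[col].mode(dropna=True)
            if modes.empty:
                continue  # column is entirely NaN, nothing to infer
            fill = modes.iloc[0]
        elif how == "zero":
            fill = 0
        else:
            raise ValueError(f"unknown strategy {how!r} for column {col!r}")
        out[col] = out[col].fillna(fill)
    return out

# Toy data standing in for the real df.
toy = pd.DataFrame({
    "Salary": [1000.0, np.nan, 3000.0, 2000.0],
    "Education": ["BSc", "BSc", None, "PhD"],
    "Children": [1.0, 0.0, 1.0, np.nan],
    "Life": [10.5, np.nan, 3.2, np.nan],
})
filled = impute_by_column(toy, STRATEGIES)
print(filled.isnull().sum())
```

Keeping the strategy per column in one mapping makes each choice easy to review and change later.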

kalrashid15 commented 4 years ago

Agreed!

pedromlsreis commented 4 years ago

Do you agree we should handle missing values the same way we handle outliers?

pedromlsreis commented 4 years ago

This should be revisited later. The current code fills each NaN with the mean of its own column, which might not make sense for some columns. @kalrashid15
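For reference, a minimal sketch of column-mean filling, assuming pandas. It is not the code from the commits, and the toy values are made up. Treating `Children` as a 0/1 flag and `Education` as text is an assumption used to show where the mean may not fit.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the real df (values are made up).
toy = pd.DataFrame({
    "Salary": [1000.0, np.nan, 3000.0],
    "Children": [1.0, 0.0, np.nan],        # assumed to be a 0/1 flag
    "Education": ["BSc", None, "PhD"],     # assumed to be text/categorical
})

# Mean of each numeric column; non-numeric columns are skipped.
col_means = toy.mean(numeric_only=True)

# Passing a Series to fillna fills each column with the value under its label.
mean_filled = toy.fillna(col_means)
print(mean_filled)
```

In this toy frame the missing `Children` value becomes 0.5, which is not a valid flag, and `Education` stays missing because it has no mean.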

pedromlsreis commented 4 years ago

Solved with both cfd3f504f87090027ed4c76689a0d7ba08698e41 and 01f15d8ce26ff6d8a472858e32959a71edc222ec.