neuniversity / ALY6140

Checking for null and missing values and deleting them #50

Open cloverpyy opened 5 years ago

cloverpyy commented 5 years ago

We always use .isnull() to check for null values. As far as I know, there is no method to delete a null value from one specific cell; instead, we count the null values in particular rows or columns and then delete the whole row or column. However, I want to ask how to check for and delete missing values in a dataset. Also, what does "missing value" mean: is it the same as a null value, or are there differences?

ThatkidfromA commented 5 years ago

Hi, from my understanding, you can check for missing values with 'df.isnull().sum()'. Then you can fill the missing values with 0 and filter them out with 'df1[df1.columnname != 0]'. The != 0 condition means that any row whose value in that column is 0 will be deleted. Note that this also removes rows where the value was genuinely 0, not only the ones that were filled. As I understand it, a missing value or null value means that the data has no record of that value.
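To make the difference concrete, here is a minimal sketch that compares the fill-with-0-then-filter approach against dropping the missing rows directly. The DataFrame, its column names ("item", "qty"), and its values are made up for illustration; they are not from the course dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one real value, one genuine zero, one missing value.
df = pd.DataFrame({
    "item": ["a", "b", "c"],
    "qty": [5.0, 0.0, np.nan],
})

# Approach from the comment above: fill missing values with 0, then filter out zeros.
filled = df.fillna({"qty": 0})
via_zero = filled[filled["qty"] != 0]

# Alternative: drop only the rows where "qty" is actually missing.
via_dropna = df.dropna(subset=["qty"])

print(via_zero)    # the genuine zero in row "b" is gone as well
print(via_dropna)  # row "b" is kept, only the missing row "c" is dropped
```

If 0 is never a valid value in the column, both approaches give the same rows; otherwise dropna(subset=...) is probably the safer choice.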

kn1510 commented 5 years ago

Hi,

In statistics, missing data, or missing values, occur when no data value is stored for the variable in an observation. Missing data are a common occurrence and can have a significant effect on the conclusions that can be drawn from the data. Please see the link below, which gives useful examples of how to treat missing and null data: https://chrisalbon.com/python/data_wrangling/pandas_missing_data/

To check for missing values, you can use the code below, which gives a count of missing values in each column: print(df.isnull().sum())
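On the question of whether "null" and "missing" differ: in pandas they are treated the same way. Here is a small sketch, using a made-up DataFrame (the column names and values are assumptions for illustration), showing that isnull() flags both None and NaN and that the counts can be taken per column or per row.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one None in a text column, one NaN in a numeric column.
df = pd.DataFrame({
    "name": ["Ann", None, "Cara", "Dan"],
    "score": [90.0, 85.0, np.nan, 70.0],
})

# isnull() marks both None and NaN as missing; isna() is the same check under another name.
missing_mask = df.isnull()

# Number of missing values in each column.
missing_per_column = missing_mask.sum()

# Number of missing values in each row.
missing_per_row = missing_mask.sum(axis=1)

print(missing_per_column)
print(missing_per_row)
```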

To drop every row that contains a missing value, you can use the code below: df_no_missing = df.dropna()
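dropna() can also be narrowed to columns, to specific columns only, or to a minimum number of non-missing values. The following is a sketch on a made-up DataFrame; the column names "a", "b", "c" and the values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with missing values in columns "a" and "b".
df = pd.DataFrame({
    "a": [1.0, np.nan, 3.0, 4.0],
    "b": [np.nan, np.nan, 6.0, 7.0],
    "c": [8.0, 9.0, 10.0, 11.0],
})

# Drop every row that has at least one missing value.
rows_complete = df.dropna()

# Drop every column that has at least one missing value.
cols_complete = df.dropna(axis=1)

# Drop rows only when column "a" is missing; other columns are ignored.
rows_a_present = df.dropna(subset=["a"])

# Keep rows that have at least 2 non-missing values.
rows_min_two = df.dropna(thresh=2)

print(rows_complete)
print(cols_complete)
print(rows_a_present)
print(rows_min_two)
```

None of these calls changes df itself; each returns a new DataFrame, so the result has to be assigned to a variable.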

To treat missing values in a particular column, use the code below: df['Column'] = df['Column'].fillna(df['Column'].mode()[0]) Here I replaced all null/missing values with the mode of the column; you can do the same with the median, mean, or zero.
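As a rough sketch of the fill approach, the example below fills a text column with its mode and a numeric column with its median. The DataFrame and the column names "city" and "age" are invented for illustration. The result is assigned back to the column, since calling fillna with inplace=True on a selected column may not update the original DataFrame in recent pandas versions.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: one missing city and one missing age.
df = pd.DataFrame({
    "city": ["Boston", None, "Boston", "Seattle"],
    "age": [20.0, np.nan, 30.0, 40.0],
})

# mode() returns a Series of the most frequent values; [0] takes the first one.
df["city"] = df["city"].fillna(df["city"].mode()[0])

# median() skips missing values by default, so it is computed from 20, 30 and 40.
df["age"] = df["age"].fillna(df["age"].median())

print(df)
print(df.isnull().sum())
```

The mode is usually the natural choice for categorical columns, while the median or mean only makes sense for numeric ones.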

Hope this helps!

Best, Kalyani