Explore Data Information

jinchen1036 / Product-Price-Prediction

MIT License


Explore Data Information #5

Open jinchen1036 opened 3 years ago

jinchen1036 commented 3 years ago

Describe the ticket:

[x] Find out how many samples exist in the dataset
[x] Find out the number of missing values for each column

Describe the acceptance criteria of the ticket:

[x] Please add the information to the wiki page

Additional context:

Describe any relations and dependencies:

zhiyingzhu1995 commented 3 years ago

There are 1,482,535 rows and 8 columns in the original dataset. After sampling, the sample dataset has only 1,000 rows and 8 columns.
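For anyone reproducing the counts above, here is a minimal sketch of how the shape and a fixed-size sample could be obtained with pandas. The inline data is a tiny stand-in, and the file name `train.tsv`, the sample size of 3, and `random_state=0` are assumptions for illustration only, not the settings actually used in this project.

```python
import io

import pandas as pd

# Tiny stand-in for the real file. The actual dataset would be loaded with
# something like pd.read_csv("train.tsv", sep="\t") (file name is an assumption).
raw = io.StringIO(
    "train_id\tname\tprice\n"
    "0\tshirt\t10.0\n"
    "1\tshoes\t25.0\n"
    "2\tbag\t40.0\n"
    "3\that\t8.0\n"
    "4\tscarf\t12.0\n"
)
df = pd.read_csv(raw, sep="\t")

# DataFrame.shape is a (rows, columns) tuple.
n_rows, n_cols = df.shape

# Draw a fixed-size random sample; random_state makes the draw reproducible.
sample = df.sample(n=3, random_state=0)
sample_shape = sample.shape

print(f"original: {n_rows} rows, {n_cols} columns")
print(f"sample: {sample_shape[0]} rows, {sample_shape[1]} columns")
```

Sampling keeps every column, so only the row count changes between the original and the sample.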

zhiyingzhu1995 commented 3 years ago

Here are the 8 columns of attributes in the sample dataset:

0 train_id 1000 non-null int64
1 name 1000 non-null object
2 item_condition_id 1000 non-null int64
3 category_name 996 non-null object
4 brand_name 561 non-null object
5 price 1000 non-null float64
6 shipping 1000 non-null int64
7 item_description 1000 non-null object

dtypes: float64(1), int64(3), object(4)
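The listing above looks like the output of pandas' `DataFrame.info()`. A minimal sketch of how such a summary can be produced is below. The four-row frame is made-up illustration data that only borrows a few column names from the dataset.

```python
import pandas as pd

# Made-up rows for illustration; None marks a missing value.
df = pd.DataFrame(
    {
        "train_id": [0, 1, 2, 3],
        "category_name": ["Men/Tops", None, "Women/Shoes", "Kids/Toys"],
        "brand_name": ["Nike", None, None, "Lego"],
        "price": [10.0, 25.0, 40.0, 8.0],
    }
)

# Prints one line per column with its non-null count and dtype.
df.info()

# The same non-null counts as a Series, handy for further checks.
non_null = df.count()
print(non_null)
```

`df.count()` is useful when the counts need to be compared or written to the wiki, since `df.info()` only prints and returns nothing.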

zhiyingzhu1995 commented 3 years ago

Original dataset: there are 632,682 missing values in the brand_name column and 6,327 missing values in the category_name column.

Sample dataset: there are 439 missing values in the brand_name column and 4 missing values in the category_name column.
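The per-column missing counts can be computed with `isna().sum()`. This is a small sketch on made-up data, not the code used for the numbers above. The percentage column is an extra that the ticket does not ask for.

```python
import pandas as pd

# Made-up rows for illustration; None marks a missing value.
df = pd.DataFrame(
    {
        "category_name": ["Men/Tops", None, "Women/Shoes", "Kids/Toys"],
        "brand_name": ["Nike", None, None, "Lego"],
        "price": [10.0, 25.0, 40.0, 8.0],
    }
)

# isna() marks missing cells as True; summing counts them per column.
missing = df.isna().sum()

# Share of missing values per column, as a percentage of all rows.
missing_pct = (missing / len(df) * 100).round(1)

summary = pd.DataFrame({"missing": missing, "missing_pct": missing_pct})
print(summary)
```

As a cross-check on the sample dataset, the missing counts should equal 1,000 minus the non-null counts reported earlier: 1,000 - 561 = 439 for brand_name and 1,000 - 996 = 4 for category_name.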