Open jinchen1036 opened 3 years ago
The original dataset has 1,482,535 rows and 8 columns. After sampling, the sample dataset has 1,000 rows and the same 8 columns.
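The issue does not say how the 1,000 rows were drawn. As a minimal sketch, assuming a uniform random sample without replacement, pandas' `DataFrame.sample` would do it. The helper name `draw_sample`, the seed, and the synthetic stand-in frame are illustrative assumptions, not part of the original ticket.

```python
import numpy as np
import pandas as pd


def draw_sample(df: pd.DataFrame, n: int = 1000, seed: int = 0) -> pd.DataFrame:
    """Return n rows drawn uniformly at random, without replacement."""
    # random_state makes the draw reproducible; reset_index gives a clean 0..n-1 index.
    return df.sample(n=n, random_state=seed).reset_index(drop=True)


# Synthetic stand-in for the full dataset (hypothetical values, 3 columns only).
rng = np.random.default_rng(0)
full = pd.DataFrame({
    "train_id": np.arange(5000),
    "price": rng.uniform(3, 200, size=5000).round(2),
    "shipping": rng.integers(0, 2, size=5000),
})

sample = draw_sample(full, n=1000, seed=42)
print(sample.shape)
```

Fixing the seed keeps the sample stable between runs, so the counts reported for the sample dataset can be reproduced.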
The 8 columns of the sample dataset are:
 #   Column             Non-Null Count  Dtype
 0   train_id           1000 non-null   int64
 1   name               1000 non-null   object
 2   item_condition_id  1000 non-null   int64
 3   category_name      996 non-null    object
 4   brand_name         561 non-null    object
 5   price              1000 non-null   float64
 6   shipping           1000 non-null   int64
 7   item_description   1000 non-null   object
dtypes: float64(1), int64(3), object(4)
Original dataset: there are 632,682 missing values in the brand_name column and 6,327 missing values in the category_name column.
Sample dataset: there are 439 missing values in the brand_name column and 4 missing values in the category_name column.
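The missing-value counts above can be obtained with `isna().sum()`. The sketch below is one possible way to do it; the helper `missing_summary` and the toy frame are hypothetical and only reuse the column names from the issue.

```python
import pandas as pd


def missing_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Count missing values per column and their share of all rows."""
    counts = df.isna().sum()
    return pd.DataFrame({"missing": counts, "share": counts / len(df)})


# Toy data (hypothetical values) with gaps in the same two columns as the issue.
toy = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "category_name": ["Men/Tops", None, "Women/Shoes", "Kids/Toys"],
    "brand_name": ["Nike", None, None, "Lego"],
})

summary = missing_summary(toy)
print(summary)
```

For the sample dataset, the same numbers also follow from the non-null counts in the column listing: 1000 - 561 = 439 for brand_name and 1000 - 996 = 4 for category_name.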