vibudh2209 / CPSC_340_kaggle_project

Ideas #1

Open vibudh2209 opened 6 years ago

vibudh2209 commented 6 years ago

1) Make a single model that predicts either 0, or 1 together with the log revenue.
2) Make class (indicator) variables for the categorical values and do a lot of data cleaning, including date (weekday/weekend) and time (morning, afternoon, evening, night).
3) Instead of the date, we could include a festival flag (0/1) based on country/region (assuming the dataset is not big enough to deduce such information on its own).
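A minimal sketch of the date/time idea in point 2, assuming each visit comes with a timestamp that can be parsed into a Python `datetime`. The function name `time_features` and the hour boundaries for morning/afternoon/evening/night are hypothetical choices for illustration, not something fixed by the dataset.

```python
from datetime import datetime


def time_features(timestamp: datetime) -> dict:
    """Derive coarse calendar features from one visit timestamp."""
    hour = timestamp.hour
    # Hypothetical bucket boundaries; adjust them after looking at the data.
    if 6 <= hour < 12:
        part_of_day = "morning"
    elif 12 <= hour < 18:
        part_of_day = "afternoon"
    elif 18 <= hour < 22:
        part_of_day = "evening"
    else:
        part_of_day = "night"
    return {
        # weekday() returns 0 for Monday through 6 for Sunday.
        "is_weekend": int(timestamp.weekday() >= 5),
        "part_of_day": part_of_day,
    }


# 5 August 2017 was a Saturday.
example = time_features(datetime(2017, 8, 5, 19, 30))
print(example)  # {'is_weekend': 1, 'part_of_day': 'evening'}
```

Both outputs are categorical, so they would go through the same encoding as the other categorical columns. Note that timestamps may need converting to the visitor's local time before bucketing, which this sketch does not do.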

fransilvion commented 6 years ago

STEPS

  1. EDA is done (@fransilvion should submit it)
  2. Some columns (for example, metro) have different ways of saying NA (merge them).
  3. Remove features that are constant (but keep features that have one value plus NA).
  4. All categorical features should be one-hot encoded (if there is NA, treat it as a separate category).
  5. Use all features for now
  6. For classification: logistic regression with L1 regularization, random forest, GBM, KNN, DNN, SVM with downsampling? Try each method with some kind of sampling.
  7. Try ensembling and stacking.
  8. For regression, start with linear regression with L1.
  9. Try everything with RF and LGBM.
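Steps 2 to 4 above could be sketched roughly as follows, in plain Python so it runs without extra packages. The NA spellings in `NA_TOKENS`, the helper names, and the toy rows are all hypothetical; the real list of NA spellings has to come from the EDA. In practice, pandas `get_dummies` with `dummy_na=True` would probably replace the hand-written `one_hot` helper.

```python
# Hypothetical spellings of "missing"; replace with what the EDA finds.
NA_TOKENS = {"", "NA", "N/A", "(not set)"}


def normalize_na(rows):
    """Step 2: map every spelling of 'missing' to None."""
    return [
        {col: (None if val is None or str(val).strip() in NA_TOKENS else val)
         for col, val in row.items()}
        for row in rows
    ]


def drop_constant(rows):
    """Step 3: drop columns with a single distinct value.

    None counts as a distinct value, so a column holding one value
    plus NA is kept.
    """
    keep = [c for c in rows[0] if len({row[c] for row in rows}) > 1]
    return [{c: row[c] for c in keep} for row in rows]


def one_hot(rows, categorical):
    """Step 4: one 0/1 column per category, with NA as its own category."""
    def label(value):
        return "NA" if value is None else str(value)

    levels = {c: sorted({label(row[c]) for row in rows}) for c in categorical}
    encoded = []
    for row in rows:
        out = {c: v for c, v in row.items() if c not in categorical}
        for c in categorical:
            for level in levels[c]:
                out[f"{c}={level}"] = int(label(row[c]) == level)
        encoded.append(out)
    return encoded


# Toy rows, made up for illustration only.
rows = [
    {"metro": "London", "browser": "Chrome", "currency": "USD", "hits": 3},
    {"metro": "(not set)", "browser": "Chrome", "currency": "USD", "hits": 1},
    {"metro": "N/A", "browser": None, "currency": "USD", "hits": 7},
]

cleaned = drop_constant(normalize_na(rows))
encoded = one_hot(cleaned, categorical=["metro", "browser"])
print(encoded[0])
```

Here `currency` is dropped because it is constant, while `browser` survives because it has one value plus NA, matching the rule in step 3.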