Open yelselmiao opened 2 years ago
Based on HJ's comments:
[x] 2. page 4: column 2 header should be "#missing(%)". What does complete-case mean for gender or age? Does this mean you keep cases with both gender and age non-missing?
[x] For missing values in "first-browser", can you make missing a separate category?
[ ] 3. page 7: multiple lines for a user in the original sessions database converted to actiontriplet1, actiontriplet2, ... in the cleaned sessions database. If userid is a field, does this mean that there are at most 20 actiontriplets?
[ ] 4. I think you will have to merge the categories with small counts.
[ ] 5. pages 11, 12, 13, 14: more than half of the values are 0 for some countries? Would a sqrt transform of the y-axis be better?
[x] 6. page 16: for these features, categories with small counts should be merged with other categories.
[ ] 7. Unbalanced classes will cause difficulties.
After reducing the number of classes for the response variable, maybe naive Bayes can be used. Please look at default-naiveBayes.pdf at the course web site for density estimation for continuous variables after transforming away extreme skewness.
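For the two comments about merging small-count categories and making missing its own category, here is a minimal sketch in plain Python. The function name `merge_rare_categories`, the labels `"missing"` and `"other"`, the threshold, and the sample browser values are all illustrative assumptions, not taken from the project data or code.

```python
from collections import Counter


def merge_rare_categories(values, min_count, other_label="other", missing_label="missing"):
    """Map None to missing_label and categories seen fewer than
    min_count times to other_label. Returns a new list."""
    # Treat missing entries as their own category first.
    filled = [missing_label if v is None else v for v in values]
    counts = Counter(filled)
    # Keep the missing category as-is, even if it is small.
    return [
        v if v == missing_label or counts[v] >= min_count else other_label
        for v in filled
    ]


# Hypothetical sample values, for illustration only.
browsers = ["Chrome", "Chrome", "Safari", None, "Chrome", "Opera", "Safari", None, "IE"]
merged = merge_rare_categories(browsers, min_count=2)
print(Counter(merged))
```

The cutoff `min_count` is a judgment call; it may be worth checking how the merged levels are distributed across the response classes before fixing it.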
3-4 Introduction
[ ] Project Overview (background, Objective, Importance)
[ ] Propaganda, advertising target population, renter selection, fee
[ ] Dataset Overview
[ ] nrow(test/train), n(variable), response, predictors, continuous, categorical(level)
6-8 EDA
[x] Missing Value
[ ] Imbalanced Classes (Predictor & Response)
[ ] Transforming
[x] Outlier Process
[x] Skewness
[ ] Correlation
1-2 Methodology (Briefly Explain)
[ ] Set Baseline Model
[ ] Variable Selection
[ ] Ultimate Model; Model for Comparison
[ ] Cross Validation (K-fold)
1-2 Conclusion
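Note on the Cross Validation (K-fold) item and comment 7 on unbalanced classes: one possible approach is to stratify the folds so each one keeps roughly the same class proportions as the full data. The sketch below is a plain-Python illustration under that assumption; the function name `stratified_kfold` and the class labels `"A"`, `"B"`, `"C"` are hypothetical and do not come from the project.

```python
import random
from collections import Counter, defaultdict


def stratified_kfold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs where each test fold keeps
    roughly the class proportions of the full label list."""
    rng = random.Random(seed)
    # Group row indices by class label.
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        # Deal the indices of each class round-robin across the folds.
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    for f in range(k):
        test_idx = sorted(folds[f])
        train_idx = sorted(i for g in range(k) if g != f for i in folds[g])
        yield train_idx, test_idx


# Hypothetical unbalanced response: 12 of class A, 6 of B, 3 of C.
labels = ["A"] * 12 + ["B"] * 6 + ["C"] * 3
splits = list(stratified_kfold(labels, k=3))
for train_idx, test_idx in splits:
    print(Counter(labels[i] for i in test_idx))
```

With very rare classes, some folds may contain no examples of a class at all, which is another reason to merge small response classes first.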