khan1792 / MACS30200proj

Review - Jie Heng #7

Open jheng18 opened 6 years ago

jheng18 commented 6 years ago

This is indeed big-data research! The 3D plots and maps help the audience understand the driving patterns easily. The poster is well organized, neat, and easy to follow. The conclusions are stated clearly and are supported by the model results. I have one question about the model: is it really necessary to include more than 25 independent variables? An F-test or a Wald test might help you exclude variables that are not statistically significant. Why did you choose not to remove any variables? Maybe you could explain that briefly in the poster to justify your model. Some suggestions:

  1. Your color scheme is a little distracting. For example, in the bottom left, the very bright red, blue, and green are too harsh. You could try softer colors.
  2. The line spacing in the objective box is too wide. Could you adjust it?
  3. I find the explanations for the plots and tables a little hard to read. Maybe you could enlarge the font size. Those explanations are important for readers to understand the logic and meaning of the plots.

Overall, this was a really good topic and a nice poster!

khan1792 commented 6 years ago

@jheng18 Thank you very much for the suggestions. I agree that I need to add some explanation of the models. For the logistic model, because it is a predictive model rather than an explanatory one, the significance of individual coefficients is not the most important thing. The log-likelihood, AIC, and BIC of the whole model are much more important indicators. In the machine learning models that follow, Lasso and Elastic Net select or weight variables automatically, so even hundreds of variables would not be a problem. Gradient boosting trees and random forests also weight variables differently in every tree. As long as the number of variables is not very large, it is OK to use them all.
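
To make the point about whole-model indicators concrete, here is a minimal sketch of how AIC and BIC weigh fit against the number of parameters. The log-likelihoods, parameter counts, and sample size are hypothetical placeholders, not results from the poster. The helper names `aic` and `bic` are also just illustrative.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2 * log_likelihood


# Hypothetical values for illustration only (not taken from the poster):
# a "full" logistic model with 25 predictors plus an intercept, and a
# "reduced" model with 10 predictors plus an intercept.
n_obs = 10_000
full = {"log_likelihood": -5200.0, "n_params": 26}
reduced = {"log_likelihood": -5220.0, "n_params": 11}

for name, model in (("full", full), ("reduced", reduced)):
    model_aic = aic(model["log_likelihood"], model["n_params"])
    model_bic = bic(model["log_likelihood"], model["n_params"], n_obs)
    print(f"{name:8s} AIC={model_aic:.1f} BIC={model_bic:.1f}")
```

With these made-up numbers, AIC favors the full model while BIC favors the reduced one, because BIC penalizes each extra parameter by ln(n) rather than 2. So the two criteria can disagree about whether to keep all 25 variables.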