
Final Report Review - Anne Ng #14

Open nga0315 opened 7 years ago

nga0315 commented 7 years ago

Well done! This report is very professionally written and the topic is interesting. The exploratory data analysis is clear and beautiful, and I like how you take on some challenges by exploring beyond the standard procedures. For example, you are bold about resampling in order to rebalance the GRAD classes. Moreover, you used a diverse set of methods, including a naive Bayes classifier, k-means clustering, and AdaBoost, and you gave very detailed equations and procedures for each method.

The main thing I would like you to improve is the discussion of the results. I am curious about why AdaBoost works better than the other methods, and I would like to know where your assumptions and results differ or coincide. It would also be great if you talked in more detail about the plan behind your approaches so that readers can follow your thoughts. As it stands, this would make a great academic paper; however, if your readers are company management, more explanation would help.

claraong commented 7 years ago

Hi Anne,

Thanks for your feedback!

One possible reason why AdaBoost does better is that AdaBoost is an ensemble method: it combines many weak classifiers into a stronger overall classifier. At each iteration, AdaBoost gives more weight in the final vote to the weak learners with a lower misclassification error, and it also reweights the training set so that data points that were incorrectly predicted count more in the next iteration.

Here's a link: http://machinelearningmastery.com/boosting-and-adaboost-for-machine-learning/
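
To make that concrete, here is a minimal sketch (not the code from our report) using scikit-learn's `AdaBoostClassifier` on a synthetic placeholder dataset standing in for the soybean features:

```python
# Minimal AdaBoost sketch on synthetic data; the dataset and
# parameters are placeholders, not the ones used in our report.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# Placeholder dataset standing in for the soybean features.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# By default each weak learner is a decision stump (a depth-1 tree).
# After every round, misclassified training points are upweighted so
# the next stump focuses on the hard cases, and stumps with lower
# error get a larger say in the final weighted vote.
model = AdaBoostClassifier(n_estimators=100, random_state=0)
model.fit(X_train, y_train)
print("test accuracy:", model.score(X_test, y_test))
```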

Another possible reason, which we have not explored, is that we discarded the rows with GRAD = -1 for AdaBoost, whereas we kept them for naive Bayes and k-means. If we ran naive Bayes and k-means on the same filtered data (i.e. also discarding those rows), we would get a fairer comparison of performance among the 3 models :)
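
Something like the sketch below is what we have in mind; the file name, DataFrame columns, and feature set are assumptions for illustration, not our actual code:

```python
# Hypothetical sketch of a like-for-like comparison after dropping
# the GRAD = -1 rows for every model, not just AdaBoost.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv("soybean_data.csv")   # hypothetical input file
df = df[df["GRAD"] != -1]              # discard GRAD = -1 consistently
X, y = df.drop(columns=["GRAD"]), df["GRAD"]

# Supervised models evaluated on identical data and splits.
for name, model in [("naive Bayes", GaussianNB()),
                    ("AdaBoost", AdaBoostClassifier())]:
    print(name, cross_val_score(model, X, y, cv=5).mean())

# k-means is unsupervised, so fit it on the same X and compare the
# resulting clusters to GRAD separately (e.g. via a contingency table).
clusters = KMeans(n_clusters=y.nunique(), n_init=10).fit_predict(X)
```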