traversc opened this issue 8 years ago
A few clarifications
Thanks for your insight. It is really appreciated as I learn about new algorithms. I didn't mean to imply that #144 used boosted trees; I was trying to continue the discussion of algorithms that achieve state-of-the-art performance but are not based on deep learning methods.
According to the second article I linked to, XGBoost accounts for "more than half of the winning solutions in machine learning challenges" on Kaggle. I am not entirely sure why this is, but I suspect contributing factors may include time/computational constraints and the types of datasets used in those competitions.
Continuing the same line of thought as #144 (algorithms that claim to beat, or have beaten, deep learning methods), Gradient Boosted Trees is one of them.
http://xgboost.readthedocs.io/en/latest/model.html
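For reference, here is roughly what training a model with the xgboost Python package looks like. This is a minimal sketch on synthetic data; the dataset and parameter values are illustrative, not from the article or docs linked above.

```python
import xgboost as xgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

# Illustrative synthetic binary classification problem.
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# XGBoost's native API wraps data in DMatrix objects.
dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)

# eta is the learning rate (shrinkage); parameter values here are
# illustrative defaults, not tuned recommendations.
params = {"objective": "binary:logistic", "max_depth": 4, "eta": 0.1}
bst = xgb.train(params, dtrain, num_boost_round=100,
                evals=[(dtest, "test")], verbose_eval=False)

pred = bst.predict(dtest)  # predicted probabilities on the test set
```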
This method has won some recent machine learning competitions (http://www.kdnuggets.com/2016/03/xgboost-implementing-winningest-kaggle-algorithm-spark-flink.html), and it has also recently been applied to EHR data (http://www.aclweb.org/anthology/W/W16/W16-29.pdf#page=13).
Like Random Forest, it is an ensemble of decision trees. However, where RF grows each tree independently on a bootstrap sample with a random subset of features, gradient boosting builds trees sequentially: each new tree is fit to the gradient of the loss of the ensemble so far (for squared error, simply the residuals). It purports to be faster and to achieve equivalent or better performance with far fewer trees; see the sketch below.
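Here is a minimal sketch of the gradient boosting idea itself, not of XGBoost specifically (which adds regularization, second-order gradients, column subsampling, and more). With squared-error loss, the negative gradient is just the residual y - F(x), so each new tree is fit to the current residuals. Function names and defaults are my own for illustration.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def gb_fit(X, y, n_trees=50, learning_rate=0.1, max_depth=3):
    """Fit a gradient-boosted tree ensemble for squared-error regression."""
    base = y.mean()                      # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_trees):
        residuals = y - pred             # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth).fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # shrink each tree's contribution
        trees.append(tree)
    return base, trees

def gb_predict(base, trees, X, learning_rate=0.1):
    """Sum the shrunken contributions of all trees on top of the base value."""
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```

The shrinkage (learning rate) applied to each tree is what lets boosting reach good accuracy with relatively few, shallow trees, in contrast to RF's many deep, independently grown ones.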