greenelab / deep-review

A collaboratively written review paper on deep learning, genomics, and precision medicine
https://greenelab.github.io/deep-review/

Gradient Boosted Trees (XGBoost) #146

traversc opened this issue 8 years ago

traversc commented 8 years ago

In the same line of thought as issue #144, on algorithms that claim to match or have beaten deep learning methods, Gradient Boosted Trees is one of them.

http://xgboost.readthedocs.io/en/latest/model.html

XGBoost is short for “Extreme Gradient Boosting”, where the term “Gradient Boosting” is proposed in the paper Greedy Function Approximation: A Gradient Boosting Machine, by Friedman. XGBoost is based on this original model.
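To make the boosting idea concrete, here is a minimal sketch (not from the Friedman paper or the XGBoost codebase; it uses squared-error loss, for which the negative gradient reduces to the residual, and illustrative hyperparameters):

```python
# Minimal sketch of Friedman-style gradient boosting with squared-error
# loss: each new shallow tree is fit to the residuals (the negative
# gradient) of the current ensemble. Hyperparameters are illustrative.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_gradient_boosting(X, y, n_trees=100, learning_rate=0.1, max_depth=3):
    prediction = np.full(len(y), y.mean())  # start from a constant model
    trees = []
    for _ in range(n_trees):
        residuals = y - prediction  # negative gradient of squared error
        tree = DecisionTreeRegressor(max_depth=max_depth)
        tree.fit(X, residuals)
        prediction += learning_rate * tree.predict(X)
        trees.append(tree)
    return y.mean(), trees

def predict_boosted(base, trees, X, learning_rate=0.1):
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred
```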

This method has won a number of recent machine learning competitions (http://www.kdnuggets.com/2016/03/xgboost-implementing-winningest-kaggle-algorithm-spark-flink.html)

It has also recently been applied to EHR data (http://www.aclweb.org/anthology/W/W16/W16-29.pdf#page=13)

It is similar to Random Forest in that it is an ensemble of trees. However, unlike RF, which grows each tree independently on bootstrapped samples with random feature subsets, XGBoost builds trees sequentially, fitting each new tree to the gradient of the loss of the current ensemble ("gradient boosting"). It purports to be faster and to achieve equivalent or better performance with far fewer trees.
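For comparison, here is a hedged usage sketch of the two methods side by side (assuming the Python `xgboost` and `scikit-learn` packages; the synthetic dataset and all hyperparameters are placeholders, not a benchmark):

```python
# Side-by-side sketch: bagged independent trees (Random Forest) vs.
# sequentially boosted trees (XGBoost). Settings are illustrative.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Random Forest: each tree grown independently on a bootstrap sample
# with random feature subsets; typically uses many trees.
rf = RandomForestClassifier(n_estimators=500, random_state=0)
rf.fit(X_train, y_train)

# XGBoost: trees added sequentially, each fit to the gradient of the
# loss of the current ensemble; often competitive with fewer trees.
xgb = XGBClassifier(n_estimators=100, learning_rate=0.1, max_depth=3)
xgb.fit(X_train, y_train)

print("RF accuracy:     ", rf.score(X_test, y_test))
print("XGBoost accuracy:", xgb.score(X_test, y_test))
```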

akundaje commented 8 years ago

A few clarifications

traversc commented 8 years ago

Thanks for your insight; I really appreciate it as I learn about new algorithms. I didn't mean to imply that #144 used boosted trees; I was trying to continue the discussion of algorithms that achieve state-of-the-art performance but are not based on deep learning.

According to the second article I linked, XGBoost was used in "more than half of the winning solutions in machine learning challenges" on Kaggle. I am not entirely sure why, but I suspect contributing factors include time/computational constraints and the types of datasets used in those competitions.