tilburgsciencehub / website

Learn to work more efficiently on empirical research projects.
https://tilburgsciencehub.com

Suggest change to: topics/Analyze/machine-learning/supervised/XGBoost.md #1012

Open Markje99 opened 9 months ago

Markje99 commented 9 months ago

Could you please explain more about the tree you placed at the end? How should it be read, what does it mean, and what kind of information can be gathered from it?

Thank you!

hannesdatta commented 9 months ago

Hi @NielsRahder, can you handle this request and directly incorporate a solution in the content on the site? Thanks.

NielsRahder commented 9 months ago

Hi @Markje99,

The tree in the image combines the entire ensemble of trees in the model into a single plot. It is intended to improve the interpretability of the model, which is often seen as a "black box". The numbers between the brackets are the splits where the tree changes its value (so the first tree has its split for the feature hotwaterheating at 3.76e+14).

To better understand the model's inner workings, the xgb.plot.tree function can be useful. It allows plotting individual trees rather than the entire ensemble (I'll make sure to add this soon).
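
To make concrete what an individual tree encodes, here is a minimal sketch in plain Python. It does not use xgboost or the model from the article: the trees are written by hand, the feature "area", all thresholds, all leaf values, and the base score are made up for illustration, and missing values are ignored. It only shows the general idea that each split routes an observation left or right until a leaf is reached, and that a boosted ensemble adds up the leaf values of all its trees.

```python
# Illustrative only: hand-written trees, not xgboost's API and not the article's model.
# The feature "area", every threshold, every leaf value, and base_score are assumptions.

def predict_tree(node, row):
    """Follow the splits from the root until a leaf is reached."""
    while "leaf" not in node:
        # Go to the "yes" branch when the feature value is below the split threshold.
        branch = "yes" if row[node["feature"]] < node["threshold"] else "no"
        node = node[branch]
    return node["leaf"]


def predict_ensemble(trees, row, base_score=0.5):
    """A boosted ensemble sums the leaf values of all trees on top of a base score."""
    return base_score + sum(predict_tree(tree, row) for tree in trees)


tree_1 = {
    "feature": "hotwaterheating", "threshold": 0.5,
    "yes": {"leaf": -0.2},
    "no": {
        "feature": "area", "threshold": 6000,
        "yes": {"leaf": 0.1},
        "no": {"leaf": 0.4},
    },
}
tree_2 = {
    "feature": "area", "threshold": 4000,
    "yes": {"leaf": -0.1},
    "no": {"leaf": 0.3},
}

house = {"hotwaterheating": 1, "area": 5000}
print(predict_tree(tree_1, house))                # contribution of the first tree
print(predict_ensemble([tree_1, tree_2], house))  # base score plus both trees
```

Reading a plotted tree works the same way: start at the top node, compare the observation's value with the split, follow the matching branch, and repeat until a leaf gives that tree's contribution.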

The documentation for the function xgb.plot.multi.trees is in the link.