harrysouthworth / gbm

Gradient boosted models

Levels with zero weight are always assigned to the right child #45

Closed pat-oreilly closed 9 years ago

pat-oreilly commented 9 years ago
library(gbm)

g <- gbm(y ~ x, 
         distribution="gaussian", 
         train.fraction=1,
         bag.fraction=0.1,
         data=data.frame(x=as.factor(1:100), y=rnorm(100)), 
         n.trees=1,
         n.minobsinnode=1)

g$c.splits[[1]]
 [1]  1  1 -1  1  1  1  1 -1  1 -1  1  1 -1  1  1  1  1  1  1  1  1  1
[23]  1 -1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[45]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[67]  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1  1
[89]  1  1  1  1  1  1  1  1  1  1  1  1

In this example, 5 levels are assigned to the left child (the -1 entries in c.splits) and 95 to the right child (the 1 entries). Of the 95 at the right child, 90 will have had zero weight while training (since bag.fraction=0.1). It's an artificial example, but a node having zero weight for a particular level is very possible when training a real model, due to bagging and to rare factor levels.

I'm wondering if this could be an issue, since the tree is (in expectation) over-predicting for zero-weight levels. Granted, later trees will attempt to correct for any over-prediction, but:

  1. the later trees may also have zero-weight levels at their nodes
  2. convergence may be improved if the need for correction is avoided.

Perhaps in these circumstances it would be more reasonable to use the prediction at the parent node, since there is no data to suggest whether the zero-weight levels should be assigned to the left or the right child?
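To make the suggestion concrete, here is a toy simulation of the mechanism (a hedged sketch in Python, not gbm's actual C++ split code): 100 factor levels with one observation each, a bag of 10 levels, and a stand-in categorical split on the in-bag levels. The current behaviour assigns every zero-weight level the right child's prediction; the proposed behaviour would assign them the parent node's prediction instead.

```python
import numpy as np

# Toy setup mirroring the example above: 100 levels, one pseudo-residual
# per level, bag.fraction = 0.1 so 90 levels carry zero weight.
rng = np.random.default_rng(0)
y = rng.normal(size=100)
in_bag = rng.choice(100, size=10, replace=False)
zero_weight = [i for i in range(100) if i not in set(in_bag)]

# Stand-in for the categorical split the tree finds on the in-bag levels:
# negative residuals go left, the rest go right.
left = [i for i in in_bag if y[i] < 0]
right = [i for i in in_bag if y[i] >= 0]

parent_pred = y[in_bag].mean()                       # prediction at the parent
right_pred = y[right].mean() if right else parent_pred

# Current behaviour: every zero-weight level inherits the right child's
# prediction, even though none of them contributed to choosing the split.
current = {i: right_pred for i in zero_weight}

# Proposed behaviour: zero-weight levels keep the parent node's prediction.
proposed = {i: parent_pred for i in zero_weight}

print(len(zero_weight), parent_pred, right_pred)
```

Since the right child collects the non-negative residuals, its prediction is at least the parent's, so the 90 zero-weight levels (whose residuals are zero in expectation) are systematically pushed toward the right child's mean under the current behaviour, which is the over-prediction described above.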

harrysouthworth commented 9 years ago

This issue was moved to gbm-developers/gbm#4