kogalur / randomForestSRC

DOCUMENTATION:
https://www.randomforestsrc.org/
GNU General Public License v3.0

training oob error larger than testing error #70

Closed bnuzyc91 closed 3 years ago

bnuzyc91 commented 4 years ago

I noticed in my application that when I train a model on the training dataset (DS), I get a training OOB error rate. When I then apply the trained model to a testing DS, the testing error rate can sometimes be smaller than the training OOB error rate.

From my other modeling experience, I was under the impression that a model should perform better on the training DS than on the testing DS.

Could you help me understand why the testing error rate can be smaller than the training OOB error rate?
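
For concreteness, here is a minimal sketch of the comparison I mean, using rfsrc() and predict() from randomForestSRC; the iris data, the 70/30 split, and the seed are illustrative stand-ins, not my actual application:

```r
library(randomForestSRC)

## Illustrative split; the data and proportions are placeholders.
set.seed(42)
train.idx <- sample(nrow(iris), size = round(0.7 * nrow(iris)))
train.ds  <- iris[train.idx, ]
test.ds   <- iris[-train.idx, ]

## Grow the forest on the training DS. For classification, err.rate
## is a matrix of cumulative OOB error rates, one row per tree; the
## first column of the last row is the final overall OOB error.
o <- rfsrc(Species ~ ., data = train.ds)
oob.err <- o$err.rate[nrow(o$err.rate), 1]

## Apply the trained forest to the testing DS. Because test.ds
## includes the outcome, the prediction object reports an error
## rate in the same format, here the test error.
p <- predict(o, newdata = test.ds)
test.err <- p$err.rate[nrow(p$err.rate), 1]

## On a given split, either value can come out smaller.
c(oob = oob.err, test = test.err)
```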

ishwaran commented 3 years ago

Training error will almost always be smaller than out-of-sample error. For example, in least squares, training error measures goodness of fit, while out-of-sample error measures performance on a test data set. Obviously the first will be smaller, because you are evaluating on the same data you trained on. This is machine learning 101. Note that OOB error is itself an out-of-sample estimate, not a training error, so there is no reason to expect it to always fall below the error on a separate test set.
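
As a quick illustration of the least-squares point; the simulated data and the split below are arbitrary:

```r
## Fit ordinary least squares on a training subset and compare
## in-sample (training) mean squared error with out-of-sample
## (test) mean squared error. The simulated data is illustrative.
set.seed(1)
n <- 100
x <- rnorm(n)
y <- 1 + 2 * x + rnorm(n)
train <- 1:70
test  <- 71:100

fit <- lm(y ~ x, subset = train)

## Training error: goodness of fit on the rows used to estimate
## the coefficients.
mse.train <- mean(residuals(fit)^2)

## Out-of-sample error: performance on held-out rows.
mse.test <- mean((y[test] - predict(fit, data.frame(x = x[test])))^2)

## The training MSE is typically the smaller of the two, since the
## least-squares fit minimizes squared error on the training rows.
c(train = mse.train, test = mse.test)
```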