The forest is an ensemble of multiple trees, and each tree is constructed by sampling the training examples with replacement (bagging). That means about 63.2% of the training data (roughly 1 - 1/e of it) is used to construct each tree, though different trees see different training sets due to bagging.
RF exploits the fact that each tree leaves the remaining ~36.8% of the data unused for training; those examples act as a built-in validation set, and the tree is used to predict on them. They are the out-of-bag (OOB) examples (search for that in the tutorial file).
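As a quick numerical check of that figure (plain NumPy, nothing specific to any particular RF package; the sample size is arbitrary), the fraction of distinct examples in a bootstrap sample settles around 1 - 1/e ≈ 0.632:

```python
# Fraction of unique examples in a bootstrap sample of size N drawn from N
# training examples -- this is the "in bag" portion each tree trains on.
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
fractions = []
for _ in range(100):                      # simulate 100 bootstrap samples
    sample = rng.integers(0, N, size=N)   # sample N indices with replacement
    fractions.append(len(np.unique(sample)) / N)

print(np.mean(fractions))                 # ~0.632, i.e. about 1 - 1/e
```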
Usually 5x2 CV or 10-fold CV is the standard when reporting an algorithm's performance, and the OOB idea is limited to classifiers that use an ensemble + bagging. So people usually (as with SVM) split the training data into training + validation, choose the best model on validation, and then use those parameters to train a single model on the training data and predict on the test set.
Whereas if you are using RF, you don't have to create a validation set: use all the training data to build models, find the model with the lowest OOB error, and use that model to predict on the test set. Usually I use all the training data, set a fixed ntree=1000, search over multiple mtry values, mtry = D/10:D/10:D (where D = number of features), choose the model with the lowest OOB error, and use that model to predict on the test set.
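For concreteness, a minimal sketch of that search, using scikit-learn's RandomForestClassifier as a stand-in for whatever RF implementation you actually use (mtry maps to max_features, ntree to n_estimators; make_classification is just placeholder data):

```python
# Fix ntree=1000, grid-search mtry on OOB error, use the best model on test.
import numpy as np
from sklearn.datasets import make_classification      # placeholder for real data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=50, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

D = X_train.shape[1]
step = max(1, D // 10)
best_model, best_oob_err = None, np.inf
for mtry in range(step, D + 1, step):                  # mtry = D/10 : D/10 : D
    rf = RandomForestClassifier(n_estimators=1000, max_features=mtry,
                                oob_score=True, n_jobs=-1, random_state=0)
    rf.fit(X_train, y_train)                           # no separate validation set
    oob_err = 1.0 - rf.oob_score_                      # OOB error from bagged-out examples
    if oob_err < best_oob_err:
        best_model, best_oob_err = rf, oob_err

print("best OOB error:", best_oob_err)
print("test accuracy :", best_model.score(X_test, y_test))
```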
Comparing it to SVM: I would ideally create 10 different folds, randomly pick one fold for validation, 8 for training, and one for test, then parametrically search over various kernels etc. by building models on the training folds and predicting on validation. Once I find the best model parameters, I build a single model using training + validation and then predict on the test fold, and repeat this many times.
Original comment by abhirana
on 11 May 2012 at 4:51
I meant that for 10-fold CV and SVM I would do the following: create 10 different folds, randomly pick one fold for validation, 8 for training, and one for test, then parametrically search over various kernels etc. by building models on the training folds and predicting on validation. Once I find the best model parameters, I build a single model using training + validation and then predict on the test fold, and repeat this many times.
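A sketch of that scheme with scikit-learn's SVC (the kernel/C grid, fold count, and toy data are all just examples):

```python
# Per test fold: hold out one fold for validation, grid-search the SVM on
# train/validation, refit on train+validation, then score on the test fold.
import numpy as np
from sklearn.datasets import make_classification      # placeholder for real data
from sklearn.model_selection import KFold
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
param_grid = [{"kernel": k, "C": C} for k in ("rbf", "linear") for C in (0.1, 1, 10)]

folds = list(KFold(n_splits=10, shuffle=True, random_state=0).split(X))
test_scores = []
for i, (rest_idx, test_idx) in enumerate(folds):
    val_idx = folds[(i + 1) % len(folds)][1]           # next fold = validation
    train_idx = np.setdiff1d(rest_idx, val_idx)        # remaining 8 folds = training

    best_params, best_acc = None, -1.0
    for params in param_grid:                          # parametric search on validation
        acc = SVC(**params).fit(X[train_idx], y[train_idx]).score(X[val_idx], y[val_idx])
        if acc > best_acc:
            best_params, best_acc = params, acc

    final = SVC(**best_params).fit(X[rest_idx], y[rest_idx])   # train + validation
    test_scores.append(final.score(X[test_idx], y[test_idx]))  # held-out test fold

print("mean test accuracy over folds:", np.mean(test_scores))
```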
Original comment by abhirana
on 11 May 2012 at 5:00
Thank you for your explanations. I just want to make sure that I understand your meaning:
I still need to have a separate test set to test the best model on, but since the validation part is done internally in RF, I don't need a training + validation split and can use all the training data for training, correct? If so, I am still confused about what Breiman's website says regarding there being no need for a separate test set.
I want to compare the results of RF with an fKNN classifier on my data set. For fKNN I leave one subject (101*101 pixels) out to validate the accuracy and use 69 subjects (69*101*101 pixels) as the training set. In order to do a fair comparison, is it correct if I do the same thing: build the best model using the training set, test the model on the subject that is left out, and repeat for every other subject?
Sorry, but I still cannot understand how I can evaluate the best model without using any separate test set, as is stated on Breiman's website.
Appreciate your help and time.
Original comment by m.saleh....@gmail.com
on 11 May 2012 at 7:22
I still need to have a separate test set to test the best model on, but since the validation part is done internally in RF, I don't need a training + validation split and can use all the training data for training, correct? If so, I am still confused about what Breiman's website says regarding there being no need for a separate test set.
- Yup, this is correct. Breiman showed that the OOB error gives an upper bound on the validation-set error. The reason a test set is not strictly required is that the results on a validation set and the OOB error are similar to those on a test set, and RF usually behaves nicely with something like 1000 trees and the default mtry parameter; maybe that is why they say a separate test set is not needed. But for publishable results, and to be comparable with how other classifiers are reported, it is important to do a training + (validation) + test split.
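A small sanity check of that claim on toy data (scikit-learn as a stand-in; numbers will vary with the dataset): compare the internal OOB estimate with the score on a held-out test split.

```python
# Compare the OOB accuracy (computed from training data alone) with accuracy
# on a held-out test set -- the two estimates typically land close together.
from sklearn.datasets import make_classification       # placeholder for real data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=3000, n_features=30, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=1)

rf = RandomForestClassifier(n_estimators=1000, oob_score=True,
                            n_jobs=-1, random_state=1).fit(X_train, y_train)
print("OOB accuracy :", rf.oob_score_)                  # internal estimate
print("test accuracy:", rf.score(X_test, y_test))       # external estimate
```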
I want to compare the results of RF with an fKNN classifier on my data set. For fKNN I leave one subject (101*101 pixels) out to validate the accuracy and use 69 subjects (69*101*101 pixels) as the training set. In order to do a fair comparison, is it correct if I do the same thing: build the best model using the training set, test the model on the subject that is left out, and repeat for every other subject?
- Yeah, or instead of leave-one-out go with 5x2 CV or 10-fold CV; that might be faster. Also make sure that the splits used to train/test kNN are the same splits used for training/testing RF, so that you can do some paired testing on the results.
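A sketch of that setup (scikit-learn and SciPy as stand-ins; plain kNN here in place of fKNN, toy data in place of the real subjects): generate the folds once, reuse them for both classifiers, then run a paired test over the per-fold accuracies.

```python
# Evaluate RF and kNN on the *same* CV splits so a paired test is meaningful.
import numpy as np
from scipy.stats import ttest_rel
from sklearn.datasets import make_classification       # placeholder for real data
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import KFold
from sklearn.neighbors import KNeighborsClassifier

X, y = make_classification(n_samples=1500, n_features=40, random_state=0)
splits = list(KFold(n_splits=10, shuffle=True, random_state=0).split(X))

rf_acc, knn_acc = [], []
for train_idx, test_idx in splits:                      # identical folds for both models
    rf = RandomForestClassifier(n_estimators=500, n_jobs=-1, random_state=0)
    knn = KNeighborsClassifier(n_neighbors=5)           # stand-in for fKNN
    rf_acc.append(rf.fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx]))
    knn_acc.append(knn.fit(X[train_idx], y[train_idx]).score(X[test_idx], y[test_idx]))

t, p = ttest_rel(rf_acc, knn_acc)                       # paired t-test over folds
print("mean RF acc :", np.mean(rf_acc))
print("mean kNN acc:", np.mean(knn_acc))
print("paired t-test: t=%.3f, p=%.3f" % (t, p))
```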
Sorry, but I still cannot understand how I can evaluate the best model without using any separate test set, as is stated on Breiman's website.
- Well, let's say you do not fix any parameter of RF except setting it to 1000 trees and the default mtry value, and then build a bunch of trees; for each tree, part of the dataset is not used for training (due to bagging). Now use each individual tree to predict on all the examples that were not used to train it, take the ensemble votes on those examples (the out-of-bag examples for the trees), and report those results.
Now consider what you do with a typical classifier: you would ideally create a training/test split, divide the training set into training + validation to pick the best parameters, build a single model on the training set with the best parameters, and then use that model to predict on the test set. You would do this tons of times and report the final test error. This is no different from an individual tree in the forest, which trains on a unique dataset and predicts on a held-out set, repeated over a ton of different trees. The only difference is that, because it is an ensemble, the final votes over the held-out examples are taken at the very end. Some research has shown that a held-out validation error and the OOB error tend to be similar, and if you use all the data in your dataset, the OOB error effectively serves as your test error.
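To make that mechanism concrete, here is a bare-bones version of the OOB bookkeeping (bagged decision trees via scikit-learn, plain NumPy for the votes; the per-tree mtry feature subsampling is left out to keep the sketch short):

```python
# Each "tree" trains on a bootstrap sample, predicts on the examples it never
# saw, and the ensemble vote over those out-of-bag predictions gives the OOB error.
import numpy as np
from sklearn.datasets import make_classification       # placeholder for real data
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
n, n_classes, n_trees = len(y), len(np.unique(y)), 200

votes = np.zeros((n, n_classes))                        # OOB vote counts per example
for _ in range(n_trees):
    boot = rng.integers(0, n, size=n)                   # bootstrap sample (with replacement)
    oob = np.setdiff1d(np.arange(n), boot)              # examples this tree never saw
    tree = DecisionTreeClassifier(random_state=0).fit(X[boot], y[boot])
    votes[oob, tree.predict(X[oob])] += 1               # vote only on out-of-bag examples

seen = votes.sum(axis=1) > 0                            # OOB for at least one tree
oob_pred = votes[seen].argmax(axis=1)
oob_err = np.mean(oob_pred != y[seen])
print("OOB error from aggregated votes:", oob_err)
```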
Original comment by abhirana
on 11 May 2012 at 8:01
Thank you so much for the clarification. I think I now have a better understanding of what Breiman said. If I use the whole dataset for training and compute the OOB error, that will be similar to the test error, but in order to publish the classification result as the classifier's accuracy and compare it with other classifiers, I'd better do CV.
Thanks again!
Original comment by m.saleh....@gmail.com
on 11 May 2012 at 9:27
Original comment by abhirana
on 19 Dec 2012 at 9:07
Original issue reported on code.google.com by
m.saleh....@gmail.com
on 11 May 2012 at 2:20