margiejam / randomforest-matlab

Automatically exported from code.google.com/p/randomforest-matlab

Training speed of Regression Forest #45

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
First, thank you very much for this wonderful software!

I notice that for the same number of samples and features, if the only 
difference is the label type, so that one problem is classification and the 
other is regression, constructing the regression forest takes considerably 
longer than constructing the classification forest (using the default 
parameters for msplit and keeping ntrees the same; we also estimate variable 
importance along the way). Is there a reason for this?

Thanks a lot!

Original issue reported on code.google.com by KangD...@gmail.com on 27 Sep 2012 at 8:15

GoogleCodeExporter commented 8 years ago
Hi Kang,

Yes, there is a difference between the regression and classification code. When 
growing a tree, the data falling into a node has to be sorted before it can be 
split. The classification code uses a pre-sorted array, which makes it scale as 
O(n) in the number of examples, whereas the regression code sorts on the fly, 
which makes it scale as O(n log(n)), the best scaling a comparison sort can 
achieve.
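
To make the two strategies concrete, here is a minimal sketch in plain MATLAB. It is an illustration only, not the library's actual C/Fortran code: the variable names (x, inNode, presorted) and the toy data are made up for this example.

```matlab
% Illustrative sketch only; names and data are hypothetical.
n = 10000;
x = mod((1:n)' * 7919, 10007);     % deterministic stand-in for one feature column
inNode = mod((1:n)', 3) == 0;      % hypothetical mask: which samples fall into one node

% Strategy A (on the fly): sort the node's samples each time a node is split.
% Cost per node: O(m log m) for m samples in the node.
nodeIdx = find(inNode);
[~, ord] = sort(x(nodeIdx));
sortedOnTheFly = nodeIdx(ord);

% Strategy B (pre-sorted): sort the feature once, before growing the tree,
% then recover a node's sorted order with a single linear pass over that order.
[~, presorted] = sort(x);                          % done once per feature
sortedFromPresort = presorted(inNode(presorted));  % O(n) filter, no sorting

% Both strategies give the same ordering of the node's samples.
assert(isequal(sortedOnTheFly, sortedFromPresort));
```

Strategy B pays the O(n log(n)) cost once per feature and only a linear pass afterwards, while strategy A pays a sort at every node of every tree.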

I am guessing you have lots of examples, and that's one reason regression might 
be slower.

The other reason might be that the regression trees are grown out fully (i.e. 
leaf nodes have the minimum number of examples), whereas your classification 
trees might be much simpler (a low VC dimension).

Calculate the mean number of nodes per tree in each model; that might give you 
some more idea:
mean(modelRf.ndbigtree)   % classification
mean(modelRf.ndtree)      % regression
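
A rough way to check both effects on your own machine might look like the sketch below. It assumes the package's classRF_train and regRF_train functions are on the path and accept (X, Y, ntree). The synthetic data and the variable names are made up for illustration, so substitute your own X and labels.

```matlab
% Sketch under assumptions: classRF_train / regRF_train are on the MATLAB path
% and take (X, Y, ntree); the data below is synthetic and purely illustrative.
n = 5000; d = 20; ntree = 100;
X    = rand(n, d);
Yreg = X(:, 1) + 0.1 * randn(n, 1);   % continuous target -> regression
Ycls = double(Yreg > 0.5) + 1;        % same data, thresholded into 2 classes

tic; clsModel = classRF_train(X, Ycls, ntree); tCls = toc;
tic; regModel = regRF_train(X, Yreg, ntree);   tReg = toc;

% Compare training time and average tree size for the two forests.
fprintf('classification: %.2f s, mean nodes per tree = %.1f\n', ...
        tCls, mean(clsModel.ndbigtree));
fprintf('regression:     %.2f s, mean nodes per tree = %.1f\n', ...
        tReg, mean(regModel.ndtree));
```

If the regression trees turn out to have many more nodes than the classification trees, tree size rather than sorting alone may explain much of the gap.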

Original comment by abhirana on 27 Sep 2012 at 10:41