Open GoogleCodeExporter opened 8 years ago
for 1) yup, change the nodesize to change the depth of each tree. a larger
nodesize will create smaller trees.
for 2) consider that a binary tree of depth m can have up to 2^m leaf nodes
(about 2^(m+1) nodes in total). most data are not complicated enough to ever
need that many (a recursive xor dataset will get you a full tree, btw), so
many of the nodes are terminal and many branches may not even reach depth m.
The tree is stored as if one might have to store all of those possible nodes
(but fear not, it's incremented in a way that they are not all really stored,
only a much smaller fraction). The best way to get the tree structure is to
start at the root node, get its child nodes, from each child node get the
children of that child, and so on. This information is saved as a contiguous
vector, and a zero means that the node was never created.
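As a rough illustration of walking such a flat layout from the root, here is a minimal Python sketch. The array names `left_child` and `right_child`, the 1-based node numbering, and the helper `tree_depth` are assumptions made up for this example, not the toolbox's actual field names; the only detail taken from the description above is that a zero entry means the node was never created.

```python
def tree_depth(left_child, right_child, root=1):
    """Depth of a tree stored as flat child-index arrays.

    Assumed (hypothetical) layout: the children of node k are stored at
    left_child[k - 1] and right_child[k - 1], nodes are numbered from 1,
    and 0 means "this child was never created".
    """
    max_depth = 0
    stack = [(root, 0)]  # (node id, depth of that node)
    while stack:
        node, depth = stack.pop()
        max_depth = max(max_depth, depth)
        for child in (left_child[node - 1], right_child[node - 1]):
            if child != 0:  # zero entries are unused slots
                stack.append((child, depth + 1))
    return max_depth


# Toy tree: node 1 splits into 2 and 3, node 3 splits into 4 and 5.
# The trailing zeros are preallocated slots that were never used.
left_child = [2, 0, 4, 0, 0, 0, 0]
right_child = [3, 0, 5, 0, 0, 0, 0]
depth = tree_depth(left_child, right_child)
print(depth)
```

The walk only follows non-zero entries, so the unused slots at the end of the vectors cost nothing beyond the storage itself.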
if you are trying to understand how the tree structure is stored, maybe this
will help you: http://code.google.com/p/randomforest-matlab/issues/detail?id=18&can=1
hope this helps
Original comment by abhirana
on 24 Oct 2012 at 2:15
Hi, I am new to random forest and I do not understand why a larger nodesize
will create smaller trees. Wouldn't a larger nodesize result in a larger
depth (~log2(nodesize)) in trees, and therefore create bigger trees?
Thanks in advance.
Original comment by hyo...@cs.unc.edu
on 1 Aug 2014 at 9:31
hello @hyojin
i think you are confusing the number of nodes in a tree with nodesize.
nodesize = when a node has nodesize or fewer examples, the splitting stops.
let's say you are at the root node and let's say the k-th feature divides the
data at value v; examples whose k-th feature is < v will fall in the left node
and examples whose k-th feature is >= v will fall in the right node. this
splitting is done recursively till the number of examples falling into a node
is less than or equal to nodesize. then the tree won't be grown further from
that node.
so a larger nodesize will create short trees and a smaller nodesize will create
tall trees. note that this depth also depends on the type of tree;
classification trees will stop growing after a single level for data that a
single split separates perfectly.
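The stopping rule above can be sketched with a toy one-feature tree in Python. This is only an illustration under simplifying assumptions: the function `grow_depth` is made up for this example, it always splits at the median of a single feature, and it ignores everything else a real random forest does (feature sampling, impurity-based thresholds, bootstrap samples).

```python
import random


def grow_depth(values, nodesize):
    """Depth of a toy one-feature tree grown on `values`.

    A node with `nodesize` or fewer examples is not split. Otherwise the
    examples are split at the median v: values < v go to the left node
    and values >= v go to the right node.
    """
    if len(values) <= nodesize:
        return 0  # terminal node
    ordered = sorted(values)
    v = ordered[len(ordered) // 2]
    left = [x for x in ordered if x < v]
    right = [x for x in ordered if x >= v]
    if not left or not right:
        return 0  # no threshold separates the examples, so stop
    return 1 + max(grow_depth(left, nodesize), grow_depth(right, nodesize))


random.seed(0)
data = random.sample(range(10_000), 1000)  # 1000 distinct values
for nodesize in (1, 5, 50, 500):
    print(nodesize, grow_depth(data, nodesize))
```

The printed depth shrinks as nodesize grows, because each branch hits the "nodesize or fewer examples" condition after fewer splits.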
Original comment by abhirana
on 3 Aug 2014 at 12:18
Oh I see. Thanks. So nrnodes is the number of nodes, right? If I have trees
with nrnodes = 8001, then my tree would have maximum depth of log2(8001)?
Original comment by hyo...@cs.unc.edu
on 4 Aug 2014 at 2:45
yup, that's somewhat correct. note that unlike a perfectly balanced binary
tree, random forest trees may be unbalanced: some branches may be longer than
others, and some branches are terminated near the root node. so log2(nrnodes)
is only roughly the depth of a balanced tree with that many nodes; an
unbalanced tree can be deeper.
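To put numbers on this, here is a small Python sketch of the possible depth range. It assumes, as the question does, that nrnodes = 8001 is the actual number of nodes in the tree, that every split produces exactly two children, and that the root is at depth 0; the helper `depth_bounds` is made up for this example.

```python
def depth_bounds(n_nodes):
    """Smallest and largest possible depth of a binary tree with n_nodes
    nodes in which every internal node has two children (root at depth 0)."""
    # Balanced case: floor(log2(n_nodes)), computed with integers only.
    min_depth = n_nodes.bit_length() - 1
    # Most unbalanced case: each level below the root adds just two nodes.
    max_depth = (n_nodes - 1) // 2
    return min_depth, max_depth


bounds = depth_bounds(8001)
print(bounds)  # (12, 4000)
```

So log2(8001), about 13, is close to the depth of a balanced tree, while a fully lopsided tree with the same node count could in principle be far deeper.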
Original comment by abhirana
on 9 Aug 2014 at 4:19
Original issue reported on code.google.com by
umer.r...@gmail.com
on 23 Oct 2012 at 11:19