Open GoogleCodeExporter opened 8 years ago
sorry, I use the package in matlab...
Original comment by zhangleu...@gmail.com
on 26 Sep 2012 at 2:36
hi zhang
treemap has the left and right node information for the trees in the forest.
the variable is used for navigating the tree
in this code i used treemap to plot the tree
code:
http://code.google.com/p/randomforest-matlab/issues/attachmentText?id=18&aid=180001000&name=tutorial_plot_tree.m
relevant information on the treeplotting:
http://code.google.com/p/randomforest-matlab/issues/detail?id=18&can=1
treemap = stores the tree info. in the regression code, treemap is made up of
two variables, ldau and rdau.
nodestatus = stores whether individual nodes are internal or leaf nodes
nodeclass = the class of the leaf nodes
bestvar = variable that splits the node
xbestsplit = value of the variable that splits the node (> goes to the right
side, else the left side)
the above variables are all NEEDED for prediction.
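a minimal sketch of how these variables could be used to walk one tree for one
example. the toy arrays are made up for illustration and are not output of the
package, and the convention that nodestatus == -1 marks a leaf is an assumption
carried over from the original randomForest code, so check it against your own
model first.

```matlab
% Toy single tree with 5 nodes (hypothetical values, not from the package).
lDau       = [2 0 4 0 0];        % left child index per node, 0 = no child
rDau       = [3 0 5 0 0];        % right child index per node, 0 = no child
nodestatus = [1 -1 1 -1 -1];     % ASSUMPTION: -1 marks a leaf node
bestvar    = [1 0 2 0 0];        % feature index used to split each node
xbestsplit = [0.5 0 1.5 0 0];    % split threshold for each internal node
nodeclass  = [0 1 0 2 1];        % class label stored at the leaf nodes

x = [0.9 2.0];                   % one example with two features
k = 1;                           % start at the root
while nodestatus(k) ~= -1
    if x(bestvar(k)) > xbestsplit(k)
        k = rDau(k);             % greater than the split value: go right
    else
        k = lDau(k);             % otherwise: go left
    end
end
leaf_index      = k;             % node in which the example ends up
predicted_class = nodeclass(k);  % prediction of this single tree
```

the same loop also gives the index of the leaf an example lands in, which is
what you need if you want per-tree node information.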
Original comment by abhirana
on 26 Sep 2012 at 5:59
Thank you for your reply.
When I checked the values of treemap further, I found that
model.treemap(:,tree_num*2) is always all zeros. So what do these zeros stand
for?
Original comment by zhangleu...@gmail.com
on 26 Sep 2012 at 7:10
zeros mean that the nodes do not have a daughter. the values map the indices to
child nodes. so for node X, look up model.treemap(X,tree_num*2) to find
the right child node
btw, i think your condition will happen only if the trees are one sided, like
only growing left or right.
Original comment by abhirana
on 26 Sep 2012 at 7:39
Thanks. But I tried several datasets, and model.treemap(X,tree_num*2) is all
zeros in every case. I am quite puzzled by this result.
Original comment by zhangleu...@gmail.com
on 26 Sep 2012 at 8:25
Attachments:
U4LWWP~N8D.jpg
can you send me the model file if possible?
Original comment by abhirana
on 26 Sep 2012 at 2:55
i know the issue
treemap = [model.treemap(:,tree_num*2-1); model.treemap(:,tree_num*2);];
lDau = treemap(1:2:end); lDau = lDau(1:num_nodes);
rDau = treemap(2:2:end); rDau = rDau(1:num_nodes);
the two columns of treemap are concatenated, and lDau and rDau are then read
from the result. lDau and rDau entries alternate.
most trees do not occupy nrnodes (max size) and that is the reason why most
times the second column is empty
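a small sketch of why the second column can be all zeros. the toy values and
the names model_treemap and interleaved are made up for illustration; the
assumption is that the left/right child indices are stored interleaved
(l1 r1 l2 r2 ...) in a buffer of 2*nrnodes entries that matlab then sees as an
nrnodes x 2 matrix.

```matlab
nrnodes   = 6;                      % maximum number of nodes (toy value)
num_nodes = 3;                      % nodes actually used by this tree
% ASSUMPTION: interleaved storage l1 r1 l2 r2 ..., padded with zeros.
interleaved   = [2 3 0 0 0 0, 0 0 0 0 0 0];
model_treemap = reshape(interleaved, nrnodes, 2);  % what matlab sees

tree_num = 1;
treemap = [model_treemap(:,tree_num*2-1); model_treemap(:,tree_num*2)];
lDau = treemap(1:2:end); lDau = lDau(1:num_nodes);  % odd entries: left child
rDau = treemap(2:2:end); rDau = rDau(1:num_nodes);  % even entries: right child

% a small tree fits entirely in the first column, so the second stays zero
second_column_all_zero = all(model_treemap(:,2) == 0);
```

here the root (node 1) has children 2 and 3, both of which are leaves, and the
whole tree fits in the first column.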
Original comment by abhirana
on 26 Sep 2012 at 3:01
Oh, thanks for your reply.
I guess I have a better understanding of the treemap now.
I have another question: when we use the model to predict the testing data,
is there a way to find which node in each tree each test sample ends up
in?
Thanks.
Original comment by zhangleu...@gmail.com
on 27 Sep 2012 at 1:39
i guess you are looking for node information
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#249
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/classRF_predict.m#26
Original comment by abhirana
on 27 Sep 2012 at 1:44
Hi, when I went through the details, I found that the node output is not an
ntest by ntree matrix. In my example, if the number of test samples is 50, the
node matrix is 50 by 1.
Original comment by zhangleu...@gmail.com
on 27 Sep 2012 at 8:14
When I try the following code
model = classRF_train(X_trn,Y_trn);
clear test_options
test_options.predict_all = 1;
test_options.proximity = 1;
[Y_hat, votes, prediction_per_tree, proximity_ts] = classRF_predict(X_tst,model,test_options);
Then there is an error which says
??? Error using ==> classRF_predict
Too many output arguments.
Original comment by zhangleu...@gmail.com
on 27 Sep 2012 at 8:26
hi zhang
are you using the latest svn source? if not, sync to the svn source, or use
this download link
http://randomforest-matlab.googlecode.com/issues/attachment?aid=410008000&name=rf-rev55+-+20+Sep+2012.zip&token=DiBZ0BWzfgmEWFULfd4MDOaKvTo%3A1348784889462
(i uploaded it in a different issue
http://code.google.com/p/randomforest-matlab/issues/detail?id=41&can=1)
Original comment by abhirana
on 27 Sep 2012 at 10:30
Ok. Thank you for your advice. I have updated the package now.
I still have a question. In each node, RF finds the best split among the
randomly selected features. But in the package this is just like a black box.
So is there some way for me to modify the splitting rule in the RF?
Original comment by zhangleu...@gmail.com
on 28 Sep 2012 at 3:08
yeh, RF splits are based on the CART algorithm splits.
nah, the methods are too tightly intertwined, so it might be hard to modify
the splitting rule in RF.
take a look at the findbestsplit function (reg_Rf.cpp for regression, rfsub.f
for classification) and search for crit
Original comment by abhirana
on 28 Sep 2012 at 6:23
Hi, abhirana,
In the file tutorial_Proximity_training_test, how do you calculate the
proximity between the training samples and the test samples?
Do we need to find the node information for the testing and training samples,
and if two samples are located in the same node, the proximity between them is
increased by 1?
Then we normalize the proximity matrix.
Is that the way to calculate the proximity between the test and train samples?
Original comment by zhangleu...@gmail.com
on 28 Sep 2012 at 7:36
give me half a day. i need to fix a bug in the computeproximity routine.
if i remember correctly, proximity is calculated roughly as you described
Original comment by abhirana
on 28 Sep 2012 at 7:39
i just added the bug fix in the computeproximity routine and it's in the svn.
if you dont want to redownload the source, just change
line 245 in RF_Class_C\src\rfutils.cpp (computeProximity)
from (inbag[i] > 0) ^ (inbag[j] > 0) to (inbag[i] > 0) || (inbag[j] > 0)
i guess you are correct. computeProximity calculates the proximity matrix
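a rough sketch of the calculation as described above: count how often two
examples land in the same terminal node, then normalize by the number of
trees. the nodes matrix below holds toy values and is not package output, and
this sketch leaves out the inbag condition that computeProximity applies, so
it is an illustration rather than a copy of that routine.

```matlab
% nodes(i,t) = terminal node reached by example i in tree t (toy values)
nodes = [3 5 2;
         3 4 2;
         1 5 2];
[n, ntree] = size(nodes);

prox = zeros(n, n);
for t = 1:ntree
    for i = 1:n
        for j = 1:n
            if nodes(i,t) == nodes(j,t)
                prox(i,j) = prox(i,j) + 1;  % same leaf in tree t: add 1
            end
        end
    end
end
prox = prox / ntree;   % normalize so that prox(i,i) == 1
```

for a train/test proximity the same counting would be done between rows of a
test node matrix and rows of a training node matrix.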
Original comment by abhirana
on 28 Sep 2012 at 7:50
Ok. But I am not sure why
prox: n x n proximity matrix
I guess prox should be length(Y_tst) by (length(Y_tst)+length(Y_trn)),
where the first length(Y_tst) by length(Y_tst) block is the proximity
between the test samples and the rest is the proximity between the train and
test samples.
Original comment by zhangleu...@gmail.com
on 28 Sep 2012 at 8:35
note that there are two cases described in the tutorial file
tutorial_proximity_training_test.m
one where training is done and the testing is not aware of the training
examples. the proximity calculation REQUIRES training example information and
when that is not available will default to proximity of only the test examples
the second example is what you are looking for
pass test examples and labels into classRF_train.. the returned model will have
the proximity information
model2 = classRF_train(X_trn,Y_trn, 2000, 0, extra_options,X_tst,Y_tst);
model2.proximity_tst
do post a snippet of code if you still have issues.
Original comment by abhirana
on 28 Sep 2012 at 8:46
Hi,abhirana,
Is there some method for us to find the margin for each tree?
Original comment by zhangleu...@gmail.com
on 28 Sep 2012 at 10:32
can you define margin?
Original comment by abhirana
on 28 Sep 2012 at 5:55
the margin is defined by breiman.
I guess we can get it from 'prediction_per_tree'.
sorry, we can only get it for a collection of trees, not for each tree.
Original comment by zhangleu...@gmail.com
on 29 Sep 2012 at 2:27
when you use prediction_per_tree you will get a nexample x ntree matrix, so you
will get the individual tree prediction for each test example
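a sketch of how a breiman-style margin could be computed from such a matrix:
for each test example, the fraction of trees voting for the true class minus
the largest fraction voting for any other class. the matrix and labels below
are toy values, and the sketch assumes at least two classes are present.

```matlab
% toy nexample x ntree matrix of per-tree predictions and the true labels
prediction_per_tree = [1 1 2 1;
                       2 2 2 1;
                       1 2 2 2];
Y_tst = [1; 2; 1];

classes  = unique([Y_tst; prediction_per_tree(:)]);
nexample = size(prediction_per_tree, 1);
margin   = zeros(nexample, 1);
for i = 1:nexample
    votes = zeros(numel(classes), 1);
    for c = 1:numel(classes)
        % fraction of trees that vote for class c on example i
        votes(c) = mean(prediction_per_tree(i,:) == classes(c));
    end
    true_idx = find(classes == Y_tst(i));
    other = votes;
    other(true_idx) = [];                       % votes for the wrong classes
    margin(i) = votes(true_idx) - max(other);   % > 0 means correctly voted
end
```

a negative margin, as for the third example here, means the forest as a whole
votes for the wrong class.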
Original comment by abhirana
on 29 Sep 2012 at 2:29
Hi, abhirana.
If I set extra_options.replace=0,
there are still many zeros in model.inbag, which means some samples are still out of bag. Why does this happen?
code:
load data/twonorm
%modify so that training data is NxD and labels are Nx1, where N=#of
%examples, D=# of features
X = inputs';
Y = outputs;
[N D] =size(X);
%randomly split into 250 examples for training and 50 for testing
randvector = randperm(N);
X_trn = X(randvector(1:250),:);
Y_trn = Y(randvector(1:250));
X_tst = X(randvector(251:end),:);
Y_tst = Y(randvector(251:end));
extra_options.replace = 0 ;
extra_options.keep_inbag = 1; %(Default = 0)
model = classRF_train(X_trn,Y_trn, 100, 4, extra_options);
Original comment by zhangleu...@gmail.com
on 29 Sep 2012 at 7:09
replace will only change the sampling scheme from with
replacement (default or 1) to without replacement (0). it doesn't have any
effect on the number of out-of-bag examples because that's controlled by the
sampsize variable
if you want to change how many examples are sampled per tree, change the
sampsize variable
Original comment by abhirana
on 29 Sep 2012 at 11:37
Ok. But what is the default value for sampsize in your code? It does not seem
to be mentioned in the tutorial file.
Original comment by zhangleu...@gmail.com
on 1 Oct 2012 at 7:05
randomforests default: sampling N times with replacement from N training
examples (which is the same as what is done for bagging).
Original comment by abhirana
on 1 Oct 2012 at 5:25
So, in this case, if replace=0, why are there so many 0s in inbag?
I guess a 0 in inbag means the sample is out of bag. But if we
sample N times without replacement, every sample should be in the bag.
Original comment by zhangleu...@gmail.com
on 2 Oct 2012 at 2:09
Note the sampsize default is .632*N when sampling without replacement. That
proportion is around the same as the fraction of distinct examples you get
when sampling with replacement. So you are getting about the same number of
out-of-bag examples both ways
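a quick simulation of that point, written independently of the package:
drawing N times with replacement leaves roughly 36.8% of the examples out of
bag, and drawing round(.632*N) examples without replacement leaves 36.8% out
by construction. N and the variable names are arbitrary choices for the sketch.

```matlab
N = 1000;

% with replacement: draw N indices, some examples are drawn more than once
idx_with   = randi(N, N, 1);
inbag_with = false(N, 1);
inbag_with(idx_with) = true;
oob_with = mean(~inbag_with);        % close to exp(-1), about 0.368

% without replacement: draw round(.632*N) distinct indices
sampsize    = round(0.632 * N);
perm        = randperm(N);
idx_without = perm(1:sampsize);
inbag_without = false(N, 1);
inbag_without(idx_without) = true;
oob_without = mean(~inbag_without);  % exactly 1 - sampsize/N
```

so zeros in inbag are expected with replace=0 unless sampsize is raised to N.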
Original comment by abhirana
on 2 Oct 2012 at 3:00
Ok. So with replacement, we sample N from N. If replace=0, we sample
0.632*N from N without replacement.
I have another question. When we select mtry features from all the features,
could we assign a weight vector and select the features according to their
weights? If so, where could I change the code?
I can see there is a 'mexRF_train' function in your code. However, I could not
find the code for this function in the package.
Thanks.
Original comment by zhangleu...@gmail.com
on 2 Oct 2012 at 3:26
you can always change how many examples are being sampled by tweaking sampsize
mexRF_train is compiled from a bunch of files in the src folder. you can find
the list of files being compiled in compile_windows.m. you will have to modify
the c/c++ and maybe fortran code to implement that.
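to make the idea concrete, here is a matlab sketch of picking mtry distinct
features with probability proportional to a weight vector. this is not
something the package exposes; weights, mtry and selected are hypothetical
names, and the same logic would have to be ported into the c/c++ or fortran
feature-sampling code to have any effect on training.

```matlab
weights = [0.1 0.4 0.2 0.3];   % hypothetical per-feature weights
mtry    = 2;                   % number of features to try at a node

w = weights(:)';               % working copy, picked features get weight 0
selected = zeros(1, mtry);
for m = 1:mtry
    cdf = cumsum(w) / sum(w);  % cumulative distribution over features
    cdf(end) = 1;              % guard against floating point rounding
    pick = find(rand() <= cdf, 1, 'first');
    selected(m) = pick;
    w(pick) = 0;               % sample without replacement
end
```

features with a larger weight are then tried more often, while no feature is
picked twice at the same node.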
Original comment by abhirana
on 2 Oct 2012 at 3:47
Original issue reported on code.google.com by
zhangleu...@gmail.com
on 26 Sep 2012 at 2:34