tingliu / randomforest-matlab

Automatically exported from code.google.com/p/randomforest-matlab

Hierarchical sampling of data? #57

Open GoogleCodeExporter opened 8 years ago

GoogleCodeExporter commented 8 years ago
I am wondering if it is possible to incorporate hierarchical sampling of the 
data into the random forest. 

Essentially, I have multiple observations acquired from the same subject, which
means that the out-of-bag estimates are not necessarily independent. I'm having
trouble recalibrating the model using out-of-bag predictions because of this.
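
To make the concern concrete, here is a minimal Python sketch. It is not part of this toolbox, and the `subjects` list and the function name are made up for illustration. It draws an ordinary per-observation bootstrap and reports what fraction of the out-of-bag observations belong to a subject that also has observations in the bag.

```python
import random

def oob_leak_fraction(subjects, seed=0):
    """Fraction of OOB rows whose subject also appears in-bag,
    under an ordinary per-observation bootstrap (illustrative only)."""
    rng = random.Random(seed)
    n = len(subjects)
    # Ordinary bootstrap: draw n rows with replacement, ignoring subjects.
    inbag = {rng.randrange(n) for _ in range(n)}
    inbag_subjects = {subjects[i] for i in inbag}
    oob = [i for i in range(n) if i not in inbag]
    if not oob:
        return 0.0
    leaked = sum(1 for i in oob if subjects[i] in inbag_subjects)
    return leaked / len(oob)

# Hypothetical data: 20 subjects with 10 observations each.
subjects = [s for s in range(20) for _ in range(10)]
print(oob_leak_fraction(subjects))
```

When every observation has its own subject the fraction is 0. With repeated measures it can approach 1, which is why the out-of-bag predictions stop behaving like held-out predictions.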

I looked at the stratified sampling, but it does not look to be the same as 
what I'm asking for.

Original issue reported on code.google.com by alistair...@gmail.com on 27 Feb 2013 at 1:35

GoogleCodeExporter commented 8 years ago
I am guessing that would be possible, but some array in the C code would need
to be changed.

I think the best approach would be to sample and create the inbag/outbag
indices for the trees outside, in MATLAB, and then make each tree sample
according to those inbag/outbag indices. That way you can pull the sampling
out of the C code.

I am guessing you want the inbag/outbag created as follows: treat each
subject as a sample, bootstrap sample from the subject array, and then
sample some observations from each selected subject, giving a sort of
hierarchical sampling.
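
The two-stage scheme described above could be sketched roughly as follows. This is a Python illustration under stated assumptions, not code from this toolbox: the function name, the `subjects` list (one subject id per row), and the choice to draw as many rows per subject as that subject has are all assumptions. Treating every row of an undrawn subject as out of bag is also an assumption, made so that out-of-bag rows never share a subject with in-bag rows.

```python
import random
from collections import defaultdict

def hierarchical_bootstrap(subjects, n_trees, seed=0):
    """Return one (inbag, oob) pair of row-index lists per tree.

    Stage 1: bootstrap the subjects with replacement.
    Stage 2: bootstrap rows within each drawn subject.
    OOB rows are all rows of subjects never drawn in stage 1.
    """
    rng = random.Random(seed)
    rows_by_subject = defaultdict(list)
    for row, subj in enumerate(subjects):
        rows_by_subject[subj].append(row)
    subject_ids = list(rows_by_subject)

    plans = []
    for _ in range(n_trees):
        # Stage 1: draw as many subjects as there are, with replacement.
        drawn = [rng.choice(subject_ids) for _ in subject_ids]
        inbag = []
        for subj in drawn:
            rows = rows_by_subject[subj]
            # Stage 2: resample this subject's rows with replacement.
            inbag.extend(rng.choice(rows) for _ in rows)
        drawn_set = set(drawn)
        oob = [row for row, subj in enumerate(subjects)
               if subj not in drawn_set]
        plans.append((inbag, oob))
    return plans

# Hypothetical data: three subjects with 3, 2 and 4 observations.
subjects = [0, 0, 0, 1, 1, 2, 2, 2, 2]
for inbag, oob in hierarchical_bootstrap(subjects, n_trees=3):
    print(sorted(inbag), oob)
```

The same logic should port to MATLAB without much trouble. The part that still needs C changes is getting each tree to use these indices instead of drawing its own.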

Anyway, I think it is doable. I am not sure if you will be up to coding
some C, because I am a bit held up till the end of April, so I may not be
able to code it up before then.

https://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Reg_C/src/reg_RF.cpp#386
https://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/src/classRF.cpp#404

I guess you can put an option there to use predefined oob/inbag indices, or
use the existing path if that array is not present.
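
The "use the predefined indices if present, otherwise fall back" idea might look something like this. It is a Python sketch of the control flow only: `sampling_plan` and its parameters are hypothetical names, and the real change would live in the C sources linked above, whose actual variables are not reproduced here.

```python
import random

def sampling_plan(n_rows, predefined=None, rng=None):
    """Return (inbag, oob) row indices for one tree.

    If `predefined` is an (inbag, oob) pair, validate and use it.
    Otherwise fall back to an ordinary bootstrap of n_rows rows.
    """
    if predefined is not None:
        inbag, oob = predefined
        for i in list(inbag) + list(oob):
            if not 0 <= i < n_rows:
                raise ValueError("row index out of range: %r" % (i,))
        return list(inbag), list(oob)

    # Existing path: sample rows with replacement; the rest are OOB.
    if rng is None:
        rng = random.Random()
    inbag = [rng.randrange(n_rows) for _ in range(n_rows)]
    seen = set(inbag)
    oob = [i for i in range(n_rows) if i not in seen]
    return inbag, oob

# Caller-supplied indices are passed through unchanged.
print(sampling_plan(5, predefined=([0, 0, 3], [1, 2])))
```

Passing the out-of-bag list explicitly, rather than deriving it as the complement of the in-bag list, leaves room for schemes where some rows are neither in bag nor out of bag for a given tree.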

Original comment by abhirana on 28 Feb 2013 at 5:52

GoogleCodeExporter commented 8 years ago
Yeah, what you said is exactly what I would want to do. Do you know if the 
stratified sampling accepts 0? If so, you could make multiple calls to the C
function and do a sort of hacky version of the sampling you suggest. Other
than that, I am not sure of a way to modify the sampling from MATLAB. If I've
missed something, let me know, because that is definitely an option.

I am trying to avoid coding C, you may have noticed, I promise it's for good 
reasons ;)

Original comment by alistair...@gmail.com on 28 Feb 2013 at 11:12

GoogleCodeExporter commented 8 years ago
I apologize for my late reply.

Sorry, it looks like the stratified sampling requires a non-zero value :(
Hmm, looks like this will need some C coding.

Original comment by abhirana on 9 Mar 2013 at 7:14