Closed GoogleCodeExporter closed 9 years ago
Hello,
I think I added support for categories somewhere (via the ncat and cat
variables), but at that point I didn't have the time or a dataset to test both
the R version (I am somewhat R-challenged) and the MATLAB version. If you have
a simple mixed categorical/numerical dataset, can you send it to me? I can try
it in MATLAB and R; if not, that's OK, let me see if I can get one from
somewhere.
The issue with strings is that they are converted to categorical integers
within the R code anyway, and I don't know if MATLAB even supports mixing
integers and strings in a single matrix (for me to process within the training
code).
So if you can do some kind of preprocessing (like converting the strings into
categorical integers) before sending the data to the RF training code, then
maybe that works out?
Original comment by abhirana
on 24 Aug 2012 at 6:00
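The preprocessing suggested in the comment above (turning string labels into categorical integers before training) could be sketched roughly as follows. This is only an illustration in Python, not code from the package: the function name `encode_categories` and the sample column are hypothetical, and starting the codes at 1 is an arbitrary choice.

```python
def encode_categories(values):
    """Map each distinct label to an integer code, in first-seen order.

    Hypothetical helper for illustration; codes start at 1 by assumption.
    Returns the encoded column and the label -> code mapping.
    """
    codes = {}
    encoded = []
    for value in values:
        if value not in codes:
            codes[value] = len(codes) + 1  # next unused integer
        encoded.append(codes[value])
    return encoded, codes


# Hypothetical string column from a mixed dataset.
sex = ["F", "M", "M", "F"]
sex_codes, sex_mapping = encode_categories(sex)
```

Keeping the returned mapping around makes it possible to apply the same codes to test data later, so a label always maps to the same integer.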
Thanks for the quick response!
Yeah, combining strings and integers in one matrix is not possible. After a
bit more reading, it seems the best bet would probably be to construct a
dataset array (http://www.mathworks.com/help/toolbox/stats/bqziht7-1.html),
which is similar to R's data frames. Dataset arrays can hold cell, categorical,
ordinal, and numeric columns, and columns can be accessed by name or by index.
This seems like it would be the most flexible, but they are relatively new
(R2007a) and require the Statistics Toolbox so I have never seen them used.
Anyway, back to random forests: yes, it would be simple to convert categorical
variables to integer codes. I just need to be sure that they are treated as
categorical rather than continuous, so that the arbitrary order of the coding
doesn't bias the splits. I didn't see anything in the tutorial about
categorical variables, but if there's already a way to do it, that's great.
Could you explain how?
If not, I don't have my dataset yet, but the hospital dataset in the statistics
toolbox has mixed data I think, as does census income from the UCI repository.
I haven't looked at these too much, but I hope they help. I looked at
TreeBagger again, and it has an option to enter a logical array to identify
categorical variables, but I would prefer to use your package as I have read it
is considerably faster.
Thanks for the help!
Original comment by jmccra...@gmail.com
on 24 Aug 2012 at 6:14
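To illustrate the concern in the comment above about coding order biasing the splits, here is a small Python sketch. It compares the groupings reachable by a threshold split on the integer codes (the variable treated as continuous) with all possible two-way groupings of the categories. The function names and the colour labels are hypothetical, and nothing here is taken from the package's own split logic.

```python
from itertools import combinations


def canonical(side, labels):
    """Represent a two-way partition by the side containing the first label."""
    side = frozenset(side)
    return side if labels[0] in side else frozenset(labels) - side


def threshold_partitions(codes):
    """Groupings reachable by 'code <= t' when the codes are treated as continuous."""
    labels = sorted(codes)
    ordered = sorted(codes, key=codes.get)  # labels in order of their integer code
    return {canonical(ordered[:i], labels) for i in range(1, len(ordered))}


def subset_partitions(codes):
    """All two-way groupings of the categories, independent of the coding."""
    labels = sorted(codes)
    rest = labels[1:]
    return {
        frozenset([labels[0], *extra])
        for r in range(len(rest))  # never take all of 'rest': other side must be non-empty
        for extra in combinations(rest, r)
    }


# Two arbitrary codings of the same hypothetical three-level variable.
codes_a = {"red": 1, "green": 2, "blue": 3}
codes_b = {"red": 2, "green": 1, "blue": 3}

reachable_a = threshold_partitions(codes_a)
reachable_b = threshold_partitions(codes_b)
all_groupings = subset_partitions(codes_a)
```

With k categories, a threshold on the codes can reach only k - 1 groupings, and which ones depends on the coding. There are 2^(k-1) - 1 two-way groupings in total, which is why the variable needs to be flagged as categorical instead of relying on the integer order.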
Hey
I just added new code to the svn (both classification and regression), and I
think categorical data is now handled within the code.
How do I know it's being handled? Shorter and more accurate trees are being
created.
Just make sure that each category of a categorical feature gets its own unique
number (a unique integer should suffice).
The example code is at the end of the tutorial files (I converted existing
datasets into categorical data). It basically tells the code which features
are categorical via an option: extra_options.categorical_feature = a 1xD
vector mapping which features to consider as categorical.
Do tell me if you run into any issues.
Yeah, I guess I will skip the mixed matrix until it is available in base
MATLAB.
Original comment by abhirana
on 26 Aug 2012 at 12:02
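As a rough illustration of the 1xD vector described in the comment above, the following Python sketch builds one flag per feature from a list of column names. The 0/1 encoding, the function name `categorical_mask`, and the column names are all assumptions made for this example. The thread does not state the exact format the package expects, so the example at the end of the tutorial file remains the reference.

```python
def categorical_mask(column_names, categorical_names):
    """Return one flag per column: 1 if the column is categorical, else 0.

    Hypothetical helper; the 0/1 encoding is an assumption, not the
    package's documented format.
    """
    unknown = set(categorical_names) - set(column_names)
    if unknown:
        raise ValueError(f"not in column_names: {sorted(unknown)}")
    return [1 if name in categorical_names else 0 for name in column_names]


# Hypothetical feature columns, in the same order as the data matrix (D = 4).
columns = ["age", "sex", "smoker", "weight"]
mask = categorical_mask(columns, {"sex", "smoker"})
```

Building the vector from names instead of typing it by hand keeps it aligned with the column order of the data matrix when features are added or reordered.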
Awesome, thanks! It's great to see such a quick update.
Original comment by jmccra...@gmail.com
on 27 Aug 2012 at 3:43
So I finally got my dataset and want to run the random forest, but I'm not
seeing the example in the tutorial. Did you upload the changes?
Original comment by jmccra...@gmail.com
on 20 Sep 2012 at 7:27
Oh, it's at the end of the tutorial file:
http://code.google.com/p/randomforest-matlab/source/browse/trunk/RF_Class_C/tutorial_ClassRF.m#256
Original comment by abhirana
on 20 Sep 2012 at 7:54
Ah, I see. I had just redownloaded the precompiled .zip from the download
link. What do I need to update from the files in the source tab?
Original comment by jmccra...@gmail.com
on 20 Sep 2012 at 9:15
actually the svn version is somewhat ahead of the precompiled version in the
download link.
attached file is an extract
Original comment by abhirana
on 20 Sep 2012 at 9:24
Awesome, thank you. I really appreciate all the work you've put in here. I'll
check back in once I've tested it out.
Original comment by jmccra...@gmail.com
on 20 Sep 2012 at 9:44
Looks good, the forests seem to be working as expected. Thanks for all your
work!
Original comment by jmccra...@gmail.com
on 1 Oct 2012 at 9:03
[deleted comment]
Hello,
Could you please provide me with the pre-compiled version of the code shared
above?
I tried many times but failed to generate the MEX file; my compilation gives
various errors in the classRF.cpp code.
Original comment by Shalini1...@iiitd.ac.in
on 24 Jun 2013 at 6:18
attached is the latest pre-compiled version of the code
Original comment by abhirana
on 25 Jun 2013 at 3:13
Ah, thanks a lot !
Original comment by Shalini1...@iiitd.ac.in
on 25 Jun 2013 at 4:54
Original issue reported on code.google.com by
jmccra...@gmail.com
on 23 Aug 2012 at 7:06