Closed GoogleCodeExporter closed 9 years ago
hi
The X and Y included in the package (for, say, the diabetes dataset) represent
the data.
The X and Y of the diabetes dataset are described here:
http://www-stat.stanford.edu/~tibs/ftp/lars.pdf (pg 2, table-1)
The goal is to predict the response Y based on the inputs X.
RF is mostly used in a supervised learning setting, where multiple features (in
X) are used to predict a single response or target (in Y).
So in your setting, you pass xtrain and ytrain together to the _train()
function. Then measure the performance of the RF algorithm by passing only
xtest to _predict() and comparing the results from _predict() with ytest.
You can run either classification (using RF_Class_C) or regression (using
RF_Reg_C).
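A minimal Python sketch of that train/predict/compare workflow, for
illustration only. The train() and predict() functions below are hypothetical
stand-ins, not the package's _train()/_predict(): they fit a simple
nearest-centroid classifier instead of a random forest, and the toy data is
made up. Only the shape of the workflow is the point.

```python
import random


def train(xtrain, ytrain):
    """Hypothetical stand-in for _train(): fit one centroid per class."""
    sums, counts = {}, {}
    for row, label in zip(xtrain, ytrain):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(xtest, model):
    """Hypothetical stand-in for _predict(): pick the closest class centroid."""
    def dist2(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return [min(model, key=lambda label: dist2(row, model[label])) for row in xtest]


random.seed(0)
# Toy data (an assumption): two features, label +1 when their sum is positive.
X = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(200)]
Y = [1 if a + b > 0 else -1 for a, b in X]

# Hold out the last 50 rows; ytest is never shown to the model.
xtrain, ytrain = X[:150], Y[:150]
xtest, ytest = X[150:], Y[150:]

model = train(xtrain, ytrain)   # features and targets go in together
yhat = predict(xtest, model)    # only features go in here
error_rate = sum(p != t for p, t in zip(yhat, ytest)) / len(ytest)
print("test error rate:", error_rate)
```

The same pattern applies to regression, except that the comparison with ytest
would use something like mean squared error instead of an error rate.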
Take a look at pg-11 of
http://www.cs.colorado.edu/~grudic/teaching/CSCI5622_2006/Introduction.pdf
It's better if you also take a look at an introductory statistics book.
I am not responding to your question in issue-25, as it is all written here.
Original comment by abhirana
on 7 Mar 2012 at 4:15
When I run RF_tutorial.m, it loads data/twonorm; actually it loads twonorm.mat,
producing two matrices named output and input. Where do these values come
from? What does the value in the output variable signify? Is it chosen at random?
Original comment by abhi4emb...@gmail.com
on 7 Mar 2012 at 8:50
These are the details of twonorm.mat:
http://www.cs.toronto.edu/~delve/data/twonorm/desc.html
The data in twonorm.mat is a subsample of about 300 examples from the twonorm
distribution.
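As a rough illustration of what "drawn from the twonorm distribution" means,
here is a Python sketch. It assumes the description on the linked page: 20
features, two classes, each class a unit-variance Gaussian, with means at
(a, ..., a) and (-a, ..., -a) where a = 2/sqrt(20). The function name
sample_twonorm and the mapping of +1/-1 to the two means are assumptions made
for this sketch; this is not how twonorm.mat was actually produced.

```python
import math
import random


def sample_twonorm(n, dim=20, seed=0):
    """Draw n labelled examples from a twonorm-style distribution (sketch)."""
    rng = random.Random(seed)
    a = 2 / math.sqrt(dim)          # per-feature offset of each class mean
    X, Y = [], []
    for _ in range(n):
        label = rng.choice([1, -1])  # assumed label convention
        # Every feature is Gaussian with unit variance, centred at +a or -a.
        X.append([rng.gauss(label * a, 1.0) for _ in range(dim)])
        Y.append(label)
    return X, Y


inputs, outputs = sample_twonorm(300)
print(len(inputs), len(inputs[0]), sorted(set(outputs)))
```

So the feature values are random draws, but the label of each row records which
of the two Gaussians the row came from.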
Original comment by abhirana
on 7 Mar 2012 at 8:54
output (class labels/target values, a 1 dimensional vector) = Y
input (matrix from multiple features) = X
Original comment by abhirana
on 7 Mar 2012 at 8:55
I have seen the tar file but could not figure out exactly what is in
output.
It is some combination of 1's and -1's, but in what pattern? Why are they
written that way? Is there a reason behind it, or is it just a way to represent
a 1-D vector? And why a combination of 1 and -1?
Would it give me a wrong result if I put all values as '1' in the output matrix?
Original comment by abhi4emb...@gmail.com
on 7 Mar 2012 at 9:38
Are you familiar with classification and regression problems, where the goal is
to learn a function from data? I think you need to brush up on that. I gave
you the link so that you can see what distribution generates twonorm.
In the simplest terms, I can generate a synthetic dataset as follows:
Yhat = (X1 + X2)^2, where X1 and X2 are two features and Yhat is the output,
with the goal that the learned model can predict the output for future examples
from this distribution.
In classification, I can make a rule saying that if Yhat > 2 it's class-1, else
it's class-2. There is nothing to learn if all labels are the same. The pattern
is not in Yhat or Y but in X, and that is what the classifier is expected to
learn.
In regression, I try to learn the rule for predicting Yhat values directly
rather than via labels.
Another example: can you predict the chance of some disease (yes/no classes)
or the amount of cholesterol (continuous values) if you are given features
such as height, weight, age, etc.? The goal is to learn patterns from features
like height and predict disease/cholesterol for future patients.
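The synthetic example Yhat = (X1 + X2)^2 can be sketched in a few lines of
Python. The sample size, the range of X1 and X2, and the variable names are
assumptions chosen for illustration. The sketch builds a regression target, a
two-class target from the Yhat > 2 rule, and the all-labels-equal case asked
about earlier.

```python
import random

random.seed(1)
# Two features per example; the [-2, 2] range is an arbitrary choice.
X = [(random.uniform(-2, 2), random.uniform(-2, 2)) for _ in range(200)]

# Regression target: the continuous value itself.
y_reg = [(x1 + x2) ** 2 for x1, x2 in X]

# Classification target: threshold the same quantity into class 1 or class 2.
y_cls = [1 if value > 2 else 2 for value in y_reg]

# Degenerate case: every label is 1, so the labels carry no information
# about X and any model can only ever answer "1".
y_const = [1] * len(X)

print("distinct labels (thresholded):", sorted(set(y_cls)))
print("distinct labels (all ones):  ", sorted(set(y_const)))
```

Training on y_const would not crash anything in principle, but the resulting
model could not tell the two kinds of examples apart, which is why the labels
need to reflect the real outcome for each row of X.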
Original comment by abhirana
on 7 Mar 2012 at 9:50
Original comment by abhirana
on 31 Mar 2012 at 8:39
Original issue reported on code.google.com by
abhi4emb...@gmail.com
on 7 Mar 2012 at 3:54