Open HaloCollider opened 1 year ago
From the help file:
Types of forests
There is no need to set the type of forest as the package automagically determines the underlying random forest requested from the type of outcome and the formula supplied. There are several possible scenarios:
Regression forests for continuous outcomes.
Classification forests for factor outcomes.
Therefore, in order for the function to recognize the problem as being classification, the outcome has to be coded as a factor. Something like:
mydata$myoutcome <- factor(mydata$myoutcome)
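As a minimal sketch of the conversion above, the snippet below uses made-up data (`mydata`, `x1`, and `flag` are illustrative, not from the original report) to show the check before and after, plus the same idea for a logical outcome. The final call to `rfsrc` is optional and only runs if the package is installed. Reading the `family` component to see which forest type was picked is an assumption about the fitted object, not something confirmed in this thread.

```r
# Hypothetical data: a 0/1 outcome stored as numeric, plus one predictor
set.seed(1)
mydata <- data.frame(
  myoutcome = rep(c(0, 1), times = 10),
  x1 = rnorm(20)
)

is.factor(mydata$myoutcome)   # numeric at this point, so not a factor

# Recode the outcome so the problem is treated as classification
mydata$myoutcome <- factor(mydata$myoutcome)
is.factor(mydata$myoutcome)
levels(mydata$myoutcome)

# A logical (TRUE/FALSE) outcome can be recoded the same way
flag <- c(TRUE, FALSE, TRUE)
flag_factor <- factor(flag, levels = c(FALSE, TRUE))
levels(flag_factor)

# Optional: fit a small forest if randomForestSRC is available
if (requireNamespace("randomForestSRC", quietly = TRUE)) {
  fit <- randomForestSRC::rfsrc(myoutcome ~ ., data = mydata, ntree = 50)
  print(fit$family)  # assumed to report the forest type that was chosen
}
```

Checking `is.factor()` right before the model call is a cheap way to confirm the outcome is coded as intended.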
On a side note, a regression tree fit to a 0/1 binary outcome under mean-squared error splitting (the default) is equivalent to a two-class classification tree under Gini index splitting. So in terms of the splits it doesn't really matter, although I recommend converting the outcome to a factor (as above), since the output will then include values that are normally reported only for classification (such as misclassification error).
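The equivalence can be checked numerically for a single node. For a 0/1 outcome with proportion p of ones, the mean-squared error around the node mean is p(1 - p) and the Gini index is 2p(1 - p), so the two criteria differ only by a constant factor. The sketch below uses hypothetical toy data and hand-written impurity functions (not the package's internal code) to compare the two criteria over every candidate split.

```r
# Hypothetical toy data: one predictor and a 0/1 outcome
x <- c(1, 2, 3, 4, 5, 6, 7, 8)
y <- c(0, 0, 1, 0, 1, 1, 0, 1)

# Node impurities; for 0/1 data mean(y) is the proportion of ones
mse_impurity  <- function(y) mean((y - mean(y))^2)
gini_impurity <- function(y) { p <- mean(y); 2 * p * (1 - p) }

# Size-weighted impurity of the two children created by splitting at x <= cut
split_impurity <- function(cut, impurity) {
  left  <- y[x <= cut]
  right <- y[x > cut]
  (length(left) * impurity(left) + length(right) * impurity(right)) / length(y)
}

cuts <- head(sort(unique(x)), -1)   # candidate split points
mse  <- sapply(cuts, split_impurity, impurity = mse_impurity)
gini <- sapply(cuts, split_impurity, impurity = gini_impurity)

# The two criteria differ by a factor of 2 at every cut point
same_up_to_scale <- isTRUE(all.equal(gini, 2 * mse))
# so both pick the same split
same_best_cut <- cuts[which.min(mse)] == cuts[which.min(gini)]
print(c(same_up_to_scale, same_best_cut))
```

This only compares the splitting criterion. The reported summaries (for example misclassification error) still depend on whether the outcome is a factor.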
Thanks for your explanation. This worked. I think it's the little knowledge I had about R that caused the problem. We are using the cross-entropy splitting rule for different models so there are some differences between the MSE and this one. Now the RF model works fine. Again, I am very grateful for your help!
Hello. I'm encountering an issue with my analysis involving a binary outcome variable (0, 1), which I've set as a factor. Despite this, I'm receiving an error stating that the variable is not recognized as a factor. Could you please help me understand why this error occurs and how I can resolve it? Thank you in advance for your assistance.
data$d_w_c_r <- factor(data$d_w_c_r)
n <- rfsrc(d_w_c_r ~ ., data = data)
Error in parseFormula(formula, data, ytry) : the y-outcome must be either real or a factor.
In fact, it does not work as numeric either:
data$d_w_c_r <- as.numeric(data$d_w_c_r)
n <- rfsrc(d_w_c_r ~ ., data = data)
Error in parseFormula(formula, data, ytry) : the y-outcome must be either real or a factor.
Can you show the output from running the command print(summary(data$d_w_c_r))?
sure:
is.factor(data$d_w_c_r)
[1] TRUE
print(summary(data$d_w_c_r))
  0   1
462 258
Make sure that data is a data.frame:
data <- data.frame(data)
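One possible reason this helps, stated as an assumption rather than something confirmed in this thread: table classes such as tibbles keep a single-column subset as a table instead of dropping it to a vector, so a check like `is.factor()` on that subset fails even though the column itself is a factor. The sketch below imitates that behaviour in base R with `drop = FALSE`. The data and the helper `to_plain_df` are hypothetical.

```r
# Hypothetical stand-in for the poster's data
dat <- data.frame(
  d_w_c_r = factor(c(0, 1, 1, 0)),
  x = c(1.2, 3.4, 2.2, 0.7)
)

# Plain data.frame: selecting one column drops to a vector
col_as_vector <- dat[, "d_w_c_r"]
is.factor(col_as_vector)

# drop = FALSE keeps a one-column table, as a tibble would
col_as_table <- dat[, "d_w_c_r", drop = FALSE]
is.factor(col_as_table)

# Hypothetical helper: convert anything that is not a plain data.frame
to_plain_df <- function(d) {
  if (identical(class(d), "data.frame")) d else as.data.frame(d)
}

plain <- to_plain_df(list(d_w_c_r = factor(c(0, 1)), x = c(0.5, 1.5)))
class(plain)
```

Running `class(data)` before the model call shows whether the object is a plain data.frame or carries additional classes.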
thank you very much, it worked.
I've been working on a dataset where y is a boolean variable (for other models we pass the parameter family = binomial(link = "probit")), but randomForestSRC fits a regression forest no matter how I change the data type (bool, double, or int). I checked the documentation and it says that randomForestSRC will automagically recognize the type, but in this particular case I see no way to correct it manually. I also couldn't find a way to do so in the traditional randomForest package for R. I am new to R and have spent a lot of time on this, so I would appreciate it if you could provide a solution or point out the mistake I've made. Thank you so much.