vdorie / dbarts

Discrete Bayesian Additive Regression Trees Sampler

Predict binary outcome? #12

Open ignacio82 opened 5 years ago

ignacio82 commented 5 years ago

I'm trying to figure out how to predict a binary outcome. This is the code I have right now:

library(dplyr)
library(dbarts)
library(C50)
library(tictoc)
data(churn)
tic("with one thread")
test <- bart2(formula = as.numeric(churn) - 1 ~ ., data = churnTrain, verbose = TRUE, 
              n.threads = 1L)
toc()

yhat <- as.data.frame(test$fit$predict(x.test = churnTest[,-20]) )

Is this the right way of doing this? I was expecting that yhat would only take the values 0 or 1, but instead:

> max(yhat)
[1] 9.016082
> min(yhat)
[1] -12.15876

vdorie commented 5 years ago

Predictions for a binary y are on the probit scale and need to be transformed back into probabilities using pnorm. However, I just noticed a bug in predict for binary outcomes, which I've fixed on the master branch. predict is a relatively new feature; the traditional way of obtaining estimates for test observations is to supply those points at the time of fitting. This also means that the trees don't need to be kept, which reduces the memory cost significantly.

test <- bart2(formula = as.numeric(churn) - 1 ~ ., data = churnTrain, test = churnTrain[,-20],
              verbose = TRUE, n.threads = 1L)
# to transform to probabilities:
#   pnorm(test$yhat.train)
# posterior means:
#   apply(pnorm(test$yhat.train), 3, mean)
# test against predictions using the predict function:
mean(abs(test$yhat.train - predict(test, churnTrain[,-20])))
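
To make the probit-to-probability step concrete, here is a minimal sketch that does not need a fitted model. The array `probit_draws` is a simulated stand-in for `test$yhat.train`, and its layout (chains x samples x observations) is an assumption inferred from the `apply(..., 3, mean)` call above. The 0.5 cutoff and all variable names are hypothetical, chosen only for illustration.

```r
# Simulated stand-in for test$yhat.train: probit-scale posterior draws,
# assumed to be laid out as chains x samples x observations.
set.seed(1)
n_chains <- 2
n_samples <- 50
n_obs <- 5
probit_draws <- array(rnorm(n_chains * n_samples * n_obs, sd = 2),
                      dim = c(n_chains, n_samples, n_obs))

# Map every draw from the probit scale to a probability in [0, 1].
prob_draws <- pnorm(probit_draws)

# Posterior mean probability per observation: average over chains and samples.
post_mean_prob <- apply(prob_draws, 3, mean)

# Hard 0/1 labels using a hypothetical 0.5 cutoff.
pred_class <- as.integer(post_mean_prob > 0.5)
```

Note that this applies pnorm to each draw before averaging. Averaging on the probit scale first and transforming afterwards generally gives a different number, because pnorm is nonlinear.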

ignacio82 commented 5 years ago

Thanks! Is there a built-in way to calculate AUC, sensitivity, and specificity?

vdorie commented 5 years ago

Sorry, not at this time. It looks like the pROC library can do that pretty easily.
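
For reference, a rough sketch of how these metrics could be computed once posterior mean probabilities are available. The outcome `y` and probabilities `p` are simulated here, not taken from the churn data, and the 0.5 threshold is an arbitrary assumption. The base R part uses the rank-sum (Mann-Whitney) identity for AUC; the pROC part only runs if that package is installed.

```r
# Simulated 0/1 outcomes and predicted probabilities (illustration only).
set.seed(2)
y <- rbinom(200, 1, 0.3)
p <- pnorm(rnorm(200, mean = ifelse(y == 1, 0.8, -0.8)))

n_pos <- sum(y == 1)
n_neg <- sum(y == 0)

# AUC via the rank-sum identity: the probability that a randomly chosen
# positive case gets a higher score than a randomly chosen negative case.
auc <- (sum(rank(p)[y == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

# Sensitivity and specificity at a hypothetical 0.5 threshold.
pred <- as.integer(p > 0.5)
sensitivity <- sum(pred == 1 & y == 1) / n_pos
specificity <- sum(pred == 0 & y == 0) / n_neg

# Optional cross-check with pROC, skipped when the package is not available.
if (requireNamespace("pROC", quietly = TRUE)) {
  roc_obj <- pROC::roc(response = y, predictor = p,
                       levels = c(0, 1), direction = "<", quiet = TRUE)
  print(pROC::auc(roc_obj))
}
```

With a real fit, `p` would be the posterior mean probabilities for the test observations and `y` the observed 0/1 outcomes for those same rows.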