kapelner / bartMachine

An R-Java Bayesian Additive Regression Trees implementation
MIT License
62 stars, 27 forks

Probabilities predicted for which class? #10

Closed: larskotthoff closed this issue 8 years ago

larskotthoff commented 8 years ago

When I predict probabilities, I'm getting probabilities for the opposite class of what I'm expecting. Example:

library(bartMachine)
data(Sonar, package = "mlbench")
model = bartMachine(Sonar[-61], Sonar$Class)
classes = predict(model, new_data = Sonar[-61], type = "class")
probs = predict(model, new_data = Sonar[-61], type = "prob")
levels(Sonar$Class)

I'm getting something like

[1] R R R R R R R M R R R R ...
Levels: M R

for classes and for probs

[1] 0.61762749 0.63869063 0.51221708 ...

So the probabilities are for the "R" class, which is the second class in the level set. I would expect probabilities for the first class.

Changing the level set before giving the data to bartMachine doesn't seem to make a difference.

kapelner commented 8 years ago

Very strange. I usually convert to 0/1 beforehand, so I've personally never seen this. I will fix it. (Replying by email to the post of Mar 11, 2016, 10:44 PM.)

larskotthoff commented 8 years ago

Thanks! I only noticed this when trying to integrate it into mlr, where we explicitly test for switched labels.

kapelner commented 8 years ago

I'm glad you are battle-testing these algorithms. (Replying by email to the comment of Mar 12, 2016, 8:14 PM.)

kapelner commented 8 years ago

It's fixed on master. Do you need it to be pushed to CRAN to pass your tests?

(Follow-up to my email of Sat, Mar 12, 2016, 8:51 PM.)

Adam Kapelner, Ph.D. Assistant Professor of Mathematics Queens College, City University of New York 65-30 Kissena Blvd., Kiely Hall Room 604 Flushing, NY, 11367 M: 516-435-6795 kapelner.com (scholar https://scholar.google.com/citations?user=TzgMmnoAAAAJ|research gate http://www.researchgate.net/profile/Adam_Kapelner2|publons https://publons.com/author/431881/adam-kapelner#profile)

larskotthoff commented 8 years ago

I tested it, but it still doesn't seem to work. How exactly do you control which class the probability is predicted for? We assume that the first factor level determines that, i.e. if you change the order of the factor levels with the same factors you get different probabilities.

Is that how bartMachine works?

kapelner commented 8 years ago

Hey Lars,

I'm getting the following:

> library(bartMachine)
Loading required package: rJava
Loading required package: bartMachineJARs
Loading required package: car
Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
Loading required package: missForest
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: itertools
Loading required package: iterators
Welcome to bartMachine v1.2.2! You have 0.53GB memory available.

> data(Sonar, package = "mlbench")
> model = bartMachine(Sonar[-61], Sonar$Class)
bartMachine initializing with 50 trees...
bartMachine vars checked...
bartMachine java init...
bartMachine factors created...
bartMachine before preprocess...
bartMachine after preprocess... 61 total features...
bartMachine sigsq estimated...
bartMachine training data finalized...
Now building bartMachine for classification ...
evaluating in sample data...
Iteration 100/1250
Iteration 200/1250
Iteration 300/1250
Iteration 400/1250
Iteration 500/1250
Iteration 600/1250
Iteration 700/1250
Iteration 800/1250
Iteration 900/1250
Iteration 1000/1250
Iteration 1100/1250
Iteration 1200/1250
done building BART in 1.868 sec

burning and aggregating chains from all threads... done
done

> classes = predict(model, new_data = Sonar[-61], type = "class")
> probs = predict(model, new_data = Sonar[-61], type = "prob")
> levels(Sonar$Class)
[1] "M" "R"
> classes[1:10]
 [1] R R M R R R R M R R
Levels: M R
> probs[1:10]
 [1] 0.3347604 0.3596603 0.5267640 0.3083755 0.2885636 0.2246686 0.3675480 0.6449398 0.4446663 0.1412680

Is this what you are seeing? The levels are M and R, where M, the first level, corresponds to "1" (as in a logistic regression, for instance). The probabilities are therefore P(M), as you can see above.

Which level counts as "1" is an arbitrary choice in the code; I now take levels(y)[1] to be "1".
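A minimal sketch, in base R, of the kind of switched-label check mentioned earlier in this thread. It assumes the convention just described (the returned probabilities are P(levels(y)[1])) and a 0.5 cutoff for hard class predictions. The helper labels_consistent is hypothetical, not part of bartMachine or mlr; it only checks that the probabilities and the predicted classes agree with each other.

```r
# Hypothetical helper (not part of bartMachine or mlr): checks whether
# probabilities reported for the FIRST factor level agree with the hard
# class predictions under a simple cutoff rule (0.5 is an assumption).
labels_consistent <- function(p_first, classes, cutoff = 0.5) {
  lvls <- levels(classes)
  stopifnot(length(lvls) == 2, length(p_first) == length(classes))
  # Under the "levels(y)[1] is 1" convention, p > cutoff implies level 1
  implied <- factor(ifelse(p_first > cutoff, lvls[1], lvls[2]),
                    levels = lvls)
  all(implied == classes)
}

lv <- c("M", "R")

# First values from the corrected output above
fixed_ok <- labels_consistent(
  c(0.3347604, 0.3596603, 0.5267640, 0.3083755),
  factor(c("R", "R", "M", "R"), levels = lv)
)

# First values from the original report
orig_ok <- labels_consistent(
  c(0.61762749, 0.63869063, 0.51221708),
  factor(c("R", "R", "R"), levels = lv)
)

print(fixed_ok)  # TRUE
print(orig_ok)   # FALSE: these values behave like P(R), not P(M)
```

With the corrected output the check passes; with the values from the original report it fails, since every probability above 0.5 was paired with the second level, R.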

(Replying by email to the comment of Sun, Mar 13, 2016, 12:19 AM.)

larskotthoff commented 8 years ago

OK, thanks for checking; this should work! I'll check the mlr integration again.

larskotthoff commented 8 years ago

Thanks, I've worked out the mlr integration now. We'll merge once you've released to CRAN (mlr-org/mlr#790).