Closed: larskotthoff closed this issue 8 years ago.
Very strange. I usually convert to 0/1 beforehand, so I've personally never seen this. I will fix. On Mar 11, 2016 10:44 PM, "Lars Kotthoff" notifications@github.com wrote:
When I predict probabilities, I'm getting probabilities for the opposite class of what I'm expecting. Example:
```r
library(bartMachine)
data(Sonar, package = "mlbench")
model = bartMachine(Sonar[-61], Sonar$Class)
classes = predict(model, new_data = Sonar[-61], type = "class")
probs = predict(model, new_data = Sonar[-61], type = "prob")
levels(Sonar$Class)
```
I'm getting something like

```
[1] R R R R R R R M R R R R ...
Levels: M R
```

for classes, and for probs

```
[1] 0.61762749 0.63869063 0.51221708 ...
```
So the probabilities are for the "R" class, which is the second class in the level set. I would expect probabilities for the first class.
Changing the level set before giving the data to bartMachine doesn't seem to make a difference.
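A quick way to sanity-check which class the returned probabilities refer to is to cross-tabulate the hard class predictions against the thresholded probabilities. This is a sketch reusing the Sonar setup above; if the probabilities are P(first level), rows predicted as the first level should line up with probs > 0.5:

```r
library(bartMachine)
data(Sonar, package = "mlbench")

model <- bartMachine(Sonar[-61], Sonar$Class)
classes <- predict(model, new_data = Sonar[-61], type = "class")
probs <- predict(model, new_data = Sonar[-61], type = "prob")

# If probs is P(levels(Sonar$Class)[1]) = P("M"), then rows
# classified as "M" should mostly have probs > 0.5; if the table
# shows the opposite pattern, the labels are switched.
table(predicted = classes, prob_above_half = probs > 0.5)
```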
Thanks! Only noticed this when trying to integrate it into mlr where we test explicitly for switched labels.
I'm glad you are battle-testing these algorithms.
It's fixed on master. Do you need it to be pushed to CRAN to pass your tests?
Adam Kapelner, Ph.D. Assistant Professor of Mathematics Queens College, City University of New York 65-30 Kissena Blvd., Kiely Hall Room 604 Flushing, NY, 11367 M: 516-435-6795 kapelner.com (scholar https://scholar.google.com/citations?user=TzgMmnoAAAAJ|research gate http://www.researchgate.net/profile/Adam_Kapelner2|publons https://publons.com/author/431881/adam-kapelner#profile)
I tested it, but it still doesn't seem to work. How exactly do you control which class the probability is predicted for? We assume that the first factor level determines it, i.e., if you change the order of the factor levels (keeping the same data), you should get different probabilities.
Is that how bartMachine works?
Hey Lars,
I'm getting the following:
```r
> library(bartMachine)
Loading required package: rJava
Loading required package: bartMachineJARs
Loading required package: car
Loading required package: randomForest
randomForest 4.6-12
Type rfNews() to see new features/changes/bug fixes.
Loading required package: missForest
Loading required package: foreach
foreach: simple, scalable parallel programming from Revolution Analytics
Use Revolution R for scalability, fault tolerance and more.
http://www.revolutionanalytics.com
Loading required package: itertools
Loading required package: iterators
Welcome to bartMachine v1.2.2! You have 0.53GB memory available.
> data(Sonar, package = "mlbench")
> model = bartMachine(Sonar[-61], Sonar$Class)
bartMachine initializing with 50 trees...
bartMachine vars checked...
bartMachine java init...
bartMachine factors created...
bartMachine before preprocess...
bartMachine after preprocess... 61 total features...
bartMachine sigsq estimated...
bartMachine training data finalized...
Now building bartMachine for classification ...
evaluating in sample data...
Iteration 100/1250
Iteration 200/1250
Iteration 300/1250
Iteration 400/1250
Iteration 500/1250
Iteration 600/1250
Iteration 700/1250
Iteration 800/1250
Iteration 900/1250
Iteration 1000/1250
Iteration 1100/1250
Iteration 1200/1250
done building BART in 1.868 sec
burning and aggregating chains from all threads... done
done
> classes = predict(model, new_data = Sonar[-61], type = "class")
> probs = predict(model, new_data = Sonar[-61], type = "prob")
> levels(Sonar$Class)
[1] "M" "R"
> classes[1:10]
 [1] R R M R R R R M R R
Levels: M R
> probs[1:10]
 [1] 0.3347604 0.3596603 0.5267640 0.3083755 0.2885636 0.2246686 0.3675480 0.6449398 0.4446663 0.1412680
```
Is this what you are seeing? The levels are M and R, where M is the first level, corresponding to "1" in a logistic regression, for instance. The probabilities are then P(M), as you can see above.
The convention is arbitrary, but it is now handled consistently in the code: I'm taking levels(y)[1] to be "1".
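Given that convention, if you need the probability for the other class, one option (a sketch, assuming the fixed behavior where levels(y)[1] is treated as the "1" class) is to relevel the response before training:

```r
# Make "R" the first level, so predict(..., type = "prob") returns P(R)
y_releveled <- relevel(Sonar$Class, ref = "R")
model_r <- bartMachine(Sonar[-61], y_releveled)
probs_r <- predict(model_r, new_data = Sonar[-61], type = "prob")

# probs_r should be roughly 1 - P(M) from the default level order
# ("M" first), up to MCMC sampling noise.
```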
Ok, thanks for checking -- this should work! I'll check the mlr integration again.
Thanks, I've worked out the mlr integration now. We'll merge once you've released to CRAN (mlr-org/mlr#790).