kapelner / bartMachine

An R-Java Bayesian Additive Regression Trees implementation
MIT License
62 stars 27 forks

Negative probabilities in posterior for classification problem #53

Closed schwa021 closed 11 months ago

schwa021 commented 11 months ago

I am running a dichotomous classification problem using bartMachine.

When I check the posterior (example below) using bart_machine_get_posterior, I see some negative values.

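[image] https://user-images.githubusercontent.com/25011558/275373786-eb577e28-8860-47c6-a0e4-a4ed32dd0c69.png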

According to the help, the units are probabilities (i.e., not probits).

"y_hat_posterior_samples The full set of posterior samples of size num_iterations_after_burn_in for each observation. For regression, the estimates have the same units as the response. For classification, the estimates are probabilities."

Thus, negative values should be impossible. Any thoughts regarding what's happening here? It's not an isolated case.

kapelner commented 11 months ago

Can you provide a working example with the data?

schwa021 commented 11 months ago

Issue resolved. Sorry for wasting your time.

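As I was working on a reprex, I found out that my y-variable had been converted to numeric (instead of a factor) because I screwed something up with indirection in dplyr. With a numeric response, bartMachine fits a regression rather than a classification model, so the posterior samples are in the units of the response and can be negative.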
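
For anyone hitting the same symptom, a quick check of the response type before fitting may save some time. The snippet below is only a minimal sketch: `check_binary_response` is a hypothetical helper (not part of bartMachine), the toy vectors are made up for illustration, and the commented-out bartMachine calls assume a predictor data frame `X` that is not shown here.

```r
# Hypothetical helper (not part of bartMachine): confirm the response is a
# two-level factor before fitting, so the model is built as a classifier.
check_binary_response <- function(y) {
  if (!is.factor(y)) {
    stop("y has class '", class(y)[1],
         "'; convert it with factor() for classification")
  }
  if (nlevels(y) != 2) {
    stop("y must have exactly 2 levels, found ", nlevels(y))
  }
  invisible(y)
}

# Made-up toy data: a 0/1 response that was accidentally coerced to numeric.
y_numeric <- c(0, 1, 1, 0, 1)
y_factor  <- factor(y_numeric)      # restore a two-level factor

check_binary_response(y_factor)     # passes silently
# check_binary_response(y_numeric)  # would stop with an error

# With bartMachine installed and predictors in X (assumed, not shown),
# the fit would then look roughly like this:
# library(bartMachine)
# bm   <- bartMachine(X, y_factor)
# post <- bart_machine_get_posterior(bm, X)
# range(post$y_hat_posterior_samples)  # should fall within [0, 1] for a classifier
```

Running `class(y)` right before the call to `bartMachine` gives the same information without a helper; the function form is just convenient inside a longer pipeline.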

Thanks again for the great package. Can't wait for the multinomial models ;)