vasishth / bayescogsci

Draft of book entitled An Introduction to Bayesian Data Analysis for Cognitive Science by Nicenboim, Schad, Vasishth

Understanding the Bayes Factor more #62

yuhanczhang opened this issue 1 week ago

yuhanczhang commented 1 week ago

Hi there! I am reading Chapter 15 and am stuck on understanding the binomial example. To be more specific, for Figure 15.1, I understand that theta is the parameter of the binomial distribution, and that theta itself has a Beta distribution; Figure 15.1 shows a Beta distribution for theta. But what is the meaning of y? It is hard to believe that this y is the number of successes, right? Could y be a probability? If so, why can y exceed 1?

This question might be naive. Thank you for your patience!

yuhanczhang commented 1 week ago

Adding another question here: In Ch 15, it seems that the conclusion is that both BF and the posterior model probabilities (posterior odds with consideration of prior probabilities of the parameters) can be used as a model comparison method. Can we use either of them in real cases?

vasishth commented 1 week ago

Hi there! I am reading Chapter 15 and am stuck on understanding the binomial example. To be more specific, for Figure 15.1, I understand that theta is the parameter of the binomial distribution, and that theta itself has a Beta distribution; Figure 15.1 shows a Beta distribution for theta. But what is the meaning of y? It is hard to believe that this y is the number of successes, right? Could y be a probability? If so, why can y exceed 1?

Make sure you really understand Chapters 1 and 2. I agree that it is confusing to use the variable y, but the definition of the Bayes factor is usually laid out in a general way, with y standing for the data, whatever that means in the particular situation.
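For concreteness, the general definition being referred to is (standard notation, not quoted from the book): $BF_{12} = \frac{p(y \mid M_1)}{p(y \mid M_2)}$, where $p(y \mid M_i)$ is the marginal likelihood of the data $y$ under model $M_i$, i.e., the likelihood averaged over the prior: $p(y \mid M_i) = \int p(y \mid \theta, M_i) \, p(\theta \mid M_i) \, d\theta$. In the binomial example, $y$ is the observed number of successes.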

Here, y is the number of successes. You can imagine running the experiment over and over again, say 100 times, using rbinom:

> rbinom(n=100,size=20,prob=0.5)
  [1] 11  9 14 10  8 13 12 12 14 10  6  9  6  5  8 12  5 11 14 13 13  8 10  9 11
 [26]  9 12  9 14 10 12  9 10  8  8 10 10 12  8  8  7  9  7  8  8 13 12  9 11 11
 [51]  7  9  8 13  9 12  9 13 11 12  8  9 11 10 10 11 10  8  9 11 11  8  9 10  8
 [76]  8 11  8 13  7  9 11  7  9 13 10  8 10  5 10 10  9 12 10 11 11  9  8  9  7

The randomly generated numbers here are repeated realizations of y, the number of successes, across 100 simulated experiments.

How could y be the probability? The probability is the parameter $\theta$. I think the foundational ideas in Chapters 1 and 2 need to be clear first, so re-read those.
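If it helps, here is a minimal prior predictive sketch in R (my code, not the book's; the Beta(2, 2) shape parameters are an assumption and may not match Figure 15.1) showing why $\theta$ lives in [0, 1] while $y$ is a count:

## Prior predictive simulation for the binomial example.
## Assumed prior: Beta(2, 2); the book's figure may use other shapes.
set.seed(123)
n_sims <- 100000
theta <- rbeta(n_sims, shape1 = 2, shape2 = 2)  # theta is in [0, 1]
y <- rbinom(n_sims, size = 20, prob = theta)    # y is a count in 0..20
hist(y)  # y can exceed 1 because it counts successes; it is not a probability

Each simulated y is a whole number between 0 and 20, which is why values above 1 are perfectly normal.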

vasishth commented 1 week ago

Adding another question here: In Ch 15, it seems that the conclusion is that both BF and the posterior model probabilities (posterior odds with consideration of prior probabilities of the parameters) can be used as a model comparison method. Can we use either of them in real cases?

I am not sure what exactly you are asking here. Maybe work out a simple example and show me what you are looking for. Are you coming to SMLP 2024? If so, we can discuss it there.

yuhanczhang commented 1 week ago
[Screenshot: the Chapter 15 paragraphs on Bayes factors and posterior model probabilities]

For the second question, I am referring to these paragraphs, where the text says "However, the Bayes factor alone cannot tell us which one of the models is the most probable." The implicit logic here is that we also need to look at the posterior odds. I was asking a clarification question -- basically, in practice, should one also report the posterior odds? Thanks!

yuhanczhang commented 1 week ago

Thank you so much! I will read Chapters 1 and 2 closely.

yuhanczhang commented 1 week ago

Also, unfortunately, I wasn't able to make it to SMLP 2024. Wish I could have been there!

vasishth commented 1 week ago

For the second question, I am referring to these paragraphs, where the text says "However, the Bayes factor alone cannot tell us which one of the models is the most probable." The implicit logic here is that we also need to look at the posterior odds. I was asking a clarification question -- basically, in practice, should one also report the posterior odds? Thanks!

Thanks for clarifying your question.

The text in the book needs some editing. I will rewrite it to make it clearer, but I need to discuss my proposed changes with my co-authors first, so it may take some time to make the edits.

My reading of this paragraph is that the intended message is that one cannot just take two models, compute the Bayes factor, and report that number in isolation; the Bayes factor depends on the prior specifications. So, when you report a Bayes factor, always report the prior on the target parameter, because the BF will change depending on that prior.
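To spell out the relationship the book is pointing at (a standard identity, written in my own notation): posterior odds = Bayes factor $\times$ prior odds, i.e., $\frac{P(M_1 \mid y)}{P(M_2 \mid y)} = BF_{12} \times \frac{P(M_1)}{P(M_2)}$. A toy R sketch with made-up numbers:

## Toy numbers (mine, not from the book): the BF rescales the prior
## model odds into posterior model odds, which is why the BF alone
## cannot tell you which model is the most probable.
BF_12 <- 10                        # hypothetical BF favoring M1 over M2
prior_M1 <- 0.5; prior_M2 <- 0.5   # hypothetical prior model probabilities
posterior_odds <- BF_12 * (prior_M1 / prior_M2)
posterior_odds / (1 + posterior_odds)  # P(M1 | y) is about 0.91 here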

For examples of how we report BFs, see

https://direct.mit.edu/nol/article/4/2/221/114205/Understanding-the-Effects-of-Constraint-and

https://www.sciencedirect.com/science/article/pii/S0028393220300981?via%3Dihub

and many of our other papers on my home page.

But to answer your question: no, you do not need to report posterior odds. Just report the BF under different priors (because of the BF's sensitivity to the prior). The interpretation of the BF under the different priors is: assuming an a priori range of plausible values like Normal(0, 0.1) in a reading study (i.e., assuming that the effect size is a priori small), the BF is such and such a value; and assuming a prior of Normal(0, 1) (i.e., assuming that the effect could be as large as -2 or +2 on the log ms scale), the BF is such and such.

Please do read the Schad et al paper too; it has a lot more than is in the book. Chapter 15 alone is not enough reading on BFs. Schad has another paper in Psychological Methods on aggregation that is also important; I also wrote a related paper in Computational Brain and Behavior (see my home page) on using BFs.
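To make the prior sensitivity concrete, here is a self-contained R sketch (my own code and made-up numbers, not from the book) for the binomial example, where the marginal likelihood under a Beta(a, b) prior has a closed form:

## Exact marginal likelihood of y successes in n trials under a
## Beta(a, b) prior on theta (the beta-binomial probability mass).
marg_lik <- function(y, n, a, b) {
  choose(n, y) * beta(y + a, n - y + b) / beta(a, b)
}
y <- 15; n <- 20
## M0: point null theta = 0.5; M1: theta with a Beta prior.
lik_null <- dbinom(y, size = n, prob = 0.5)
BF_flat  <- marg_lik(y, n, 1, 1)   / lik_null  # flat Beta(1, 1) prior
BF_tight <- marg_lik(y, n, 20, 20) / lik_null  # prior concentrated near 0.5
c(BF_flat = BF_flat, BF_tight = BF_tight)      # same data, different BFs

The two BFs differ even though the data are identical; that is exactly why the prior on the target parameter has to be reported alongside the BF.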