I have felt bad ever since reading section 8.5.3. I understand the two points it makes about why interpreting the p value as the "probability the null value is false" is wrong. However, given that the intuition to see it that way is so strong, it seems like you could be more helpful here than the text is at present, which reads mostly as a kind of scolding and dire admonition.
In particular, I found this: http://www.dcscience.net/2014/03/24/on-the-hazards-of-significance-testing-part-2-the-false-discovery-rate-or-how-not-to-make-a-fool-of-yourself-with-p-values/
Now, I'm not sure whether you'll agree with the main example he gives on that page, but I found the tree diagram image and the associated example quite helpful in fleshing out my understanding of the relationship between the p value and the probability that the test is giving a meaningful result. The analysis he gives there seems to be frequentist, but it still lets us reason about the probability that our test is telling us something real. That is, we are not asking for the probability that the null hypothesis of our particular test is true. Instead, we imagine the universe of all tests similar to the one we are running (e.g., tests of treatments for depression), assume an overall success rate for such treatments and a typical power for these tests, and then ask how likely it is that a significant result is actually a false positive. All of this reasoning is subtle (at least in my mind), so I'm not sure I'm stating everything correctly. But it seems to me that giving an example/analysis like this would help people understand much better the difference between the p value and the chance that they are actually seeing a true positive.
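If it helps, here is a rough sketch in Python of the kind of calculation I have in mind. The specific numbers (10% of candidate treatments truly effective, 80% power, alpha = 0.05) are just assumptions I picked for illustration, not values from the book or the linked page:

```python
# Sketch of the tree-diagram reasoning: among many similar tests, what
# fraction of *significant* results are false positives?
prevalence = 0.10   # assumed fraction of candidate treatments that really work
alpha = 0.05        # significance threshold of each test
power = 0.80        # assumed probability of detecting a real effect

n_tests = 10_000                      # imagine running many similar tests
real_effects = n_tests * prevalence   # tests where the treatment really works
null_true = n_tests - real_effects    # tests where it does not

true_positives = real_effects * power   # real effects that come out significant
false_positives = null_true * alpha     # null effects that come out significant anyway

# Of all the significant results, the fraction that are false alarms:
false_discovery_rate = false_positives / (true_positives + false_positives)
print(f"P(false positive | significant result) = {false_discovery_rate:.1%}")
```

Under these particular assumptions the answer comes out around a third, which is exactly the kind of gap between "p < 0.05" and "probably a real effect" that I think readers would benefit from seeing worked out.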
It might also be useful to show the resulting values for a few other assumptions about treatment success rates, significance thresholds, and test powers, as I've done in the spreadsheet linked below, which helps give an intuition for how the false positive rate varies with those assumptions.
https://docs.google.com/spreadsheets/d/1Gxl1jObj-Jtrshl0I3HeUQrP3EbUbXemyuSlIy4bO5s/edit#gid=0
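Along the same lines, a small sweep over a few assumed success rates, thresholds, and powers could reproduce that kind of table in the text itself. A minimal sketch (the grid of values below is an arbitrary choice of mine, not taken from the spreadsheet):

```python
def false_discovery_rate(prevalence: float, alpha: float, power: float) -> float:
    """Fraction of significant results that are false positives,
    given an assumed prevalence of real effects, threshold, and power."""
    true_pos = prevalence * power
    false_pos = (1 - prevalence) * alpha
    return false_pos / (true_pos + false_pos)

print(f"{'prevalence':>10} {'alpha':>6} {'power':>6} {'FDR':>7}")
for prevalence in (0.01, 0.10, 0.50):
    for alpha in (0.05, 0.01):
        for power in (0.80, 0.50):
            fdr = false_discovery_rate(prevalence, alpha, power)
            print(f"{prevalence:>10.2f} {alpha:>6.2f} {power:>6.2f} {fdr:>7.1%}")
```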