rfl-urbaniak / MRbook


revision to higher order probability paper and response to reviewers #96

Open marcellodibello opened 1 month ago

marcellodibello commented 1 month ago

reviewers of the higher-order probability paper raised two key worries:

  1. we can just use first-order probabilities; we don't need to go higher order
  2. it'd be good to engage with the literature on higher-order probabilities

I will try to address both of these concerns. Here is some literature to consider:

marcellodibello commented 1 month ago

Very interesting paper by Judea Pearl on higher-order probabilities, very clear and insightful!

Judea Pearl, Do we need higher order probabilities and, if so, what do they mean?

https://arxiv.org/pdf/1304.2716

Here is what the paper aims to do:

what do people mean when they assign confidence intervals to probabilistic sentences? What empirical and/or procedural information is conveyed by such intervals? How do these intervals expand and contract in light of new information? And should we carry these intervals in AI systems or simply dismiss them as another peculiarity of unfortunate Homo sapiens? We shall now cast the semantics of second-order probability statements within the framework of classical, first-order probabilistic theory.

Coin example

Pearl starts with an example comparing two coins, one which we have some reason to believe could be biased and another which we do not. So we have P(heads-coin1)=P(heads-coin2)=.5, but the confidence/uncertainty about these two probability assessments is different. The question is: what is the semantics of this second-order uncertainty statement? That is, what is the semantics of "the confidence about P(heads-coin1)=.5 is high" and "the confidence about P(heads-coin2)=.5 is low"? This is the question of the paper.
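One concrete way to picture the coin example (my illustration, not Pearl's own formalism) is a second-order distribution over the bias parameter: two Beta priors with the same mean 0.5 but very different spread. The specific Beta parameters below are invented for the sketch.

```python
# Sketch: second-order uncertainty as a distribution over the coin's bias.
# Both priors have mean 0.5, but different spread; the Beta parameters
# here are illustrative choices, not values from Pearl's paper.

def beta_mean_var(a, b):
    """Mean and variance of a Beta(a, b) distribution (closed form)."""
    mean = a / (a + b)
    var = a * b / ((a + b) ** 2 * (a + b + 1))
    return mean, var

# Coin we trust to be fair: tight prior around 0.5.
trusted = beta_mean_var(100, 100)
# Coin we suspect might be biased: diffuse prior, still centered at 0.5.
suspect = beta_mean_var(2, 2)

print(trusted)  # same first-order probability 0.5 ...
print(suspect)  # ... but much larger variance, i.e. lower confidence
```

Both coins get the same first-order probability of heads; the difference in confidence shows up only one level up, in the spread of the distribution over the bias.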

Medical diagnostic example

He also considers another, more realistic example:

Consider an example given in [Spiegelhalter, 1986]: "a patient presented to a specialist may have a 10% chance of gastric cancer just from the known incidence in that referral clinic. However, one may be unwilling to make a decision until many further questions were asked, after which it may well be reasonable to perform an endoscopy even on the basis of the same 10% belief, since no further interrogation will substantially alter our belief."

Key idea

The key idea is that one first encodes the knowledge and evidence about an event in a causal graph, and then uses that graph to (a) calculate the first-order probability of the event of interest (conditioning on some other events), but also (b) calculate how that probability would change in light of the values of other conditions. Part (b) is what captures second-order uncertainty.

In other words, when a person encodes probabilistic knowledge as a causal model of interacting variables, that person automatically specifies, not merely the marginal and joint distributions of the variables in the system, but also an entire set of future scenarios, describing how these probabilities would vary in response to future eventualities. It is this implicitly encoded dynamics that renders probabilistic statements random events, admitting distributions, intervals, and other confidence measures. (p. 51)

The Bell example

suppose we were told that there is a bell hidden somewhere in the room, which will ring iff the coin turns up heads; would that story alter our confidence in P(E1) = 0.50? It should if the bell's sound B is proclaimed to be a contingency relative to E1. Yet, despite the fact that the conditional probability P(E1|B) is extremely sensitive to whether B is true or false, most people would agree that the story about the bell has no effect whatsoever on our confidence in the statement P(E1) = 0.50. Why? Apparently causal consequences of events do not qualify as contingencies for those events.

This shows that only some events (the causes, not the causal consequences) should be considered contingencies when we assess how the probability of the event of interest (say, heads or gastric cancer) changes.

Pearl's Proposal

Our confidence in the assessment of BEL(E) is measured by the (narrowness of the) distribution of BEL(E|c) as c ranges over all combinations of contingencies, and each combination c is weighed by its current belief BEL(c). (p. 54)

Interestingly, Pearl uses distributions over probabilities to represent this state of higher-order uncertainty; see Fig 1, p. 50, or Figs 4 and 5, pp. 57 and 57. This is very similar to our proposal of using a (higher-order) distribution over parameter values interpreted as probabilities.
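Pearl's measure can be sketched numerically (a toy model with made-up numbers, not taken from the paper): enumerate the contingency combinations c, and look at the spread of BEL(E|c) weighted by BEL(c). The scenarios below loosely echo the gastric-cancer example.

```python
# Toy sketch of Pearl's confidence measure: the spread of BEL(E|c)
# across contingencies c, each weighted by BEL(c). The contingency
# values and probabilities below are invented for illustration.

# Each entry: (BEL(c), BEL(E|c)) for one combination of contingencies.
scenarios = [
    (0.5, 0.02),   # e.g., further questions point away from cancer
    (0.3, 0.10),   # answers leave the 10% assessment unchanged
    (0.2, 0.30),   # answers point toward cancer
]

# Current first-order belief: BEL(E) = sum over c of BEL(c) * BEL(E|c).
bel_e = sum(p * q for p, q in scenarios)

# Confidence = narrowness of the distribution of BEL(E|c); here
# measured by its weighted variance (smaller = more confident).
variance = sum(p * (q - bel_e) ** 2 for p, q in scenarios)

print(round(bel_e, 3))     # current belief in E
print(round(variance, 4))  # spread across future scenarios
```

The first-order belief here comes out to 0.10, but the nonzero spread records that further questioning could move it substantially, which is exactly the second-order information a bare "10%" omits.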

Pearl's claim is that the semantics of such higher-order statements---or confidence statements about first-order probability assessments---can be stated in the language of first-order probability.

marcellodibello commented 1 month ago

Objection to Pearl proposal?

Pearl's key claim (see earlier note) is that we can offer a semantics of higher-order probability statements that uses first-order probability models, essentially Bayesian networks.

What's odd is that the semantics he offers makes assumptions about causality, namely assumptions about which "contingencies" can enter into the computation of how the first-order probabilities change.

So, basically, he is relying on causal assumptions to give a semantics of second-order probability statements. So it is not completely clear that the semantics of second-order probability he gives simply "emerges" from first-order probability statements; he needs additional (causal) assumptions to make it work.

marcellodibello commented 1 month ago

Miller's paradox

  1. P(q)=v0 (by assumption, for a generic proposition q and a generic probability value v0)

  2. P[q | P(q)=P(not-q)]=1/2 (seems to hold generally if we allow first- and second-order probabilities)

  3. P[q | P(q)=1/2]=1/2 (holds because P(q)=P(not-q) iff P(q)=1/2)

  4. P(not-q)=P[q | P(q)=P(not-q)] (seems to hold generally if we allow first- and second-order probabilities)

  5. P(not-q)=1/2 (from 2 and 4)

C. So P(q)=1/2 (since P(q)=1-P(not-q)), for any proposition q and any value v0, which is absurd.

marcellodibello commented 4 weeks ago

started to make revisions to the higher-order probabilism paper; so far I have finished section 4 on higher-order probabilism, now working on section 5 and the proper scoring rule

marcellodibello commented 3 weeks ago

revised up to and including section 4 on higher-order probabilism, and then stopped because of questions/confusions I have about the claims made in the paper; see issue #99

rfl-urbaniak commented 2 weeks ago

4. P(not-q)= P[q | P(q)=P(not-q)]

Are you sure this is the paradox? 4. seems false. I would buy into

P(not-q|P(q)=P(not-q))= P[q | P(q)=P(not-q)]

but not 4., which pretty much already says that P(q) is 1/2
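A toy second-order model (my numbers, purely to illustrate the point) makes the distinction concrete: the conditional identity above holds, and so does premise 2, while the unconditional premise 4 fails.

```python
# Toy second-order model illustrating the worry about premise 4.
# q is decided by a coin whose bias V is itself uncertain:
# V = 0.5 or V = 0.9, each with prior probability 0.5 (made-up numbers).

priors = {0.5: 0.5, 0.9: 0.5}

# First-order probability of q: average the bias over the prior on V.
p_q = sum(pr * v for v, pr in priors.items())
p_not_q = 1 - p_q

# Read the event "P(q)=P(not-q)" as V = 0.5, the only bias with v = 1-v.
p_q_given_fair = 0.5       # P(q | V=0.5): premise 2 holds
p_not_q_given_fair = 0.5   # P(not-q | V=0.5): the conditional identity holds

# But premise 4 fails: P(not-q) = 0.3, which is not 1/2.
print(p_not_q, p_q_given_fair)
```

In this model P(not-q | P(q)=P(not-q)) = P(q | P(q)=P(not-q)) = 1/2, yet P(not-q) = 0.3, so dropping the conditioning in premise 4 is exactly where the paradox smuggles in P(q)=1/2.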