probml / pml-book

"Probabilistic Machine Learning" - a book series by Kevin Murphy
MIT License

Comment regarding "complete class" theorem attributed to Wald #210

Closed: droy closed this issue 3 years ago

droy commented 3 years ago

Wald never proved the theorem labelled 5.3.1: "Every admissible frequentist decision procedure is Bayes with respect to some, possibly improper, prior". As stated, I'm also fairly certain that this theorem has never been proven. It certainly wasn't shown by Wald, because I believe he died before improper priors were really a thing, though limits of Bayes procedures perhaps were. The first reference I know for generalized Bayes is [Sacks. "Generalized Bayes solutions in estimation problems." Ann. Math. Statist. 34, 751–768, 1963], although Sacks suggests the idea had been around (in the air, I suppose) for a long while. He later refers to [Samuel Karlin. "Admissibility for estimation with quadratic loss." Ann. Math. Statist. 29 (2), 406–436, June 1958. https://doi.org/10.1214/aoms/1177706620], who points to work in 1950/1951 by Lehmann and by Blyth (Lehmann's student) on using limits of Bayes procedures. The latter describes an approach, now known as Blyth's method, for establishing admissibility via limits of Bayes procedures. This approach is not known to be complete, even if it is sound.
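
For readers outside statistics, here is a minimal sketch of the objects involved and of Blyth's method, in one common textbook formulation (the notation is mine, not Wald's or Blyth's):

```latex
% Frequentist risk of a procedure \delta under loss L and model p(x | \theta),
% and Bayes risk under a (possibly improper) prior \pi (standard definitions):
R(\theta, \delta) = \mathbb{E}_{x \sim p(\cdot \mid \theta)}\big[ L(\theta, \delta(x)) \big],
\qquad
r(\pi, \delta) = \int_\Theta R(\theta, \delta)\, \pi(d\theta).

% Blyth's method, in one common textbook form (it assumes regularity, such as
% continuity of the risk functions): if there is a sequence of (possibly
% improper) priors \pi_n with r(\pi_n, \delta) < \infty such that, for every
% nonempty open \Theta_0 \subseteq \Theta,
\frac{ r(\pi_n, \delta) - \inf_{\delta'} r(\pi_n, \delta') }{ \pi_n(\Theta_0) }
\;\longrightarrow\; 0,
% then \delta is admissible. This is the soundness referred to above; no
% converse ("completeness") is known.
```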

Note that a theorem of the form "Assuming XYZ, admissible => Bayes" is not technically a complete class theorem, because, in some problems, the class of all admissible procedures is not a complete class. Admittedly, these are usually strange decision problems, though not always. Wald did prove theorems of the form "admissible => Bayes" (as well as theorems that also established that the admissible procedures form a complete class), but those were for problems with finite parameter spaces, or under conditions like compactness of the parameter space and continuity of the risk function. Of course, these were seminal theorems.
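
To make the distinction concrete: here is the definition of a complete class, together with the flavor of finite-parameter statement that Wald did prove. This is a standard textbook rendering (of the kind found in Ferguson's decision-theory text), not a quotation of Wald:

```latex
% A class C of procedures is complete if every \delta outside C is dominated
% by some \delta' in C, i.e. R(\theta, \delta') \le R(\theta, \delta) for all
% \theta, with strict inequality somewhere. "Admissible => Bayes" alone does
% not say that the admissible procedures form such a class.

% Finite parameter space: if \Theta = \{\theta_1, \ldots, \theta_k\} and the
% risk set S = \{ (R(\theta_1, \delta), \ldots, R(\theta_k, \delta)) \} is
% closed from below and bounded below, then
\delta \ \text{admissible} \;\Longrightarrow\;
\exists\ \text{proper prior } \pi \ \text{such that} \
r(\pi, \delta) = \inf_{\delta'} r(\pi, \delta').
% The proof is a supporting-hyperplane argument on the risk set: the normal
% vector of the hyperplane, normalized to sum to one, is the prior.
```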

James Berger is really the one to be credited for proving "admissible => generalized Bayes" (and some complete class theorems), but these results were only for exponential family models (plus some regularity). Indeed, in general, all existing theorems of the form "admissible => [some form of Bayes]" were, until recently, laden with regularity conditions of some sort.

Haosui Duanmu and I showed recently that "a procedure is extended admissible <==> its Bayes risk is within an infinitesimal of the Bayes optimal risk, with respect to a prior that may assign infinitesimal mass". This will appear in Annals of Stats.

   https://www.e-publications.org/ims/submission/AOS/user/submissionFile/31623?confirm=381d7eb7

Our characterization, in contrast to all previous work, has no regularity conditions and makes no strong assumptions on the problem (such as boundedness of the loss). So, in conclusion, Bayesness in this sense is slightly weaker than admissibility, since extended admissibility says only that there is no procedure with uniformly better risk (eps better everywhere, for some eps > 0).
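
To pin the term down, "extended admissible" has the following standard definition, matching the parenthetical above:

```latex
% \delta is extended admissible iff no procedure beats it uniformly by a fixed
% margin: there exist no \delta' and \epsilon > 0 such that
R(\theta, \delta') \;\le\; R(\theta, \delta) - \epsilon
\quad \text{for all } \theta \in \Theta.
% Every admissible procedure is extended admissible (uniform \epsilon-domination
% is in particular domination), so this is a weaker, more inclusive notion.
```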

In summary, the result credited to Wald is indeed very similar to results by Wald, but it omits the very strong assumptions that were only lifted 70 years later. Berger is due some credit here, but even his result has assumptions. If you want to handle any problem, you may need a prior assigning infinitesimal mass (our work).

murphyk commented 3 years ago

I'll add a citation to your paper. How do you suggest I reword sec 5.3.3 without going into all these details?

Also I noticed the following claim in the paper below: "The only admissible decision rules are Bayesian (Wald, 1992). This means that if you have a decision rule that is not Bayesian, you can improve the statistical quality of its decisions with a Bayesian alternative (Cox and Hinkley, 1979)"

I. Osband, Z. Wen, M. Asghari, M. Ibrahimi, X. Lu, and B. Van Roy, “Epistemic Neural Networks,” in NIPS, 2021 [Online]. Available: https://github.com/deepmind/enn

droy commented 3 years ago

In some problems, you can show that the generalized Bayes rules are a complete class. Exponential families on finite-dimensional vector spaces with nice losses are an example. I’d have to look at Berger’s paper to give you a more precise statement. My feeling is that this is the sort of theorem you would want to quote. Let me find one for you. And then you could mention that, in general, the result does not hold, but there is a very general connection that holds without any conditions, though it involves a notion of Bayes optimality allowing infinitesimal probabilities.

Regarding the statement by Cox and Hinkley: let me dig this up and look at the context. It is certainly false if you consider only proper Bayes rules (i.e., those derived from proper priors). I suspect they are using "Bayesian" to refer to a broader class. Indeed, since priors are really in the "mind" of the statistician only, I felt that our solution using infinitesimals was rather palatable, since we're already used to improper priors and other deviations. If I had to guess the class to which they are referring, it might be generalized Bayes (so, rules derived from the "formal" (i.e., mechanical) application of Bayes' rule to an improper density), in which case they will have dropped a bunch of hypotheses. But Cox and Hinkley were probably looking at exponential families, and that may be the context in which one must understand their statement.
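
Concretely, the standard definition of a generalized Bayes rule (in the notation of my earlier sketch) is:

```latex
% \delta is generalized Bayes with respect to a (possibly improper) density
% \pi if, for each observed x, the action \delta(x) minimizes the unnormalized
% posterior expected loss, whenever the integral is finite:
\delta(x) \;\in\; \operatorname*{arg\,min}_{a} \int_\Theta L(\theta, a)\, p(x \mid \theta)\, \pi(\theta)\, d\theta.
% No normalization is required, so \pi need not integrate to one; this is the
% mechanical use of Bayes' rule with an improper prior.
```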

What's the time frame in which you need me to sort out these statements, so that you can meet your own deadlines?

Dan


pglpm commented 3 years ago

Besides these important results and theorems, I'd be glad if the book emphasized that "admissible" must not be taken in the common sense of the word, but only as a technical term, and that an "admissible" decision rule can be silly while an "inadmissible" one can be cogent and appropriate. Stressing this is important for the audience of the book, which is not limited to statisticians and so may form dangerous psychological associations (as with the horrible term "significant" in statistics). The book gives a short warning about the first point on page 186, right after Theorem 5.3.1, but not about the converse. I wish both points were emphasized much more.

See also the interesting discussion about this in Jaynes, around § 13.7–13.10, especially pp. 409, 415; and the examples around § 17.5.

droy commented 3 years ago

I’ll push back: our Annals paper shows that a procedure is nonstandard Bayes if and only if it is extended admissible. And so Jaynes, whether or not he likes to think about admissibility, is always extended admissible, because he is Bayes (the easy-to-prove direction), and conversely every extended admissible procedure is nonstandard Bayes.
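
Symbolically, the equivalence reads as follows (paraphrasing our abstract; \varepsilon \approx 0 means \varepsilon is a nonstandard infinitesimal):

```latex
\delta \ \text{extended admissible}
\;\Longleftrightarrow\;
\exists\, \pi^* \ \text{(a prior that may assign infinitesimal mass)}: \
r(\pi^*, \delta) \;\le\; \inf_{\delta'} r(\pi^*, \delta') + \varepsilon,
\qquad \varepsilon \approx 0.
```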


pglpm commented 3 years ago

Hi Daniel, I'm not questioning the validity of your proofs. The problem is with this kind of technical term and particular readerships such as Kevin's. Does "extended admissible" include what was called "inadmissible" before? Does "extended admissible" now mean 'admissible' in the common sense of the term? And does "nonstandard Bayes" mean a Bayes that's more open to criticism, or not agreed upon, or idiosyncratic? Imagine a clinician who's using machine-learning methods for cancer diagnosis and is studying from Kevin's book to gain a better understanding of them. Unfortunately, terms like the above are not psychologically neutral, unlike "diffeomorphic" or "C*-algebra" or "sequent calculus". Not infrequently, one ends up explaining that "subjective" in "subjective Bayesian" doesn't mean "whimsical". So I'd be glad if Kevin warned his readers against attaching common-sense meanings to familiar-sounding terms that only have a technical meaning; and, when the common-sense meaning actually applies, if he explained why.

murphyk commented 3 years ago

I rewrote sec 5.3.3 as shown below. I did not get into all the technicalities that Dan raises, but just cited his paper instead. And I emphasized some of the points Luca (pglpm) raised. I also removed the label "theorem" when I give the example of the "silly" estimator, since I am trying to eschew too much formality. The modified text will appear in the next online version, which I will post very soon; however, this won't appear in print until the second print run, which presumably won't be until mid-to-late 2022. (The first print run will correspond to the 2021-08-27 version.)

[Screenshot of the revised Section 5.3.3: https://user-images.githubusercontent.com/4632336/138581110-18504b76-27be-4d42-a84b-4457299e0055.png]

droy commented 3 years ago

It's a nice compromise. You can also show that the constant estimator is admissible because it is the unique Bayes estimator for the prior that concentrates on \theta_0. This makes it seem a little less pathological: an individual using this estimator is simply certain of the value.
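
A sketch of that argument, under the standard caveat that a Bayes rule which is unique (up to risk equivalence) is admissible; I am assuming a loss with L(\theta, \theta) = 0, e.g. squared error:

```latex
% Take \pi to be the point mass at \theta_0. The Bayes risk of any procedure
% \delta' is then just its risk at \theta_0:
r(\pi, \delta') = R(\theta_0, \delta').
% The constant estimator \delta(x) \equiv \theta_0 has R(\theta_0, \delta) = 0,
% so it is Bayes for \pi. If it is the unique Bayes rule (e.g., under strictly
% convex loss, any \delta' with R(\theta_0, \delta') = 0 equals \theta_0 a.s.),
% then it is admissible: a procedure dominating it would match its Bayes risk,
% hence also be Bayes for \pi, contradicting uniqueness.
```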


pglpm commented 3 years ago

Thank you Kevin! If possible, feel free also to add references to Jaynes's book (§§ 13.7–10, 17.5), which gives a long discussion about this.

A couple of small typos: "admissable" on the 3rd and next-to-last lines of the section in the screenshot.

droy commented 3 years ago

Jaynes argues his points in deceptive ways and produces rabid Bayesians who don’t understand frequentism or Bayesianism. It is not a healthy book for impressionable minds learning the material.


pglpm commented 3 years ago

It's subjective. I still find it the best book on probability theory out there: based on principles, not recipes, and written in plain English, it tries to make you understand rather than memorize a bunch of technical terms. It shows signs of the time in which it was written, when Bayesians still had to battle to publish; but I still find it unsurpassed.

Again: this is subjective, and Kevin has the last word of course. Mine was only a suggestion.

droy commented 3 years ago

It certainly inspires the reader to think they can approach every problem as a pure, unwavering subjectivist. Then you realize you can't compute some integral, you have to make compromises, and you're left adrift with no tools. Frequentism provides another set of tools that one can use to handle one's own lack of understanding of the problem.


pglpm commented 3 years ago

"It certainly inspires the reader" → "it may inspire some readers"; I don't understand why or how you can make such universal sweeping statements. I was and am a reader, and am aware that I must often make compromises and approximations. Still I find it good to try or have a glimpse of the principled approach first. A great problem when the book was written was that most books in statistics only offered the compromises without explaining which principles were compromised to start with. I personally find the difference between Jaynes's and traditional books similar to the difference in explaining the motion of the planets using Newton's equations vs using epicycles and deferents. The latter can and did give very precise results (often adding epicycles upon epicycles), but don't give you the larger picture. Solutions to the former must often be found in approximate form; no news there.

But sorry Kevin and Daniel, not my intention to transform these comments into a debate, so I'll shut up here :)

droy commented 3 years ago

The analogy is broken because you don't solve problems with misspecified models using approximations to Bayes rule.


murphyk commented 3 years ago

Who knew that I would ignite a Bayesian flame war inside of GitHub! Anyway, I fixed the "admissable" typo but otherwise have left my (new) text unchanged. It cites Dan and Haosui's paper and Jaynes's book, so readers can read both and then make up their own minds :)

fortuin commented 1 year ago

I just stumbled upon this discussion when googling things about admissibility and I found it quite entertaining and insightful to read, so I just wanted to say thanks for having it in the public domain :-)