Generative vs discriminative classifiers (§ 9.4): more than just a question of computational advantages or simplicity

Version: 3 August.

I'd be glad if § 9.4 on generative vs discriminative classifiers also discussed the important question of inferential robustness in choosing between the two. Roughly speaking, the point is as follows. Given two (possibly multidimensional) quantities X and Y, each of the conditional frequencies f(X|Y) and f(Y|X) that we observe in a training/validation/test data sample may or may not be sensitive to how the data were obtained, and so may or may not be suitable for generalization.

Putting this in terms of "populations": it may happen that the conditional frequency f(Y|X) in the training/validation/test data is very different from the corresponding frequency F(Y|X) in the full population, owing to the way the data were obtained or sampled, whereas f(X|Y) ≈ F(X|Y). In this case we should in principle use inferences made (via de Finetti's theorem) from f(X|Y) to predict Y given a new X, via Bayes's theorem (the "generative" approach), and not use f(Y|X) (the "discriminative" approach).
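To make the point above concrete, here is a minimal sketch with exact (not simulated) frequencies. All numbers, and the names `F_y`, `F_x_given_y` and `posterior`, are hypothetical and chosen only for illustration. It assumes that the sample was collected with balanced classes, so that selection depends on Y alone, and that the population frequencies F(Y) are known from some other source.

```python
import numpy as np

# Hypothetical population: Y in {0, 1}, X in {0, 1, 2}.
F_y = np.array([0.9, 0.1])                  # population frequencies F(Y)
F_x_given_y = np.array([[0.7, 0.2, 0.1],    # F(X | Y=0)
                        [0.1, 0.3, 0.6]])   # F(X | Y=1)

def posterior(x_given_y, prior_y):
    """Bayes's theorem. Returns an array indexed [x, y] holding P(Y=y | X=x)."""
    joint = x_given_y * prior_y[:, None]    # indexed [y, x]
    return (joint / joint.sum(axis=0)).T    # normalize over y, then transpose

# Assumed sampling scheme: classes are balanced by design, so selection
# depends only on Y. Then f(X|Y) = F(X|Y), but f(Y) differs from F(Y).
f_y = np.array([0.5, 0.5])
f_x_given_y = F_x_given_y

F_y_given_x = posterior(F_x_given_y, F_y)   # population F(Y|X)
f_y_given_x = posterior(f_x_given_y, f_y)   # sample f(Y|X), the discriminative target
generative = posterior(f_x_given_y, F_y)    # sample f(X|Y) combined with F(Y)

print("F(Y=1 | X=2) in the population:", F_y_given_x[2, 1])
print("f(Y=1 | X=2) in the sample:    ", f_y_given_x[2, 1])
print("generative, f(X|Y) with F(Y):  ", generative[2, 1])
```

With these numbers the sample frequency f(Y=1|X=2) is about 0.86, while the population value is 0.4; the generative route recovers the population value because only f(X|Y) is taken from the sample.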

Whether f(X|Y) or f(Y|X) is robust in this sense often depends on the nature of the quantities X and Y. For example, there could be a deterministic physical relation X = h(Y) which is universally affected by a specific kind of noise; but, since h is not injective, the inverse relation Y = h^-1(X) could be very different depending on the particular domain or data source of Y. In this case we would be safer using f(X|Y) from our data, because we can expect it to apply to new data samples, unlike f(Y|X).
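A toy version of this example, again with exact frequencies, might look as follows. Everything here is an illustrative assumption: the choice h(y) = y², the additive noise table, and the two hypothetical data sources `prior_a` and `prior_b`, which differ only in how often each value of Y occurs.

```python
import numpy as np

ys = [-1, 1, 2]            # possible values of Y
xs = [1, 2, 4, 5]          # possible values of X = h(Y) + noise
noise = {0: 0.8, 1: 0.2}   # assumed noise frequencies, the same for every source

def h(y):
    """Non-injective deterministic relation: h(-1) == h(1)."""
    return y ** 2

# F(X|Y), indexed [y, x]. It is fixed by h and the noise, not by the source of Y.
lik = np.zeros((len(ys), len(xs)))
for i, y in enumerate(ys):
    for e, p in noise.items():
        lik[i, xs.index(h(y) + e)] += p

def posterior(prior_y):
    """F(Y|X) for a given data source, indexed [y, x], via Bayes's theorem."""
    joint = lik * np.asarray(prior_y)[:, None]
    return joint / joint.sum(axis=0)

prior_a = [0.1, 0.8, 0.1]  # source A: Y = +1 is common
prior_b = [0.8, 0.1, 0.1]  # source B: Y = -1 is common

post_a = posterior(prior_a)
post_b = posterior(prior_b)

ix, iy = xs.index(1), ys.index(1)
print("P(Y=+1 | X=1), source A:", post_a[iy, ix])
print("P(Y=+1 | X=1), source B:", post_b[iy, ix])
```

Under these assumptions the same observation X = 1 gives Y = +1 a frequency of about 0.89 in source A and about 0.11 in source B, although the table `lik` is identical for both. A model of Y given X learned on one source would therefore transfer poorly to the other, whereas a model of X given Y would transfer unchanged.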

The situation can of course be even more complex, and it is also connected with the existence of confounding quantities. There is a brilliant paper on these matters by Lindley & Novick, "The Role of Exchangeability in Inference", which explains things better than I'm managing to do, adopting the point of view of exchangeability.

The machine-learning literature rarely seems to discuss the robustness and generalizability of X|Y vs Y|X, even though it should be a very important factor in choosing between generative and discriminative approaches.
