It would be great to have more outcome families supported by projpred. For discrete outcome families with finite support (e.g. the cumulative family in ordinal regression), the projection can be performed via a pseudo-dataset approach:
For these families, equation (12) from
Piironen J, Paasiniemi M, Vehtari A (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711
simplifies to

$$
\mathop{\mathrm{arg\,max}}_{\theta_\perp} \sum_{i=1}^{n} \sum_{y \in \mathrm{supp}(\tilde{y})} \left[ \frac{1}{|I_c|} \sum_{s \in I_c} p(y \,|\, x_i, \theta^s) \right] \log q(y \,|\, x_i, \theta_\perp)
$$

with supp(\tilde{y}) denoting the (discrete and finite) support of the outcome distribution (i.e. the set of possible values for the outcome y), x_i denoting the predictor vector of observation i \in {1, ..., n}, \theta^s denoting a draw (with s \in I_c, the index set of draws in cluster c) from the reference model's posterior, and q(y | x_i, \theta_\perp) denoting the submodel's outcome distribution with projected parameters \theta_\perp.
This simplification corresponds to a weighted maximum-likelihood problem with weights

$$
a_i^*(y) = \frac{1}{|I_c|} \sum_{s \in I_c} p(y \,|\, x_i, \theta^s)
$$
and a pseudo (or artificial) dataset constructed as follows:
1. Take the original dataset and denote by n its number of rows.
2. Repeat each row another |supp(\tilde{y})| - 1 times, so that the resulting dataset has n * |supp(\tilde{y})| rows.
3. Replace the outcome variable (column "response", say) such that each row i \in {1, ..., n} from the original dataset occurs together with each possible value for y (i.e. each value from supp(\tilde{y})) in column "response" exactly once.
The weight for the pseudo-dataset row coming from row i \in {1, ..., n} in the original dataset and having outcome value y is then a_i^*(y) as defined above.
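To make the construction concrete, here is a minimal R sketch (the data frame `dat`, its outcome column `response`, and the function `ref_pmf()` returning the cluster-averaged reference-model probabilities are hypothetical placeholders, not projpred code):

```r
# Hypothetical original dataset `dat` with n rows and an ordinal outcome
# column "response"; its finite support:
supp_y <- levels(dat$response)
n <- nrow(dat)

# Repeat each original row |supp(y)| times:
pseudo_dat <- dat[rep(seq_len(n), each = length(supp_y)), ]

# Pair each original row with each possible outcome value exactly once:
pseudo_dat$response <- factor(rep(supp_y, times = n), levels = supp_y)

# Weights a_i^*(y), via a hypothetical function ref_pmf(i, y) computing
# 1/|I_c| * sum_{s in I_c} p(y | x_i, theta^s) from the reference model:
pseudo_dat$wobs <- mapply(ref_pmf,
                          i = rep(seq_len(n), each = length(supp_y)),
                          y = rep(supp_y, times = n))
```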
Thus, it should be possible to use existing routines for solving this weighted maximum-likelihood problem, such as MASS::polr() for the cumulative family (in the absence of group-level and smoothing terms). PSIS-LOO CV also fits naturally into the simplification above (the importance weights simply enter when averaging over the posterior draws).
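For instance, for the cumulative family and the hypothetical pseudo dataset from the sketch above, the projection onto a submodel with predictors `x1` and `x2` could be sketched as:

```r
library(MASS)

# Weighted maximum-likelihood fit on the pseudo dataset; column `wobs`
# holds the weights a_i^*(y) from above (note that polr() requires the
# absence of group-level and smoothing terms):
prj_fit <- polr(response ~ x1 + x2, data = pseudo_dat, weights = wobs)
```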
Of course, the approach is primarily useful for outcome families (with discrete and finite support) that do not constitute an exponential family, since exponential families may be treated the usual way (see Piironen et al., 2020).
EDITS:
Minor wording improvements in the text above.
The approach might also be used as an approximation for discrete families with infinite support if support areas with near-zero probability are ignored.
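For example (purely illustrative, with a hypothetical rate bound), a Poisson outcome could be handled by truncating the support at a high quantile:

```r
# Truncate the infinite Poisson support where the remaining tail carries
# near-zero probability mass (`lambda_max` is a hypothetical upper bound
# for the rates occurring in the reference model's posterior draws):
lambda_max <- 15
supp_y_approx <- 0:qpois(1 - 1e-12, lambda = lambda_max)
```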
In principle, a similar pseudo-data strategy could also be used for any outcome family (including continuous ones and discrete ones with infinite support), in one of the following two ways:
1. Draw repeatedly (say, T times) from the outcome distribution given the predictor vector x_i (with i \in {1, ..., n}) and a draw \theta^s (with s \in I_c) from the reference model's posterior. Within cluster I_c, this gives a pseudo dataset with |I_c| * T * n rows, which gets big very quickly, so the practical feasibility of this Monte Carlo pseudo-data approach might be limited. Perhaps, however, the |I_c| * T draws for y_i (with i \in {1, ..., n}) could be split into G groups (as homogeneous as possible within and as heterogeneous as possible between groups), e.g., by clustering or by using quantiles, so that an existing weighted maximum-likelihood routine could be run on the condensed pseudo dataset with n * G rows (see the first sketch below).
2. Discretize the outcome distribution (preferably using quantiles) given the predictor vector x_i (with i \in {1, ..., n}) and a draw \theta^s (with s \in I_c) from the reference model's posterior. Within cluster I_c, this gives a pseudo dataset with |I_c| * G * n rows (with G denoting the number of groups in the discretization) for which an existing weighted maximum-likelihood routine could be used, with weights given by the probability masses corresponding to the groups (see the second sketch below). The drawback is that |I_c| * G * n also gets big very quickly.
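To illustrate the first (Monte Carlo) approach, here is a rough R sketch for a single observation i, assuming for simplicity a Gaussian reference-model distribution; the vectors `mu_i` and `sigma` (per-draw means and standard deviations) and the quantile-based grouping are illustrative assumptions, not projpred code:

```r
# Monte Carlo pseudo-data for observation i within cluster I_c, assuming
# a Gaussian reference-model distribution with per-draw means `mu_i` and
# standard deviations `sigma` (each of length |I_c|):
T_mc <- 100  # number of Monte Carlo draws per posterior draw
G <- 10      # number of groups for the condensed pseudo dataset

# |I_c| * T_mc draws for y_i:
y_draws <- rnorm(length(mu_i) * T_mc,
                 mean = rep(mu_i, each = T_mc),
                 sd = rep(sigma, each = T_mc))

# Condense into G groups via quantile binning; each group is represented
# by its mean and carries (approximately) weight 1 / G:
grp <- cut(y_draws,
           breaks = quantile(y_draws, probs = seq(0, 1, length.out = G + 1)),
           include.lowest = TRUE)
y_pseudo_i <- tapply(y_draws, grp, mean)  # G representative outcome values
w_pseudo_i <- rep(1 / G, G)               # corresponding weights
```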
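And a rough sketch of the second (discretization) approach for a single observation i and a single posterior draw s, again under an illustrative Gaussian assumption (with hypothetical mean `mu_is` and standard deviation `sigma_s`):

```r
# Discretize the reference model's outcome distribution for observation i
# and posterior draw s into G equal-probability groups via quantiles:
G <- 10
probs_mid <- (seq_len(G) - 0.5) / G  # probability-scale midpoints of the groups
y_pseudo_is <- qnorm(probs_mid, mean = mu_is, sd = sigma_s)  # representative values
w_pseudo_is <- rep(1 / G, G)  # probability mass of each group
```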