stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Discrete families with finite support (and possibly other families) #70

Closed: fweber144 closed this issue 1 year ago

fweber144 commented 3 years ago

It would be great to have more outcome families supported by projpred. For discrete outcome families with finite support (e.g. the cumulative family in ordinal regression), the projection can be performed via a pseudo-dataset approach:

For these families, equation (12) from

Piironen J, Paasiniemi M, Vehtari A (2020). Projective inference in high-dimensional problems: Prediction and feature selection. Electronic Journal of Statistics, 14(1), 2155–2197. https://doi.org/10.1214/20-EJS1711

simplifies to

formula1

with supp(\tilde{y}) denoting the (discrete and finite) support of the outcome distribution (i.e. the set of possible values for the outcome y).
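The step behind such a simplification can be sketched as follows. The notation is an assumption made for illustration (\theta_*^{s} for the reference-model draws s \in I, \theta for the submodel parameters, x_i for the predictors of observation i) and is not copied from the formula above or from equation (12) of the paper. For an outcome with finite support, the expectation of the submodel's log predictive probability under the reference model's predictive distribution is a finite sum, so averaging over the draws gives:

```latex
\frac{1}{|I|} \sum_{s \in I}
  \mathrm{E}_{\tilde{y}_i \mid \theta_*^{s}, x_i}
  \left[ \log p(\tilde{y}_i \mid \theta, x_i) \right]
=
\sum_{y \in \mathrm{supp}(\tilde{y})}
  \left( \frac{1}{|I|} \sum_{s \in I} p(\tilde{y}_i = y \mid \theta_*^{s}, x_i) \right)
  \log p(\tilde{y}_i = y \mid \theta, x_i)
```

The term in parentheses depends only on the reference model, so it can be computed once and then treated as a fixed weight on each log-probability term of the submodel.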

This simplification corresponds to a weighted maximum-likelihood problem with weights

formula2

and a pseudo (or artificial) dataset constructed as follows:

  1. Take the original dataset. Denote by n the number of rows in this original dataset.
  2. Repeat each row another (|supp(\tilde{y})| - 1) times. The resulting dataset then has n * |supp(\tilde{y})| rows.
  3. Replace the values of the outcome variable (column "response", say) with all possible values of y (i.e. each value from supp(\tilde{y})), such that each row i \in {1, ..., n} from the original dataset occurs together with each possible value of y (in column "response") exactly once.

The weight for the pseudo-dataset row coming from row i \in {1, ..., n} in the original dataset and having outcome value y is then a_i^*(y) as defined above.
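The three construction steps and the weight assignment can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not projpred's implementation: the function name `build_pseudo_dataset`, the extra columns "weight" and "orig_row", and the toy numbers are all hypothetical, and the weights are taken as given rather than computed from a reference model.

```python
def build_pseudo_dataset(rows, support, weights, response="response"):
    """Expand a dataset so each original row appears once per outcome value.

    rows:    list of dicts, the original dataset with n rows
    support: list of all possible outcome values
    weights: weights[i][k] is the weight for original row i paired with
             support[k] (assumed to be supplied by the caller)
    Returns a list of n * len(support) dicts.
    """
    if len(weights) != len(rows):
        raise ValueError("need one weight vector per original row")
    pseudo = []
    for i, row in enumerate(rows):
        if len(weights[i]) != len(support):
            raise ValueError("need one weight per possible outcome value")
        for k, y in enumerate(support):
            new_row = dict(row)          # copy, so the original row is untouched
            new_row[response] = y        # overwrite the observed outcome
            new_row["weight"] = weights[i][k]
            new_row["orig_row"] = i      # hypothetical bookkeeping column
            pseudo.append(new_row)
    return pseudo


# Toy example with n = 2 rows and a support of size 3 (made-up numbers).
rows = [{"x": 0.5, "response": "low"}, {"x": -1.2, "response": "high"}]
support = ["low", "mid", "high"]
weights = [[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]
pseudo = build_pseudo_dataset(rows, support, weights)
```

The "weight" column is what a weighted maximum-likelihood routine would receive as case weights. Note that the observed outcome of each original row is discarded in the pseudo-dataset and only enters indirectly, through the reference model that produced the weights.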

Thus, it should be possible to solve this weighted maximum-likelihood problem with existing routines, such as MASS::polr() for the cumulative family (when there are no group-level or smoothing terms). PSIS-LOO-CV also fits naturally into the simplification above: the importance weights are simply introduced when averaging over the posterior draws.
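How importance weights could enter the averaging over draws can be sketched as follows. This is a minimal illustration, not projpred's or loo's code: the function name `outcome_weights` and the toy numbers are hypothetical, and the per-draw importance weights are assumed to be supplied already (the Pareto smoothing itself is not performed here).

```python
def outcome_weights(probs, draw_weights=None):
    """Average outcome probabilities over reference-model draws.

    probs:        probs[s][k] is the probability of the k-th outcome value
                  under draw s, for a single observation
    draw_weights: optional nonnegative weight per draw, e.g. importance
                  weights for this observation; equal weights if None
    Returns one averaged probability per outcome value.
    """
    n_draws = len(probs)
    if draw_weights is None:
        draw_weights = [1.0] * n_draws
    if len(draw_weights) != n_draws:
        raise ValueError("need one weight per draw")
    total = sum(draw_weights)
    norm = [w / total for w in draw_weights]  # normalise to sum to 1
    n_values = len(probs[0])
    return [
        sum(norm[s] * probs[s][k] for s in range(n_draws))
        for k in range(n_values)
    ]


# Two draws, three possible outcome values (made-up numbers).
probs_i = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3]]
plain = outcome_weights(probs_i)                         # equal-weight average
loo = outcome_weights(probs_i, draw_weights=[3.0, 1.0])  # importance-weighted
```

Because the draw weights are normalised, the resulting values still sum to one over the support, so the pseudo-dataset keeps the same total weight per original row in both cases.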

Of course, the approach is primarily useful for outcome families with discrete and finite support that do not constitute an exponential family, since exponential families can be treated in the usual way (see Piironen et al., 2020).


avehtari commented 3 years ago

Thanks, this all makes sense, and let's proceed as discussed during our call.

fweber144 commented 1 year ago

The corresponding preprint is out now: https://doi.org/10.48550/arXiv.2301.01660

fweber144 commented 1 year ago

Implemented by PR #322.