stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Augmented-data projection (`augmat` and `augvec` objects): Replace attribute `nobs_orig` by `ndiscrete` #473

Closed fweber144 closed 11 months ago

fweber144 commented 11 months ago

This replaces the former attribute `nobs_orig` of `augmat` and `augvec` objects with a new attribute, `ndiscrete`, which gives the number of (possibly latent) response categories ($C$) instead of the number of observations ($N$).

The reason is that subsetting the rows of an augmented-rows matrix (or the elements of an augmented-length vector) is allowed in terms of the observations (individuals), but not in terms of the (possibly latent) response categories. So $C$ should always stay the same, in contrast to $N$.
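To illustrate why storing $C$ is more robust than storing $N$, here is a minimal, language-agnostic sketch in Python. It is not projpred's R implementation: the dictionary-based `make_augvec()`, `nobs()`, and `subset_obs()` helpers are hypothetical, and the layout ($C$ consecutive blocks of $N$ values each) is an assumption made for the example.

```python
# Hypothetical stand-in for an augmented-length vector: a flat list of
# N * C values plus a stored "ndiscrete" attribute (C).
# Assumed layout: C consecutive blocks, each holding N values.

def make_augvec(values, ndiscrete):
    values = list(values)
    if len(values) % ndiscrete != 0:
        raise ValueError("length must be a multiple of ndiscrete")
    return {"values": values, "ndiscrete": ndiscrete}

def nobs(augvec):
    # N is derived from the current length, so it cannot go stale.
    return len(augvec["values"]) // augvec["ndiscrete"]

def subset_obs(augvec, obs_idx):
    # Keep the chosen observations within every category block.
    # C is carried over unchanged; only N shrinks.
    n = nobs(augvec)
    c = augvec["ndiscrete"]
    kept = [augvec["values"][k * n + i] for k in range(c) for i in obs_idx]
    return make_augvec(kept, c)

full = make_augvec(range(12), ndiscrete=3)  # N = 4, C = 3
sub = subset_obs(full, [0, 2])              # keep observations 0 and 2
print(nobs(full), nobs(sub), sub["ndiscrete"])
```

Had the object stored $N$ instead, the stored value (4 here) would no longer match the subset, whereas the stored $C$ stays valid and $N$ can always be recomputed from the current length.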

Note that this subsetting convention (only observations, not categories) is only an unofficial one: no code prevents us from subsetting arbitrary rows/elements, even across the (possibly latent) response categories, because functions like `str()` do not adhere to the convention. This is also why the global option `projpred.additional_checks` was previously used to activate the related checks only in the unit tests.

I'm sorry that storing $N$ was a bad design choice on my part in PR #322. I guess I chose $N$ over $C$ back then because (i) I thought that the switch between latent space and response space might be a problem for storing $C$, and (ii) I did not think of the problems that arise when subsetting an augmented-rows matrix (`augmat` objects) or an augmented-length vector (`augvec` objects) with $N$ stored instead of $C$. Such subsetting, in particular subsetting that changes $N$, is used only very rarely in projpred; subsampled PSIS-LOO CV is an example (see #433 and #434).