stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/

Add R2 as performance statistic #483

Open fweber144 opened 7 months ago

fweber144 commented 7 months ago

As suggested by @avehtari, it would be good to have $R^2$ as a performance statistic in projpred. This could be called `stats = "R2"` (and `stat = "R2"` for `suggest_size()`), for example. According to @avehtari, we should go for LOO-$R^2$.

There is also related code at https://github.com/stan-dev/projpred/blob/bec6258478ce9a04e92d50a0aa6628c23878dab5/R/summary_funs.R#L170-L187. (Note that `* (n / (n - 1))` can be omitted because it cancels out.) In those lines, `bayesboot::rudirichlet()` is used. According to @avehtari, the SE could also be calculated without a Dirichlet approach, using the formula from https://github.com/stan-dev/loo/pull/205#issuecomment-1316683962.
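For readers without the linked file at hand, the Dirichlet-based approach mentioned above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not projpred's implementation: `r2_bb()` and its arguments are hypothetical names, the Dirichlet weights are generated in base R as a stand-in for `bayesboot::rudirichlet()`, and $R^2$ is taken to be one minus the ratio of the weighted variance of the pointwise prediction errors to the weighted variance of the response.

```r
# Hypothetical helper (not part of projpred): Bayesian-bootstrap draws of R^2
# from observed responses `y` and pointwise (e.g., LOO) predictions `mu_pred`.
r2_bb <- function(y, mu_pred, n_draws = 4000) {
  stopifnot(length(y) == length(mu_pred), length(y) > 1)
  n <- length(y)
  err <- mu_pred - y
  # Uniform Dirichlet weights obtained by normalizing Exp(1) draws row-wise
  # (assumed base-R stand-in for bayesboot::rudirichlet(n_draws, n)).
  g <- matrix(rexp(n_draws * n), nrow = n_draws, ncol = n)
  w <- g / rowSums(g)
  # Weighted variance for each row of weights. The factor n / (n - 1) is left
  # out because it appears in numerator and denominator and cancels.
  wvar <- function(x) drop(w %*% x^2) - drop(w %*% x)^2
  r2 <- 1 - wvar(err) / wvar(y)
  # Lower truncation at -1, mirroring the line discussed in this thread.
  r2[r2 < -1] <- -1
  r2
}

# Usage with simulated data (all values are made up for illustration).
set.seed(483)
y <- rnorm(50)
mu_pred <- y + rnorm(50, sd = 0.5)
draws <- r2_bb(y, mu_pred)
c(estimate = mean(draws), se = sd(draws))
```

Summarizing the draws by their mean and standard deviation is an assumption made here for illustration; the closed-form SE from the linked loo comment would replace the `sd(draws)` part.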

fweber144 commented 7 months ago

@AlejandroCatalina: Is the line `looR2[looR2 < -1] <- -1` supposed to read `looR2[looR2 < 0] <- 0`?

fweber144 commented 7 months ago

@avehtari: The SE formula provided in https://github.com/stan-dev/loo/pull/205#issuecomment-1316683962 refers to LOO-$R^2$. I guess it cannot be applied directly to K-fold CV, no CV (i.e., test dataset = training dataset), or a hold-out test dataset. Do you know of similar formulas for those cases?

avehtari commented 7 months ago

> Is the line `looR2[looR2 < -1] <- -1` supposed to read `looR2[looR2 < 0] <- 0`?

The first one is intentional.

> The SE formula provided in https://github.com/stan-dev/loo/pull/205#issuecomment-1316683962 refers to LOO-$R^2$. I guess it cannot be applied directly to K-fold CV

It can be used with K-fold CV and pointwise evaluation.

> no CV (i.e., test dataset = training dataset)

We have used Bayesian $R^2$ for that, as it has some benefits in that case, but the same formula could be used, too.

> or a hold-out test dataset

Can be used.

avehtari commented 3 months ago

Implemented by https://github.com/stan-dev/projpred/pull/496/commits/0ed83914eff49bba52c6c065e7059d0b300b9643