stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/
Other
110 stars 26 forks source link

New functions for investigating predictor rankings #406

Closed fweber144 closed 1 year ago

fweber144 commented 1 year ago

This PR starts to address #289.

In particular, this adds the new functions ranking() and props(), making output element pct_solution_terms_cv of vsel objects obsolete (but requiring the addition of output element solution_terms_cv to vsel objects). This also adds a new plot() method for visualizing the output of props() (more precisely, two plot() methods are added, but plot.vsel() is only a shortcut for first applying ranking.vsel() and then plot.ranking()). For the user-facing changes, see the changes in NEWS.md.

Quick illustration of the new functions:

library(projpred) # Use devtools::load_all() if installation is not desired.

dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
fit <- rstanarm::stan_glm(
  y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
  QR = TRUE, chains = 2, iter = 1000, refresh = 0, seed = 9876
)
cvvs <- cv_varsel(fit, cv_method = "kfold", nclusters_pred = 20, seed = 5555)

# Extract predictor rankings:
rk <- ranking(cvvs)

# Compute ranking proportions:
( pr_rk <- props(rk) )
    predictor
size X1 X5 X3  X2  X4
   1  1  0  0 0.0 0.0
   2  0  1  0 0.0 0.0
   3  0  0  1 0.0 0.0
   4  0  0  0 0.6 0.4
   5  0  0  0 0.4 0.6
attr(,"class")
[1] "props"
# Visualize the ranking proportions:
plot(pr_rk)

Rplot-01

# For longer predictor names:
plot(pr_rk, text_angle = 45)

Rplot-02

# Compute cumulated ranking proportions:
( pr_rk_cumul <- props(rk, cumulate = TRUE) )
     predictor
size  X1 X5 X3  X2  X4
  <=1  1  0  0 0.0 0.0
  <=2  1  1  0 0.0 0.0
  <=3  1  1  1 0.0 0.0
  <=4  1  1  1 0.6 0.4
  <=5  1  1  1 1.0 1.0
attr(,"class")
[1] "cumulprops" "props" 
# Visualize the cumulated ranking proportions:
plot(pr_rk_cumul)

Rplot-03

# For longer predictor names:
plot(pr_rk_cumul, text_angle = 45)

Rplot-04

Dummy ranking proportions for illustrating the color gradient:

nterms_dummy <- 10L
pr_dummy <- matrix(
  seq(0, 1, length.out = nterms_dummy^2),
  nrow = nterms_dummy, ncol = nterms_dummy,
  dimnames = list("size" = as.character(seq_len(nterms_dummy)),
                  "predictor" = paste0("x", seq_len(nterms_dummy)))
)
class(pr_dummy) <- "props"
plot(pr_dummy)

Rplot

As mentioned above, this PR is only a first step towards resolving #289 completely. In the future, I will:

avehtari commented 1 year ago

Why props and not proportions?

Otherwise, the illustration looks good

fweber144 commented 1 year ago

Why props and not proportions?

proportions() was also my first idea, but it already exists in base R (https://stat.ethz.ch/R-manual/R-devel/library/base/html/proportions.html) and unfortunately, it's not a generic. Since prop.table() is an alias for proportions() in base R, I thought it would make sense to abbreviate proportions() by props().

fweber144 commented 1 year ago

Why props and not proportions?

proportions() was also my first idea, but it already exists in base R (https://stat.ethz.ch/R-manual/R-devel/library/base/html/proportions.html) and unfortunately, it's not a generic. Since prop.table() is an alias for proportions() in base R, I thought it would make sense to abbreviate proportions() by props().

We now agreed to rename props() to cv_proportions() in an upcoming PR.