stan-dev / projpred

Projection predictive variable selection
https://mc-stan.org/projpred/
Other
110 stars 26 forks source link

testing removal of except function #424

Closed l-gorman closed 1 year ago

l-gorman commented 1 year ago

Removal of Except Argument in CV_Varsel Simplify2 Array

Attempt to fix #423. Very simple fix (unless I am missing something fundamental), I am removing the except=NULL argument, for the simplify2array function, called in the kfold_varsel function. I ran through with the example from the cv_varsel documentation.

dat_gauss <- data.frame(y = df_gaussian$y, df_gaussian$x)
  fit <- rstanarm::stan_glm(
    y ~ X1 + X2 + X3 + X4 + X5, family = gaussian(), data = dat_gauss,
    QR = TRUE, chains = 2, iter = 1000, refresh = 0, seed = 9876
  )
  # Run cv_varsel() (with small values for `K`, `nterms_max`, `nclusters`,
  # and `nclusters_pred`, but only for the sake of speed in this example;
  # this is not recommended in general):

  ref_model <- get_refmodel(fit)

  cvvs <- cv_varsel(object=ref_model,  
  method ="kfold",
  cv_method = if (!inherits(object, "datafit")) "LOO" else "kfold",
  ndraws = NULL,
  nclusters = 5,
  ndraws_pred = 10,
  nclusters_pred = NULL,
  refit_prj = !inherits(object, "datafit"),
  nterms_max = 3,
  penalty = NULL,
  verbose = TRUE,
  nloo = NULL,
  K =2,
  lambda_min_ratio = 1e-5,
  nlambda = 150,
  thresh = 1e-6,
  regul = 1e-4,
  validate_search = TRUE,
  seed = 5555,
  search_terms = NULL,
  parallel = getOption("projpred.prll_cv", FALSE))

I compared the outputs of these two calls:

sub_foldwise <- simplify2array(lapply(res_cv, "[[", "summaries_sub"),
                                 higher = FALSE)

sub_foldwise <- simplify2array(lapply(res_cv, "[[", "summaries_sub"),
                                 higher = FALSE, except=NULL)

And they appear to be the same (although I am no expert so I may be missing something. I am currently running this an an example dataset of my own to also see if this works.

I know this is such a minor change, but I hope it helps a little! Sorry again for all of the hassle with my questions :)

fweber144 commented 1 year ago

Thank you for creating this PR. Unfortunately, except cannot simply be removed because it is needed for the case where a forward search is run with nterms_max = 0 (which should not occur in practice, but still, this is a supported case, so we need to support it further). I would be inclined to set the minimum required R version to 4.2.0, but if you need to have an official projpred version which supports R < 4.2.0, then I can try to re-use the relevant code which existed before commit e11edc75e330be205a6a28b619eaeb4f11b7a2f7.

l-gorman commented 1 year ago

I understand that we cannot include this, see full response in #423