Open jscamac opened 4 years ago
Woops, yeah this is almost certainly because I didn't include the subset
argument as an option in the stan_surv()
signature.
It could probably be added. It might not be too hard, but I wouldn't know without taking a closer look. If anyone wants to have a go at adding that functionality and submitting a PR, feel free (I don't have much spare time to work on this atm).
It probably wasn't caught earlier since I probably didn't write tests for testing stan_surv()
with that post-estimation functionality.
Sorry this feature isn't available! And I'm not sure there is an easy quick fix -- unless there is a hack available where you can subset the data manually outside of the modelling function (e.g. using a loop). @jgabry or @avehtari might be able to advise on that, perhaps.
We might be able to change the kfold code to not require the subset argument. It's not essential, but we didn't anticipate a modeling function without the subset argument so the code currently expects it. In the meantime,
unless there is a hack available where you can subset the data manually outside of the modelling function (e.g. using a loop).
this should be possible (by changing the data argument each time through the loop). The kfold method in rstanarm is just there for convenience. It doesn't do anything special that couldn't be done manually. It looks like there's a lot of code required
https://github.com/stan-dev/rstanarm/blob/master/R/loo-kfold.R
but most of it is just to get it working in as many cases as possible. If using that file as a reference, the only crucial parts are changing the data argument each time, evalauting log_lik(), and then calculating elpd.
Here's a quick workaround
f <- eval(parse(
text=sub(
'fit_k_call$subset <- eval(fit_k_call$subset)',
'fit_k_call$subset <- if(is.stansurv(x)) NULL else eval(fit_k_call$subset)',
deparse(rstanarm:::kfold.stanreg), fixed=TRUE)
), envir=environment(rstanarm:::kfold.stanreg))
assignInNamespace('kfold.stanreg', value=f, ns='rstanarm')
The relevant line is: https://github.com/stan-dev/rstanarm/blob/99b2ae0840b7e15bbd340718da9b41b4dee3ee64/R/loo-kfold.R#L171
Thanks @johnbaums!
Summary:
There appears to be a problem with using
kfold
on stan_surv models. Error returned is:Error: passing unknown arguments: subset.
Description:
I'm fitting multi-level survival models with time-varying using rstanarm::stan_surv models and want to use kfold cross validation where groups of individuals are heldout (individuals are repeatedly surveyed overtime). I tried aggregating the log likelihoods as done in the time-varying coefficient example in the Arxiv paper (https://arxiv.org/pdf/2002.09633.pdf), however I get pretty poor Pareto k diagnostic values. So I thought using kfold might be a better alternative, where I group individuals into folds. However, when I do this I get the following error
Error: passing unknown arguments: subset.
NOTE I also get this error if I don't stratify/group the data into folds, and if I exclude the random effect for individual. Any advice welcomed!Reproducible Steps:
Taking the reproducible example presented in the Arxiv paper:
RStanARM Version:
R Version:
Operating System:
OSX 10.14.6