avehtari closed this issue 6 months ago.
Good idea! Will add that to the next release.
Actually, I think brms already supports this via the `joint` argument that I recently added following our LGO-CV discussion. Can you check whether that is what you mean?
Ah, I missed that. I checked the documentation on the web page and the issues, but not the documentation for the GitHub version. Does that work if K is less than the number of groups?
Ah, you want a joint elpd per group, not (necessarily) per fold?
Yes, because with a large number of groups, leave-one-group-out CV can take a long time to compute.
If I understand correctly, would this allow leave-one-patient-out CV? That would be super useful.
Does this need extra considerations if the groups have different numbers of observations? In that case, one would want to account for that, right?
Also, the way stratification is done with respect to other covariates may need to be reconsidered if the grouping factor has unequal group sizes. In many cases the covariates would vary only by group, as they are mostly baseline covariates (in the context of a randomised clinical trial).
> If I understand correctly, would this allow leave-one-patient-out CV?
Yes. I was testing with the Nabiximols case study and ran out of memory with `save_fits = TRUE` for 2 models with 105 folds each, but I'd like to make predictions, too.
> Does this need extra considerations if the groups have different numbers of observations? In that case, one would want to account for that, right?
That should be an option.
For example, the following:
```r
kfold2b <- kfold(fit_betabinomial2b, group = "id",
                 folds = kfold_split_grouped(20, droplevels(cu_df_b$id)))
```
or maybe even
```r
kfold2b <- kfold(fit_betabinomial2b, group = "id", K = 20)
```
would do 20-fold CV but return 105 joint elpds (one per id). The default would be to return the simple joint, so that if there are no predictive dependencies, the sum of the joint elpds is the same as the sum of the pointwise elpds.
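To illustrate the proposed split, here is a minimal base-R sketch of grouped fold assignment: whole groups are dealt out to K folds, so 20 refits cover all 105 groups while each group still gets its own joint elpd. `split_grouped_sketch()` and the toy `ids` are hypothetical; the real helper is `loo::kfold_split_grouped()`, which may differ in details such as how it balances fold sizes.

```r
# Hypothetical helper, not the loo implementation: assign whole groups to
# K folds so that all observations of a group end up in the same fold.
split_grouped_sketch <- function(K, group) {
  group <- droplevels(as.factor(group))
  lev <- levels(group)
  stopifnot(K >= 2, K <= length(lev))
  # Deal fold labels 1..K out to the groups in a random order
  fold_of_level <- integer(length(lev))
  fold_of_level[sample.int(length(lev))] <- rep_len(seq_len(K), length(lev))
  names(fold_of_level) <- lev
  # Give every observation the fold of its group
  unname(fold_of_level[as.character(group)])
}

# Toy data (illustrative only): 105 ids with 3, 5 or 8 observations each
set.seed(1)
ids <- factor(rep(sprintf("id%03d", 1:105),
                  times = rep(c(3, 5, 8), length.out = 105)))
folds <- split_grouped_sketch(20, ids)

table(folds)   # observations per fold (unequal, since groups differ in size)
nlevels(ids)   # number of groups, i.e. joint elpds obtained from 20 refits
```

Because groups of unequal size are assigned whole, folds contain unequal numbers of observations, which is one place where the unequal-group-size concern raised above shows up.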
This now works in the GitHub version. Here is an example:
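```r
library(brms)

# epilepsy is an example dataset shipped with brms
fit1 <- brm(count ~ zAge + zBase * Trt + (1 | patient),
            data = epilepsy, family = poisson())
# Whole patients are assigned to folds; joint = "group" gives one joint elpd per patient
kfold(fit1, folds = "group", joint = "group", group = "patient")
```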
Currently, `kfold()` computes pointwise log scores (elpd_kfold, the K-fold analogue of elpd_loo). For leave-one-group-out cross-validation, it would be useful to have an option to compute the joint log score for each group. Instead of returning the pointwise elpd for each observation, this would return the groupwise joint elpd (eljpd) for each group. Given those, the rest of the summary and model comparison functions should work as they are now, although some documentation and messages could be refined.
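The difference between the two scores can be sketched in base R, assuming the held-out log-likelihood draws are available as an S x N matrix `ll` (posterior draws by observations). The pointwise score takes the log of the draw-average of each observation's predictive density separately; the groupwise joint score first sums a group's log-likelihoods within each draw and then takes the log of the draw-average of the exponentiated sum. The helper names and the toy data are hypothetical, and this is not the brms implementation.

```r
# Hypothetical helpers illustrating pointwise vs groupwise joint elpd.
# ll: S x N matrix of held-out log-likelihood draws (rows = posterior draws).

# Numerically stable log(mean(exp(x)))
log_mean_exp <- function(x) {
  m <- max(x)
  m + log(mean(exp(x - m)))
}

# One elpd per observation: average the predictive density over draws
elpd_pointwise <- function(ll) {
  apply(ll, 2, log_mean_exp)
}

# One joint elpd per group: sum the group's log-likelihoods within each draw,
# then average the joint predictive density over draws
elpd_joint_by_group <- function(ll, group) {
  group <- droplevels(as.factor(group))
  sapply(levels(group), function(g) {
    log_mean_exp(rowSums(ll[, group == g, drop = FALSE]))
  })
}

# Toy example: a shared per-draw quantity mu makes observations dependent
set.seed(2)
S <- 1000
group <- factor(rep(c("a", "b", "c"), times = c(2, 3, 4)))
mu <- rnorm(S)
ll <- sapply(seq_along(group), function(i) dnorm(0.5, mean = mu, log = TRUE))

elpd_j <- elpd_joint_by_group(ll, group)  # 3 values, one per group
elpd_p <- elpd_pointwise(ll)              # 9 values, one per observation
c(joint = sum(elpd_j), pointwise = sum(elpd_p))
```

With one observation per group, `elpd_joint_by_group()` reduces exactly to `elpd_pointwise()`, so the summed scores agree; with the shared `mu` in this toy example, the joint and pointwise sums differ.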