vasishth / bayescogsci

Draft of book entitled An Introduction to Bayesian Data Analysis for Cognitive Science by Nicenboim, Schad, Vasishth

Generating quantities based on validation fold during cross-validation (16.6.1) - cmdstanr issue #23

Closed: fusaroli closed this issue 2 years ago

fusaroli commented 2 years ago

While adapting the cross-validation example in Stan to cmdstanr, I kept getting errors because cmdstanr would not accept the new data for generated quantities. A detailed description is here: https://discourse.mc-stan.org/t/generated-quantities-returns-error-mismatch-between-model-and-fitted-parameters-csv-file/17869/14

Not sure how cmdstanr-proof you want your handbook to be, but it would have saved me a lot of time if there had been a footnote or similar noting that, with some backends, using new data requires a change to the underlying model, as explained here: https://mc-stan.org/docs/2_29/stan-users-guide/prediction-forecasting-and-backcasting.html

bnicenboim commented 2 years ago

Thanks, I'll add a footnote! But the book certainly won't be cmdstanr-proof, given that cmdstanr is in active development and keeps changing...

bnicenboim commented 2 years ago

I don't see any footnote here https://mc-stan.org/docs/2_29/stan-users-guide/prediction-forecasting-and-backcasting.html btw.

fusaroli commented 2 years ago

No footnote in the Stan User's Guide :-) I meant a footnote in your handbook linking to that page (and perhaps explaining that doing this in cmdstanr requires changing the model and passing the training and validation data at the same time; see link).

bnicenboim commented 2 years ago

Sorry, I don't see anything in https://mc-stan.org/docs/2_29/stan-users-guide/prediction-forecasting-and-backcasting.html that refers to cmdstanr or to changes in the model.

What exactly does it say, and where?

fusaroli commented 2 years ago

If you try to run the cmdstanr equivalent of

gq_ho <- gqs(pupil_stanmodel, draws = as.matrix(fit_train), data = ls_pupil_ho)

you'll get the error "mismatch between model and fitted parameters", because CmdStan expects the dimensions (e.g., the number of trials or the number of participants) not to change. There seems to be no workaround for this when applying a fitted model to new data. See: https://discourse.mc-stan.org/t/generated-quantities-returns-error-mismatch-between-model-and-fitted-parameters-csv-file/17869/15

The solution is to follow the instructions in the Stan User's Guide and write a model that already includes data declarations for the validation data and computes the log-likelihood of the validation data in the generated quantities block. You then pass the training and validation data at the same time (as separate variables) and get the log_lik of the validation data out of that fit.
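The workaround described above can be sketched in R with cmdstanr as follows. This is a minimal sketch under stated assumptions, not the book's code: the linear-regression model, its priors, and every name in it (`N_train`, `x_ho`, `log_lik_ho`, `make_cv_data()`, `fit_fold()`, `elpd_ho()`) are hypothetical. The key point is that the held-out fold appears only in the generated quantities block, so the posterior is conditioned on the training fold alone, while the pointwise log-likelihood of the held-out observations comes out of the same fit.

```r
# Hypothetical model: only the training fold enters the likelihood; the
# held-out fold is passed as separate data and used only in generated quantities.
stan_code <- "
data {
  int<lower=1> N_train;
  vector[N_train] x_train;
  vector[N_train] y_train;
  int<lower=1> N_ho;
  vector[N_ho] x_ho;
  vector[N_ho] y_ho;
}
parameters {
  real alpha;
  real beta;
  real<lower=0> sigma;
}
model {
  // Hypothetical priors.
  alpha ~ normal(0, 10);
  beta ~ normal(0, 10);
  sigma ~ normal(0, 10);
  // Likelihood of the training fold only.
  y_train ~ normal(alpha + beta * x_train, sigma);
}
generated quantities {
  // Log-likelihood of each held-out observation, for every posterior draw.
  vector[N_ho] log_lik_ho;
  for (n in 1:N_ho) {
    log_lik_ho[n] = normal_lpdf(y_ho[n] | alpha + beta * x_ho[n], sigma);
  }
}
"

# Data list for one fold; `train` and `ho` are data frames with
# hypothetical columns x and y.
make_cv_data <- function(train, ho) {
  list(
    N_train = nrow(train), x_train = train$x, y_train = train$y,
    N_ho = nrow(ho), x_ho = ho$x, y_ho = ho$y
  )
}

# Fit one fold with a model compiled once, e.g.
# mod <- cmdstanr::cmdstan_model(cmdstanr::write_stan_file(stan_code)),
# and return the (draws x held-out observations) log-likelihood matrix.
fit_fold <- function(mod, train, ho, ...) {
  fit <- mod$sample(data = make_cv_data(train, ho), ...)
  fit$draws("log_lik_ho", format = "matrix")
}

# Pointwise expected log predictive density of the held-out observations:
# the log of the likelihood averaged over draws, using log-sum-exp for stability.
elpd_ho <- function(log_lik) {
  log_lik <- unclass(log_lik)
  apply(log_lik, 2, function(ll) {
    m <- max(ll)
    m + log(mean(exp(ll - m)))
  })
}
```

Summing `elpd_ho(fit_fold(mod, train, ho))` over the held-out observations of all folds would give the cross-validated elpd. Compiling the model once and reusing `mod` avoids recompiling it for every fold.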

bnicenboim commented 2 years ago

OK, I understand. That's really annoying behavior from CmdStan!

bnicenboim commented 2 years ago

footnote added