stan-dev / example-models

Example models for Stan
http://mc-stan.org/

Fixed models / data files for cmdstan #171

Closed PhilClemson closed 4 years ago

PhilClemson commented 4 years ago

This fixes the majority of models that were either broken or had missing or incorrectly formatted R dump data files (the data format used by CmdStan). They can be checked using the script found in the performance tests repository (https://github.com/stan-dev/performance-tests-cmdstan), e.g.:

```
python ./performance-tests-cmdstan/runPerformanceTests.py directories "performance-tests-cmdstan/example-models" --overwrite-gold --method=sample --num-samples=1000
```

Here is a list of the updated models:

misc/ecology/occupancy/occupancy
misc/ecology/mark-recapture/mark-recapture
misc/ecology/mark-recapture/cjs-K
misc/ecology/mark-recapture/mark-recapture-2
misc/ecology/mark-recapture/mark-recapture-3
misc/ecology/mark-recapture/cjs
misc/linear-regression_regression
misc/linear-regression_regression_std
misc/garch_arch1
misc/garch_koyck
misc/garch_garch1_1
misc/hier_multivariate_hier_multivariate
misc/moving-avg/arma11
misc/moving-avg/arma11_alt
misc/moving-avg/maQ
misc/moving-avg/stochastic-volatility
misc/moving-avg/ma2
misc/moving-avg/stochastic-volatility-optimized
misc/gaussian-process/gp-predict-analytic
misc/gaussian-process/gp-sim-multi
misc/gaussian-process/gp-fit-ARD
misc/gaussian-process/gp-fit-latent
misc/dlm/fx/equicorr
misc/dlm/fx/factor
misc/sur/sur/imrpoper
misc/sur/sur
misc/hmm/hmm
misc/hmm/hmm-analytic
misc/hmm/hmm-semisup
misc/hmm/hmm-sufficient
misc/nnmf/nnmf
misc/nnmf/nnmf/vec
knitr/simplest-regression/fake-data
knitr/pool-binary-trials/pool
knitr/pool-binary-trials/no-pool
knitr/pool-binary-trials/hier
knitr/pool-binary-trials/hier-logit-centered
knitr/pool-binary-trials/hier-logit
knitr/chapter2/worldcup/fixed
knitr/chapter2/worldcup/first/try
knitr/chapter2/worldcup/first/try/noprior
knitr/chapter2/worldcup/with/replication
knitr/chapter2/golf1
knitr/chapter2/worldcup/no/sqrt
knitr/car-iar-poisson/pois/re
knitr/car-iar-poisson/pois
knitr/car-iar-poisson/bym2/islands
knitr/car-iar-poisson/bym2/offset/only
knitr/car-iar-poisson/pois/icar
knitr/car-iar-poisson/bym2
knitr/car-iar-poisson/simple/iar
knitr/chapter1/fake-data
knitr/golf/golf1
knitr/world-cup/worldcup/fixed
knitr/world-cup/worldcup/first/try
knitr/world-cup/worldcup/first/try/noprior
knitr/world-cup/worldcup/with/replication
knitr/world-cup/worldcup/no/sqrt
ARM/Ch.25/earnings2
ARM/Ch.25/earnings
ARM/Ch.25/earnings/pt2
ARM/Ch.25/earnings/pt1
education/tutorial/twopl/reg/dif
education/tutorial/twopl/reg/noncentered
education/tutorial/twopl/reg/centered
education/tutorial/twopl/twopl
education/hierarchical/2pl/hierarchical/2pl
education/dina/independent/dina/independent
education/dina/independent/dina/nostructure
BPA/Ch.04/GLMM1
BPA/Ch.04/GLMM2
BPA/Ch.13/bluebug
BPA/Ch.13/owls/ms1
BPA/Ch.13/owls/ms2
BPA/Ch.13/Dynocc2
BPA/Ch.10/js/ms
BPA/Ch.12/binmix/cov
BPA/Ch.12/binmix
BPA/Ch.12/Nmix0
BPA/Ch.12/Nmix2
BPA/Ch.12/Nmix1
BPA/Ch.07/cjs/trap
BPA/Ch.07/cjs/temp/corr

mitzimorris commented 4 years ago

where did you get the data from? I realize that in the past there have been attempts to add data - cf https://github.com/stan-dev/example-models/issues/151 - these were done in a brute-force way. After the PR associated with that issue went in, I cleaned up part of it by removing generated data from case studies where there wasn't any data.

it's important that the data used to test the model should be generated by the model or the process which the model is trying to approximate. using just the specified constraints on data and parameters to generate test data is not enough - there is often domain knowledge which is not specified, e.g., for the icar models, the input data describes an areal map, and there are constraints on the relations between the map edges which aren't directly expressed in the data declarations.

for testing purposes, I would imagine that we're interested in testing well-specified models which are fitting data generated from the model and/or data which is somehow pathological. this requires that all test data be annotated explaining what kind of data it is and what kind of behavior we expect when fitting the model to the data.
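The distinction drawn above can be made concrete. Here is a hedged sketch (not code from this repository; the group count and trial sizes are invented for illustration) contrasting data generated by a simple fully pooled binomial model with data that merely satisfies the declared constraints:

```python
import random

random.seed(1)  # reproducible illustration

# Prior-predictive data for a fully pooled binomial model:
#   theta ~ uniform(0, 1);  y[j] ~ binomial(K[j], theta)
# The generated y respect the model's own data-generating process.
J = 8            # number of groups (made up for illustration)
K = [20] * J     # trials per group
theta = random.random()  # draw the shared success probability
y_model = [sum(random.random() < theta for _ in range(K[j])) for j in range(J)]

# "Constraint-only" data merely satisfies 0 <= y[j] <= K[j]; it can be
# inconsistent with any single theta, so fitting to it tells you little
# about whether the model behaves as expected.
y_constraint_only = [random.randint(0, K[j]) for j in range(J)]
```

Both datasets pass the declared bounds checks, but only the first reflects the process the model describes.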

mitzimorris commented 4 years ago

hi @PhilClemson - please allow me to clarify:

I think it would be better to curate the models used for testing, not make sure that we can use every single one of them. so I don't think this PR is a good idea.

bob-carpenter commented 4 years ago

I also think a curated set of test models makes much more sense for testing than all the stuff that's accumulated in our example-models repo.

Having said that, I would like to see something done to clean up example-models, which means:

* fixing models that we want to keep
* removing junk

In the first category are models that are there for pedagogical purposes, not for testing purposes, like the ones in the BPA directory (from the Bayesian Population Analysis book). It'd be nice to at least clean those up. Pretty much everything in misc is just junk I threw up over the years for the manual or just as an example for someone. That can all be eliminated.

How did you generate new data for things like my hierarchical binary pooling example in knitr/pool-binary-trials?

mitzimorris commented 4 years ago

@PhilClemson - w/r/t the car-iar-poisson models, I don't think your correction of the stan code is correct. did you test it on areal map data that contains islands/singletons?

PhilClemson commented 4 years ago

> where did you get the data from? I realize that in the past there have been attempts to add data - cf #151 - these were done in a brute-force way. After the PR associated with that issue went in, I cleaned up part of it by removing generated data from case studies where there wasn't any data.
>
> it's important that the data used to test the model should be generated by the model or the process which the model is trying to approximate. using just the specified constraints on data and parameters to generate test data is not enough - there is often domain knowledge which is not specified, e.g., for the icar models, the input data describes an areal map, and there are constraints on the relations between the map edges which aren't directly expressed in the data declarations.
>
> for testing purposes, I would imagine that we're interested in testing well-specified models which are fitting data generated from the model and/or data which is somehow pathological. this requires that all test data be annotated explaining what kind of data it is and what kind of behavior we expect when fitting the model to the data.

Hi @mitzimorris, thanks for taking a look through this.

Most of the data came from the .R files and documentation found in the same folders as the Stan files. For the BPA models I downloaded the data from the links provided. My main task was to standardise the models so that they could all be run using CmdStan. Fortunately, in most cases it was just a matter of generating the data.R file from existing code / data.
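For readers unfamiliar with the format being generated: CmdStan reads plain-text R dump data files. A minimal sketch of emitting one from Python (the variable names and values here are invented for illustration, not taken from any model in the repo; only scalars and 1-D arrays are handled):

```python
# Sketch: write a CmdStan-compatible R dump data file for a model
# declaring something like `int N; vector[N] y;`.

def to_rdump(name, value):
    """Format one variable in R dump syntax (scalars and 1-D lists only)."""
    if isinstance(value, (int, float)):
        return f"{name} <- {value}"
    body = ", ".join(str(v) for v in value)
    return f"{name} <- c({body})"

def write_rdump(path, data):
    """Write a dict of variables as an R dump file, one per line."""
    with open(path, "w") as f:
        for name, value in data.items():
            f.write(to_rdump(name, value) + "\n")

# Hypothetical example values:
write_rdump("data.R", {"N": 3, "y": [1.2, -0.7, 0.4]})
```

In practice the repo's own .R scripts (or `rstan::stan_rdump` in R) produce these files; the sketch just shows what the output format looks like.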

There were a few cases where we had to generate "compatible" data but not "curated" data. The knitr models are a good example of this.

> hi @PhilClemson - please allow me to clarify:
>
> I think it would be better to curate the models used for testing, not make sure that we can use every single one of them. so I don't think this PR is a good idea.

I understand - we just thought we would share these changes in the hope that they could be applied more generally. We've found that having a large selection of models / data is useful, even if just for identifying bugs. It would definitely not be appropriate to judge performance statistics based on individual models, although looking at performance over a large population of models might still be useful.

> I also think a curated set of test models makes much more sense for testing than all the stuff that's accumulated in our example-models repo.
>
> Having said that, I would like to see something done to clean up example-models, which means:
>
> * fixing models that we want to keep
> * removing junk
>
> In the first category are models there for pedagogical purposes, not for testing purposes, like the ones in directory BPA (from the Bayesian Population Analysis book). It'd be nice to at least clean those up. Pretty much everything in misc is just junk I threw up over the years for the manual or just as an example for someone. That can all be eliminated.
>
> How did you generate new data for things like my hierarchical binary pooling example in knitr/pool-binary-trials?

That sounds like a good idea, although in that case maybe the junk models like those in misc could become a separate repository? I'd have to check with @alecksphillips on those specific examples, but as I've said above I imagine it will be compatible rather than curated data.

> @PhilClemson - w/r/t the car-iar-poisson models, I don't think your correction of the stan code is correct. did you test it on areal map data that contains islands/singletons?

For these I generated the data from the corresponding .R files. I remember for bym2_islands.stan I was getting errors that seemed to be an indexing issue (which I fixed), but I might be wrong. I believe I used the nycTracts10 data.

alecksphillips commented 4 years ago

> That sounds like a good idea, although in that case maybe the junk models like those in misc could become a separate repository? I'd have to check with @alecksphillips on those specific examples, but as I've said above I imagine it will be compatible rather than curated data.

Hi all, just to add: the bits of data I contributed were generated using the R markdown workbooks in those directories.

mitzimorris commented 4 years ago

> For these I generated the data from the corresponding .R files. I remember for bym2_islands.stan I was getting errors which seemed to be an indexing issue (which I fixed) but I might be wrong. I believe I used the nycTracts10 data.

please revert all changes to the car-iar-poisson models and delete the data files you added. I will try to add appropriate datasets myself.

the bym2_islands.stan model is correctly coded - it requires input where the areal map contains at least 2 disconnected subcomponents, whereas the data you're using is an areal map which is fully connected. the bym2_islands.stan model is checked into the case study because it's the next level of complexity for that model, but I haven't yet extended the case study.

I agree that we should expand the suite of test models, but in the icar model case study, the models that consist of a regression plus just an icar term or just a random effect are degenerate versions of the bym2 model when there's no spatial correlation or when there's perfect spatial correlation, so adding these to the test suite isn't going to tell you anything new. those models were included mainly to generate visualizations and for stan programming pedagogy. they're just not worth running in a test suite - it's a waste of compute cycles / carbon credits.
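The subcomponent requirement described above can be checked mechanically before fitting. A hypothetical Python sketch, assuming the areal map is given as parallel edge lists (the `node1`/`node2` encoding used in the ICAR case study; the toy map below is invented):

```python
# Sketch: count connected components in an areal map given as parallel
# edge lists over nodes 1..N. bym2_islands.stan needs >= 2 components;
# a fully connected map (like nycTracts10) has exactly 1.

def count_components(N, node1, node2):
    """Union-find over nodes 1..N; returns the number of components."""
    parent = list(range(N + 1))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for a, b in zip(node1, node2):
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb  # merge the two components
    return len({find(i) for i in range(1, N + 1)})

# Toy map: nodes 1-4 form a chain, node 5 is an island -> 2 components.
print(count_components(5, [1, 2, 3], [2, 3, 4]))  # -> 2
```

A check like this explains the mismatch: data from a fully connected map returns 1 component, which the islands variant of the model is not written to accept.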

PhilClemson commented 4 years ago

> please revert all changes to the car-icar-poisson models and delete the data files you added. I will try to add appropriate datasets myself.

Thanks for the explanation, that makes a lot of sense and I agree. I've reverted the relevant commits.

jgabry commented 4 years ago

Thanks for doing all this work!

@mitzimorris Now that the icar-poisson commits are reverted, is this ready to merge?

mitzimorris commented 4 years ago

yes!

jgabry commented 4 years ago

Ok, great, will merge now. Thanks again @PhilClemson for doing all this!

PhilClemson commented 4 years ago

Thanks both, although I can't take all of the credit as I had help!