Pystan example model - comparing two groups

anmwinter commented 7 years ago

Hello,

I asked over on the PyStan group about submitting Jupyter notebook example models that use PyStan, and I was directed here. I am in the process of moving our models into PyStan, so this is a learning process for me.

I created a Jupyter notebook here: https://github.com/bioinfonm/bioinfonm.github.io/blob/master/_posts/pystan_musings_part1_img/pystan_three_centirues_english_grain_data.ipynb

The notebook, raw data, and images are all here: https://github.com/bioinfonm/bioinfonm.github.io/tree/master/_posts/pystan_musings_part1_img

I was wondering what the best way is to get this vetted and then hosted here as an example for PyStan.

Thank you for your time and consideration,
Ara

bob-carpenter commented 7 years ago

The case studies eventually go on the web site repo. Your Stan model has lots of problems you can see in just this fragment:

parameters { //The primary parameters of interest that are to be estimated. 
  real mu1; // mean of y1
  ...
  real<lower=0> sigma1; // standard deviation of y1
  ...
}
model { // Where your priors and likelihood are specified. Uniform, cauchy, and normal 
        // priors might be a good place to start?
  mu1 ~ uniform(0, 30); // uniform prior, maybe try half-normal, exp, or half-cauchy
  ...
  y1 ~ normal(mu1, sigma1);
  ...

The code itself has some problems:

If you put a uniform distribution on mu1, then you need to declare the parameter with matching lower and upper bounds. Stan models should have a finite log likelihood for all parameter values meeting the declared constraints.
We recommend much more informative priors.
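
To make the point about matching bounds concrete, here is a minimal Python sketch. The helper `uniform_lpdf` is a hypothetical name written for this illustration, not part of PyStan. It evaluates the log density of a uniform(0, 30) prior at a few values that an unconstrained `real mu1;` could take:

```python
import math


def uniform_lpdf(x, lower, upper):
    """Log density of uniform(lower, upper); -inf outside the support."""
    if lower <= x <= upper:
        return -math.log(upper - lower)
    return -math.inf


# Declared as `real mu1;`, the parameter may take any real value,
# including values where the uniform(0, 30) prior has zero density.
for mu1 in (-5.0, 12.0, 45.0):
    print(mu1, uniform_lpdf(mu1, 0.0, 30.0))
```

Only the value inside [0, 30] gets a finite log density. Declaring the parameter as `real<lower=0, upper=30> mu1;` would keep every value that meets the declared constraints inside the support of the prior.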

The doc also has some issues:

mu1 isn't the mean of y1; it's a location parameter.
You don't want to document the language in a program, such as what the parameters block is.
You have lingering open-ended questions in the model comments. These are best left outside the program.

anmwinter commented 7 years ago

@bob-carpenter Thanks for the feedback! I'll work on correcting this. This is a learning process for me.

bob-carpenter commented 7 years ago

For the moment, we're trying to keep the case studies to best-practice recommendations for Stan. We're working on establishing a place for more community-oriented sharing of work that we wouldn't need to vet so closely. There are prior recommendations on the stan-dev/stan wiki and in the regression chapter of the manual.

bob-carpenter commented 7 years ago

You also don't need blocks with nothing in them and you can vectorize everything. This model should look like this:

data {
  int N[2];
  vector[N[1]] y1;
  vector[N[2]] y2;
}
parameters {
  vector[2] mu;
  vector<lower=0>[2] sigma;
}
model {
  mu ~ normal(0, 10);
  sigma ~ cauchy(0, 5);
  y1 ~ normal(mu[1], sigma[1]);
  y2 ~ normal(mu[2], sigma[2]);
}

It'd be even easier if we had ragged arrays.
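
For the PyStan side, a minimal sketch of preparing the data for the model above might look like the following. The function names `make_stan_data` and `to_long_format` are hypothetical, and the numbers are made up for illustration (they are not the grain data from the notebook). The commented-out calls assume the PyStan 2 interface and a string `model_code` holding the Stan program.

```python
def make_stan_data(y1, y2):
    """Build a data dictionary matching the data block above (N, y1, y2)."""
    y1 = [float(v) for v in y1]
    y2 = [float(v) for v in y2]
    return {"N": [len(y1), len(y2)], "y1": y1, "y2": y2}


def to_long_format(y1, y2):
    """Alternative layout: one stacked vector plus a 1-based group index."""
    y = [float(v) for v in y1] + [float(v) for v in y2]
    group = [1] * len(y1) + [2] * len(y2)
    return {"N": len(y), "y": y, "group": group}


# Made-up values for illustration only.
stan_data = make_stan_data([4.1, 5.3, 6.0], [7.2, 8.8])
print(stan_data["N"])

# Assuming PyStan 2 is installed and `model_code` (hypothetical name)
# holds the Stan program above as a string:
# import pystan
# sm = pystan.StanModel(model_code=model_code)
# fit = sm.sampling(data=stan_data, iter=2000, chains=4)
```

The long-format helper is one common way to work around the lack of ragged arrays. It would need a differently written Stan program, one that takes a single vector `y` and an integer array `group` and indexes the parameters by group, so treat it as a sketch of the data layout rather than a drop-in replacement.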

anmwinter commented 7 years ago

Thanks again, @bob-carpenter! I am working on how to vectorize data. I appreciate the model rewrite.

Ara

stan-dev / example-models

Pystan example model - comparing two groups #108