ARM models: verify and create pull request

syclik commented 10 years ago

[x] verify ARM models return sensible results (compare to book)
[x] create pull request

PeterLi2016 commented 10 years ago

Ch4.4.

earn.logmodel.3 <- lm (log.earn ~ height + male + height:male)

gives

Coefficients:
(Intercept)       height         male  height:male  
   8.388488     0.017008    -0.078586     0.007447

and in Stan,

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn;        // log transformation
  vector[N] inter;           // interaction
  log_earn <- log(earn);        
  inter    <- height .* male;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male 
                    + beta[4] * inter, sigma);
}

gives

           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta[1]    8.37    0.03 0.84    6.80    7.80    8.35    8.94   10.06   952    1
beta[2]    0.02    0.00 0.01   -0.01    0.01    0.02    0.03    0.04   952    1
beta[3]   -0.02    0.04 1.29   -2.48   -0.89   -0.03    0.86    2.48   837    1
beta[4]    0.01    0.00 0.02   -0.03   -0.01    0.01    0.02    0.04   955    1
sigma      0.88    0.00 0.02    0.85    0.87    0.88    0.89    0.92  1114    1
lp__    -445.95    0.05 1.63 -449.98 -446.74 -445.63 -444.76 -443.82  1050    1

I'm not sure why the third coefficient (beta[3] and male) are off by so much or is this ok?

bob-carpenter commented 10 years ago

See below.

On Jun 24, 2014, at 7:08 PM, Peter Li notifications@github.com wrote:

earn.logmodel.3 <- lm (log.earn ~ height + male + height:male)

gives

Coefficients: (Intercept) height male height:male
8.388488 0.017008 -0.078586 0.007447

and in Stan,

data { int N; vector[N] earn; vector[N] height; vector[N] male; } transformed data { vector[N] log_earn; // log transformation vector[N] inter; // interaction log_earn <- log(earn);
inter <- height .* male; } parameters { vector[4] beta; real sigma; } model { log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male

beta[4] * inter, sigma); }

gives
       mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta[1] 8.37 0.03 0.84 6.80 7.80 8.35 8.94 10.06 952 1 beta[2] 0.02 0.00 0.01 -0.01 0.01 0.02 0.03 0.04 952 1 beta[3] -0.02 0.04 1.29 -2.48 -0.89 -0.03 0.86 2.48 837 1 beta[4] 0.01 0.00 0.02 -0.03 -0.01 0.01 0.02 0.04 955 1 sigma 0.88 0.00 0.02 0.85 0.87 0.88 0.89 0.92 1114 1 lp__ -445.95 0.05 1.63 -449.98 -446.74 -445.63 -444.76 -443.82 1050 1

I'm not sure why the third coefficient (beta[3] and male) are off by so much.

beta[3] is off by a little more than the se_mean, which is the scale to measure how much your estimates are "off". You have a huge posterior sd in beta[3], and on that scale, it's not off by much at all.

You can try optimizing and see if the posterior mode is where it should be. If the posterior's asymmetric, the mode and mean won't necessarily be in the same place.

Bob

stan-dev / stan

ARM models: verify and create pull request #707