stan-dev / stan

Stan development repository. The master branch contains the current release. The develop branch contains the latest stable development. See the Developer Process Wiki for details.
https://mc-stan.org
BSD 3-Clause "New" or "Revised" License
2.58k stars 368 forks source link

ARM models: verify and create pull request #707

Closed syclik closed 10 years ago

syclik commented 10 years ago
PeterLi2016 commented 10 years ago

Ch4.4.

earn.logmodel.3 <- lm (log.earn ~ height + male + height:male)

gives

Coefficients:
(Intercept)       height         male  height:male  
   8.388488     0.017008    -0.078586     0.007447  

and in Stan,

data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn;        // log transformation
  vector[N] inter;           // interaction
  log_earn <- log(earn);        
  inter    <- height .* male;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male 
                    + beta[4] * inter, sigma);
}

gives

           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta[1]    8.37    0.03 0.84    6.80    7.80    8.35    8.94   10.06   952    1
beta[2]    0.02    0.00 0.01   -0.01    0.01    0.02    0.03    0.04   952    1
beta[3]   -0.02    0.04 1.29   -2.48   -0.89   -0.03    0.86    2.48   837    1
beta[4]    0.01    0.00 0.02   -0.03   -0.01    0.01    0.02    0.04   955    1
sigma      0.88    0.00 0.02    0.85    0.87    0.88    0.89    0.92  1114    1
lp__    -445.95    0.05 1.63 -449.98 -446.74 -445.63 -444.76 -443.82  1050    1

I'm not sure why the third coefficient (beta[3] and male) are off by so much or is this ok?

bob-carpenter commented 10 years ago

See below.

On Jun 24, 2014, at 7:08 PM, Peter Li notifications@github.com wrote:

earn.logmodel.3 <- lm (log.earn ~ height + male + height:male)

gives

Coefficients: (Intercept) height male height:male
8.388488 0.017008 -0.078586 0.007447

and in Stan,

data { int N; vector[N] earn; vector[N] height; vector[N] male; } transformed data { vector[N] log_earn; // log transformation vector[N] inter; // interaction log_earn <- log(earn);
inter <- height .* male; } parameters { vector[4] beta; real sigma; } model { log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male

  • beta[4] * inter, sigma); }

gives

       mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat

beta[1] 8.37 0.03 0.84 6.80 7.80 8.35 8.94 10.06 952 1 beta[2] 0.02 0.00 0.01 -0.01 0.01 0.02 0.03 0.04 952 1 beta[3] -0.02 0.04 1.29 -2.48 -0.89 -0.03 0.86 2.48 837 1 beta[4] 0.01 0.00 0.02 -0.03 -0.01 0.01 0.02 0.04 955 1 sigma 0.88 0.00 0.02 0.85 0.87 0.88 0.89 0.92 1114 1 lp__ -445.95 0.05 1.63 -449.98 -446.74 -445.63 -444.76 -443.82 1050 1

I'm not sure why the third coefficient (beta[3] and male) are off by so much.

beta[3] is off by a little more than the se_mean, which is the scale to measure how much your estimates are "off". You have a huge posterior sd in beta[3], and on that scale, it's not off by much at all.

You can try optimizing and see if the posterior mode is where it should be. If the posterior's asymmetric, the mode and mean won't necessarily be in the same place.