Closed by syclik.
Ch. 4.4:
earn.logmodel.3 <- lm(log.earn ~ height + male + height:male)
gives
Coefficients:
(Intercept)       height         male  height:male
   8.388488     0.017008    -0.078586     0.007447
and in Stan,
data {
  int<lower=0> N;
  vector[N] earn;
  vector[N] height;
  vector[N] male;
}
transformed data {
  vector[N] log_earn;  // log transformation
  vector[N] inter;     // interaction
  log_earn = log(earn);
  inter = height .* male;
}
parameters {
  vector[4] beta;
  real<lower=0> sigma;
}
model {
  log_earn ~ normal(beta[1] + beta[2] * height + beta[3] * male
                    + beta[4] * inter, sigma);
}
gives
           mean se_mean   sd    2.5%     25%     50%     75%   97.5% n_eff Rhat
beta[1]    8.37    0.03 0.84    6.80    7.80    8.35    8.94   10.06   952    1
beta[2]    0.02    0.00 0.01   -0.01    0.01    0.02    0.03    0.04   952    1
beta[3]   -0.02    0.04 1.29   -2.48   -0.89   -0.03    0.86    2.48   837    1
beta[4]    0.01    0.00 0.02   -0.03   -0.01    0.01    0.02    0.04   955    1
sigma      0.88    0.00 0.02    0.85    0.87    0.88    0.89    0.92  1114    1
lp__    -445.95    0.05 1.63 -449.98 -446.74 -445.63 -444.76 -443.82  1050    1
I'm not sure why the third coefficient (beta[3], for male) is off by so much. Or is this OK?
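For reference, the Stan program above encodes the same regression as the `lm` call, with beta[1] to beta[4] in the roles of (Intercept), height, male and height:male (σ is the residual standard deviation; Stan's `normal` takes it as its second argument, written below as the variance σ²). Writing out the implied male-female difference at a given height h shows what beta[3] measures:

```latex
\log(\mathrm{earn}_i) \sim \mathcal{N}\!\left(\beta_1 + \beta_2\,\mathrm{height}_i + \beta_3\,\mathrm{male}_i + \beta_4\,\mathrm{height}_i\,\mathrm{male}_i,\ \sigma^2\right)

\mathrm{E}[\log \mathrm{earn} \mid \mathrm{male}=1,\ \mathrm{height}=h] - \mathrm{E}[\log \mathrm{earn} \mid \mathrm{male}=0,\ \mathrm{height}=h] = \beta_3 + \beta_4\, h
```

So beta[3] on its own is the predicted male-female difference at height = 0, far outside the range of the data. That extrapolation is one plausible reason its posterior sd is so large while beta[2] and beta[4] are tightly estimated; centering height before forming the interaction would make beta[3] the difference at the average height instead.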
See below.
On Jun 24, 2014, at 7:08 PM, Peter Li notifications@github.com wrote:
I'm not sure why the third coefficient (beta[3], for male) is off by so much.
beta[3] is off by only a little more than its se_mean (0.04), and se_mean is the scale on which to measure how much your estimates are "off". beta[3] also has a huge posterior sd (1.29), and on that scale it's not off by much at all.
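As a rough check of that arithmetic, using the beta[3] row of the Stan output and the `lm` estimate, and assuming se_mean is approximately sd/√n_eff (the usual Monte Carlo standard error of a posterior mean):

```latex
\text{se\_mean} \approx \frac{\text{sd}}{\sqrt{n_{\text{eff}}}} = \frac{1.29}{\sqrt{837}} \approx \frac{1.29}{28.93} \approx 0.045

\left|\hat\beta_3^{\text{Stan}} - \hat\beta_3^{\text{lm}}\right| \approx \left|-0.02 - (-0.0786)\right| \approx 0.059 \approx 1.3 \times \text{se\_mean} \approx 0.05 \times \text{sd}
```

The gap is therefore about one Monte Carlo standard error and roughly a twentieth of a posterior sd. The means are rounded to two decimals in the printout, so these ratios are approximate.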
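You can try optimizing and see whether the posterior mode is where it should be: the model puts no priors on beta or sigma, so they get flat priors and the mode for beta should match the lm estimates. If the posterior is asymmetric, the mode and mean won't necessarily be in the same place.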
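Why the mode should land on the `lm` estimates: with no sampling statements for beta or sigma, Stan uses implicit flat priors, so the log posterior is the normal log likelihood up to a constant. Write y_i = log(earn_i) and x_i = (1, height_i, male_i, height_i · male_i), and let X be the N × 4 matrix with rows x_i (assumed to have full column rank):

```latex
\log p(\beta, \sigma \mid y) = \text{const} - N \log\sigma - \frac{1}{2\sigma^2} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2

\hat\beta_{\text{mode}} = \arg\min_{\beta} \sum_{i=1}^{N} \left(y_i - x_i^\top \beta\right)^2 = \left(X^\top X\right)^{-1} X^\top y
```

For any fixed σ, the β that maximizes the log posterior is the least-squares solution, which is exactly what `lm` computes. Optimizing should therefore reproduce lm's coefficients up to optimizer tolerance, whereas the posterior mean reported by sampling need not match them when the posterior is asymmetric.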