rmcelreath / rethinking

Statistical Rethinking course and book package
2.1k stars 596 forks source link

Nevermind #398

Closed nicklausmillican closed 1 year ago

nicklausmillican commented 1 year ago

Looking at this more. I can recreate the problem without such a complicated setup. Here, I'll simulate an outcome w/2 independent causes, then regress on only 1 cause.

bX <- 1
bZ <- 1
x <- rnorm(N, 10, 2)
z <- rnorm(N, 10, 2)
y <- rnorm(N, bX*x + bZ*z, 1)

lm() correctly identifies the coeffcients

a <- lm(y ~ x)
summary(a)

>Call:
lm(formula = y ~ x)

Residuals:
    Min      1Q  Median      3Q     Max 
-6.6387 -1.5591  0.1323  1.5196  7.1522 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept) 10.16879    0.37413   27.18   <2e-16 ***
x            0.98556    0.03654   26.97   <2e-16 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 2.317 on 998 degrees of freedom
Multiple R-squared:  0.4216,    Adjusted R-squared:  0.421 
F-statistic: 727.3 on 1 and 998 DF,  p-value: < 2.2e-16

but not quap

b <- quap(
  alist(
    y ~ dnorm(mu, sigma),
      mu <- bX*x,
        bX ~ dnorm(0,1),
      sigma ~ dexp(1)
  ), data=list(x=x, y=y)
)
precis(b)

>      mean   sd 5.5% 94.5%
bX    1.96 0.01 1.94  1.97
sigma 3.05 0.07 2.94  3.16
rmcelreath commented 1 year ago

The lm() model has an intercept parameter (accounting for Z). The quap() model does not. I expect that's the source of the difference. They are different models.