michael-franke / Bayesian-Regression

Material for a course on (intermediate) Bayesian regression modeling

Issue with reported parameter estimates in simple linear regression example (dolphin_agg data) #1

Closed TiagoAMarques closed 6 months ago

TiagoAMarques commented 6 months ago

I was browsing your webpage (amazing content, by the way; thanks so much for such useful free resources) and noticed that the estimates for the regression line parameters do not match the data, though I am not sure why. I ran the code on my machine and got sensible values, about 0 for the intercept and ~0.94 for the slope, but on the webpage those values are in the hundreds.

TiagoAMarques commented 6 months ago

On the webpage

https://michael-franke.github.io/Bayesian-Regression/practice-sheets/01b-regression-BRMS.html

I see this:

     Family: gaussian
      Links: mu = identity; sigma = identity
    Formula: AUC ~ MAD
       Data: dolphin_agg (Number of observations: 108)
      Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
             total post-warmup draws = 4000

    Population-Level Effects:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    Intercept   486.29   1818.46 -3046.98  4015.71 1.00     3541     2605
    MAD         455.14     16.52   421.67   487.48 1.00     3843     2425

    Family Specific Parameters:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    sigma 17189.13   1194.50 15016.38 19715.44 1.00     3734     2765

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

while I get this:

     Family: gaussian
      Links: mu = identity; sigma = identity
    Formula: AUC ~ MAD
       Data: dolphin_agg (Number of observations: 108)
      Draws: 4 chains, each with iter = 2000; warmup = 1000; thin = 1;
             total post-warmup draws = 4000

    Regression Coefficients:
              Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    Intercept    -0.00      0.03    -0.07     0.07 1.00     3851     3015
    MAD           0.94      0.03     0.87     1.00 1.00     4012     2632

    Further Distributional Parameters:
          Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
    sigma     0.35      0.03     0.30     0.40 1.00     4127     3095

Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS and Tail_ESS are effective sample size measures, and Rhat is the potential scale reduction factor on split chains (at convergence, Rhat = 1).

I must apologise, as I have never used brms and this was my first try, so I might be missing something rather obvious. Even the fact that the section names in the output differ (e.g. "Population-Level Effects:" vs "Regression Coefficients:") makes me wonder whether I have done something wrong. On the other hand, the plotted data looks just like yours, and its values are on the order of single units, so my parameter values are certainly more sensible than yours, which are currently in the hundreds.

Anyway, hope this is helpful.

michael-franke commented 6 months ago

Glad to hear you find this useful and thanks for pointing this out! This is a mistake from caching previous results, where the regression was run for non-standardized values. I've recompiled the sheet and pushed an update. You should now see numbers (maybe after refreshing the page in the browser) that make sense.

Sorry for the confusion!
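
For anyone reading along, here is a minimal sketch of why a fit on non-standardized values produces estimates on such a different scale from a fit on standardized ones. It is not the sheet's code: it uses ordinary least squares rather than brms, and the data are simulated stand-ins (the helper names `ols` and `standardize` and all numeric settings are assumptions made only for illustration). The point it shows is that the standardized slope is the raw slope rescaled by the ratio of the predictor's standard deviation to the outcome's.

```python
import random
import statistics


def ols(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    mean_x, mean_y = statistics.fmean(x), statistics.fmean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def standardize(values):
    """Center to mean 0 and scale to sample standard deviation 1."""
    mean, sd = statistics.fmean(values), statistics.stdev(values)
    return [(v - mean) / sd for v in values]


random.seed(1)
# Simulated stand-in data (hypothetical values, NOT the dolphin data set).
x = [random.gauss(200, 100) for _ in range(108)]
y = [500 + 450 * xi + random.gauss(0, 17000) for xi in x]

# Fit once on the raw scale and once on the standardized scale.
raw_intercept, raw_slope = ols(x, y)
std_intercept, std_slope = ols(standardize(x), standardize(y))

# The standardized slope equals the raw slope times sd(x) / sd(y).
rescaled_slope = raw_slope * statistics.stdev(x) / statistics.stdev(y)

print(f"raw fit:          intercept={raw_intercept:.2f}, slope={raw_slope:.2f}")
print(f"standardized fit: intercept={std_intercept:.2f}, slope={std_slope:.2f}")
print(f"raw slope rescaled by sd(x)/sd(y): {rescaled_slope:.2f}")
```

In the standardized fit the intercept is zero up to rounding and the slope lies between -1 and 1, which is the pattern in the second summary above. A cached result from the raw-scale fit keeps the raw-scale numbers.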

TiagoAMarques commented 6 months ago

Thanks Michael. Assumed it was something along those lines. Great stuff.

TiagoAMarques commented 6 months ago

But note that something else still requires updating. The sentence "The model output suggests that the posterior mean of the Intercept is around 500. The coefficient for MAD is estimated to be about 450." still refers to some other model.
