Closed sheilasw closed 4 years ago
Hi, many thanks for your solution! I noticed the same problem and tried to extract the coefficients by refitting the model with only those variables in BPM...
Thanks so much for raising this. It's caused lots of confusion for other students in the forum and sadly no mentor or instructor appears to have been active on Coursera for more than 2 years.
I have an alternate workaround which gets the coefficients from the coef
object. I confirmed the results of my workaround match your simpler one using the mle
value. But I do hope the maintainers can remove the incorrect code in the textbook and update the package to support extraction of BPM coefficients!
My workaround:
audience_score_lm_coef = coef(audience_score_lm)
audience_score_lm_coef_means = as.data.frame(audience_score_lm_coef$conditionalmeans)
audience_score_lm_coef_means %>%
filter(feature_filmyes == 0 & dramayes == 0 & mpaa_rating_Ryes == 0 & thtr_rel_year == 0 & oscar_seasonyes == 0 & summer_seasonyes == 0 & imdb_num_votes == 0 & best_pic_statusnot_nominated == 0 & best_pic_statuswon == 0 & best_actor_winyes == 0 & best_actress_winyes == 0 & best_dir_winyes == 0 & top200_boxyes == 0 & runtime != 0 & imdb_rating != 0 & critics_score != 0)
The filter's extremely clunky; I'm sure there's a more elegant way to do this (perhaps even dynamically using audience_score_bpm$best.vars). But you get the idea - the 3 variables != 0 are the ones in the BPM per the predict
function.
The "intercept" in this object is also for the centered model. I'm surprised this is never mentioned in the textbook or in the Week 4 lab as a caveat for working with this package.
After some digging, I have found where the issue is. The function coef
returns coefficients, conditional means, se's etc for the top n.models
with the output sorted by the posterior model probabilities. Therefore the order of the models in the original bas object and the coefficient object do not align.
A short term work around is to find the order of the posterior probabilities and then extract the location for the model that is the BPM, then use that as the index for the coefficients.
BPM.rank = (1:lmmovies$n.models)[topm == vmodel]
lmmovies.coef = coef(lmmovies)
BPM.coef = lmmovies.coef$conditionalmeans[BPM.rank,]
BPM.coef
cbind(BPM.coef[BPM.coef != 0], lmmovies$mle[[vmodel]])
Working on the code to extract just the BPM coefficients and other info.
Updated code in Book Chapter to show correct way to extract coefficients under HPM, MPM and BPM StatsWithR/book#80
Dear Merliseclyde/Bas,
Many thanks for your email. I wonder do you and others check the INTERCEPT? It is not a "true" intercept but the "mean" of observations and thus you can see that they are the same value for all models. I have found a way to extract the intercept. If you and others need it please let me know.
In addition, I deeply digged into BPM and I saw that it is weird in some cases. For example, the BPM show 7 selected variables but there is one that its coefficient is 0. And also, because BPM try to get the closest loss to of MBA (full model). I did check selected variables with normal linear model and the result showed many of them are INSIGNIFICANT. In my opinion, the strongest poitn of BMA is to handle uncertainty and also exclude autocorrelation (multicollinearity). So, if we select the one close to BMA to get a closest loss model is ... high risk. For example, in my case the HPM with 4 predators (7 with BPM) and all of them are significant. The adjusted R2 is higher than the one of BPM. RMSEs of BMA, HPM and BPM are 11.21, 11.24 and 11.235 respectively. The mean is 142.47 and thus the differences of those losses are not significant. Generally, we add three more predictors to get a weird result, high uncertainty and multicollinearity as well. In my opinion, BPM can not applied for all cases instead of BHM.
Best regards, Hien Huynh
On Fri, Oct 30, 2020, 02:36 Merlise Clyde notifications@github.com wrote:
Updated code in Book Chapter to show correct way to extract coefficients under HPM, MPM and BPM StatsWithR/book#80 https://github.com/StatsWithR/book/issues/80
— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/merliseclyde/BAS/issues/49#issuecomment-719117480, or unsubscribe https://github.com/notifications/unsubscribe-auth/AQ4MIRMDWJ7F2D56UWQFYCLSNIKCTANCNFSM4O6ZCMWQ .
Update:
I found a way to pull the coefficients for the model. I still can't pull the correct coefficients using the method I originally asked about.
Method that works, in case it helps someone else reading this:
lmmovies$namesx[lmBPM$bestmodel + 1]
(there are multiple ways of getting these)lmmovies$mle[lmBPM$best]
To confirm I had pulled the correct information, I used this information to predict the first record, and it matched the fitted value from the model for that record.
lmmovies$X[1, ]
lmmovies$mean.x
lmmovies$shrinkage[lmBPM$best]
There's some matching up/data prep to be done to get ready to do the math, which I'm not including here because I'm now working on a different model than the one shown below and the code wouldn't work.
For each variable, calculate: a = (coefficient shrinkage) (observation - sample mean)
Then, Y = Intercept + sum(a)
Original post:
I am using the BAS package to do Bayesian multiple regression modeling for the Coursera course from Duke University, "Bayesian Statistics."
I am experiencing what I think is a bug. When I pull the mean values for the coefficients for the BPM, I have been assuming that:
That isn't what I'm experiencing, however.
I'm using BAS version 1.5.5 and statsr version 0.2.1 in R version 4.0.2. My OS is Mac OSX Catalina 10.15.5.
Here's sample code that reproduces what I'm seeing.
library(dplyr)
library(statsr)
library(BAS)
`
load("movies.Rdata")I noticed the same behavior in the example of this procedure given in the textbook for this course, the version last built on 2020-06-01, on pp. 191-192. On p. 191, the list of variables for the BPM is pulled. On p. 192, the coefficients are pulled. As an example of what I mean, the variable Pop has a non-zero coefficient, however, it is not in the list of coefficients. The variable M, on the other hand, is on the list of coefficients but has a zero coefficient.