stan-dev / rstanarm

rstanarm R package for Bayesian applied regression modeling
https://mc-stan.org/rstanarm
GNU General Public License v3.0

Beta regression (stan_betareg) #124

Closed: jgabry closed this issue 7 years ago

jgabry commented 8 years ago

Creating an issue to track stan_betareg, as we don't actually have one open for this yet. Current status is that it's almost ready on the feature/betareg branch thanks to @imadmali.

imadmali commented 8 years ago

Regarding the issues previously discussed, e68fa11 should resolve the following:

imadmali commented 8 years ago

15c5ce9 should resolve the optimization problem when link='loglog', although I'm not certain about this. It required changing the link function in continuous.stan from mu[n] = 1-inv_cloglog(eta[n]) to mu[n] = 1-inv_cloglog(-eta[n]). The parameters returned are identical to betareg.
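
As a quick sanity check on that change (illustrative R using the textbook definitions of the two links, not rstanarm code): since inv_cloglog(x) = 1 - exp(-exp(x)), the expression 1 - inv_cloglog(-eta) reduces to exp(-exp(-eta)), which is the standard loglog inverse link that betareg uses.

eta <- seq(-3, 3, by = 0.5)
inv_cloglog <- function(x) 1 - exp(-exp(x))        # complementary log-log inverse link
all.equal(1 - inv_cloglog(-eta), exp(-exp(-eta)))  # TRUE: identical to the loglog inverse link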

Reproducible Steps

# data
fam <- binomial(link = "cloglog")
dat <- list()
dat$N <- 200
dat$x <- rnorm(dat$N, 2, 1)
dat$z <- rnorm(dat$N, 2, 1)
dat$mu <- 1 - fam$linkinv(-(2.4 - 1.2*dat$x))
dat$phi <- exp(2.2 + 3.2*dat$z)
dat$y <- rbeta(dat$N, dat$mu * dat$phi, (1 - dat$mu) * dat$phi)
hist(dat$y, col = "dark grey", border = F, xlim = c(0,1))
dat <- data.frame(dat$y, dat$x, dat$z)
colnames(dat) <- c("y", "x", "z")
# fit
check <- betareg::betareg(y ~ x | z, data = dat, link = "loglog", link.phi = "log")
fit <- rstanarm::stan_betareg(y ~ x | z, data = dat, link = "loglog", link.phi = "log", chains = 4, cores = 4, algorithm = "sampling") 
fit_opt <- rstanarm::stan_betareg(y ~ x | z, data = dat, link = "loglog", link.phi = "log", algorithm = "optimizing")
# check
coef(check)
coef(fit)
coef(fit_opt)

imadmali commented 8 years ago

@bgoodri and @jgabry, does stan-dev/math#419 need to be resolved before a PR can be made? The undefined mean_PPD issue comes about because beta_rng() generates a NaN value. It doesn't happen only when link.phi = "sqrt". At the moment reject() is called when the square root link is used and a NaN value is generated.

jgabry commented 8 years ago

It doesn't only happen when link.phi = "sqrt"

So it happens with other link functions too? Or did you mean that it only happens when link.phi = "sqrt"? And does it always happen when link.phi = "sqrt" or just occasionally?

imadmali commented 8 years ago

So it happens with other link functions too?

Yes. From what I understand it's an underflow problem, not a link problem. If the shape parameters that go into beta_rng() are too small then a NaN is generated.

Reproducible Steps

# fam and fam_phi are not defined in the original snippet; presumably they match the links used below
fam <- make.link("logit")      # mean link
fam_phi <- make.link("sqrt")   # precision link
dat <- list()
dat$N <- 200
dat$x <- rnorm(dat$N, 2, 1)
dat$z <- rnorm(dat$N, 2, 1)
dat$mu <- fam$linkinv(-0.5 + 0.5*dat$x)
dat$phi <- fam_phi$linkinv(2 + 2*dat$z)
dat$y <- rbeta(dat$N, dat$mu * dat$phi, (1 - dat$mu) * dat$phi)
hist(dat$y, col = "dark grey", border = F, xlim = c(0,1))
dat <- data.frame(dat$y, dat$x, dat$z)
colnames(dat) <- c("y", "x", "z")

check <- betareg::betareg(y ~ x | z, data = dat, link = "logit", link.phi = "sqrt")
fit <- rstanarm::stan_betareg(y ~ x | z, data = dat, link = "logit", link.phi = "sqrt",
                              chains = 4, cores = 4, algorithm = "sampling", iter = 4000)
fit_opt <- rstanarm::stan_betareg(y ~ x | z, data = dat, link = "logit", link.phi = "sqrt",
                                  algorithm = "optimizing", iter = 10000)

round(coef(check),1)
coef(fit)
round(coef(fit_opt),1)

rstan::stan_trace(fit)

And does it always happen when link.phi = "sqrt" or just occasionally?

Sometimes it doesn't happen with algorithm="sampling", sometimes it doesn't happen with algorithm="optimizing", and once it didn't happen with both and I recovered the true parameters.

I know that if both shape parameters approach zero then beta_rng(alpha, beta) converges to bernoulli(0.5), but it seems more appropriate to use reject() since it drops the chains that are producing NaNs. From the models I've been fitting, the chains generating NaNs seem to be the ones that do not converge to the true parameters.
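
For intuition, a quick R illustration of both points (not the rstanarm/Stan code): with tiny shape parameters the Beta draws pile up at the two endpoints roughly like a bernoulli(0.5), and once mu * phi or (1 - mu) * phi underflows to zero the shape argument is no longer valid at all.

a <- b <- 1e-3
draws <- rbeta(1e4, a, b)
mean(draws > 0.5)   # roughly 0.5: nearly all mass sits at the two endpoints
range(draws)        # essentially 0 and 1

mu <- 1e-200; phi <- 1e-200
c(mu * phi, (1 - mu) * phi)  # the first shape parameter underflows to exactly 0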

bgoodri commented 8 years ago

Maybe https://github.com/stan-dev/math/issues/419 ?

imadmali commented 8 years ago

Yeah that's the problem. But I don't think bernoulli_rng(0.5) is the solution (at least not within the stan file). You end up getting chains that don't mix well (and I'm pretty sure the divergent chains are the ones generating NaNs). You can try it using the example in the previous comment and git checkout e077cc6.

jgabry commented 8 years ago

Looks like that stan/math issue was closed and fixed. I think that if this problem occurs for a non-trivial number of models people will actually fit then we should wait to release stan_betareg until the next Stan release. If it's super rare then we can probably go ahead and release in the next rstanarm even if that's before the Stan release.

imadmali commented 8 years ago

This should be ready to merge. continuous.stan picks up any NaN values and divides mean_PPD by N minus the number of NaNs to work around the beta_rng() underflow issue. The sqrt link tests have been omitted since @bgoodri mentioned that the link function isn't going to be stable enough to pass under algorithm="optimizing".
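
In R terms, the workaround amounts to averaging only the non-NaN posterior predictive draws (a sketch of the idea, not the actual generated quantities code):

# y_rep: one posterior predictive draw per observation, possibly containing NaN
mean_ppd_workaround <- function(y_rep) {
  n_nan <- sum(is.nan(y_rep))
  sum(y_rep[!is.nan(y_rep)]) / (length(y_rep) - n_nan)
}
mean_ppd_workaround(c(0.2, 0.4, NaN, 0.6))  # 0.4, the mean of the defined draws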

jgabry commented 8 years ago

Cool, I'll take a look at the recent changes asap. Actually, can you first update your branch with the branch for the use-bayesplot-package PR? Then it really would be ready to merge.

Regarding testing the sqrt link, that makes sense, but even if we aren't testing the estimates I think we should test that it at least runs. Can be for just a few iterations with small data, so essentially instantaneous.

jgabry commented 7 years ago

@imadmali actually I'll take care of the stuff I mentioned in my previous comment

imadmali commented 7 years ago

@jgabry I can push a quick sqrt link test in the next few minutes if that makes things easier.

jgabry commented 7 years ago

ok cool thanks.

jgabry commented 7 years ago

A few other things to fix while you're at it

jgabry commented 7 years ago

Also, in stan_betareg.R I see

  # pass existence of declaration of linear predictor of the dispersion parameter

  Z_true <- length(grep("\\|", all.names(formula)))
  # Z_true <- all.names(formula)[3]
  # if (Z_true=="|") {
  #   Z_true <- 1
  # }
  # else {
  #   Z_true <- 0
  # }

but what does it mean to "pass existence"? Are you just checking to see if there's a vertical bar in the user's formula?

imadmali commented 7 years ago

yeah, is there a better way?

jgabry commented 7 years ago

Your way is probably fine, although it returns 0 or a positive integer instead of TRUE/FALSE. Usually that's a fine substitute for TRUE/FALSE but not always. Depends how you later use it. But do you really need to pass Z_true to stan_betareg.fit? Can Z_true not be inferred from z inside stan_betareg.fit?

Alternatives:

any(grepl("\\|", formula)) # note: that's grepl not grep
!is.null(lme4::findbars(formula))
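
For example, both checks behave the same way on formulas with and without a precision part (hypothetical example formulas; the second check assumes lme4 is available):

f_with_z <- y ~ x | z
f_without_z <- y ~ x

any(grepl("\\|", f_with_z))       # TRUE
any(grepl("\\|", f_without_z))    # FALSE

!is.null(lme4::findbars(f_with_z))     # TRUE
!is.null(lme4::findbars(f_without_z))  # FALSE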

imadmali commented 7 years ago

I think you're right. I should be able to use exists("z") in stan_betareg.fit.

jgabry commented 7 years ago

I'd be careful with exists. exists("z") will be TRUE even if z is just all 1s or something like that, which it might be when no z term is specified in the formula. I think you'll just want to check whether z is whatever model.matrix(br, model = "precision") returns when no precision model is specified by the user. That might be NULL, or it might be a column of 1s, or something else.

jgabry commented 7 years ago

If it's easiest just to leave Z_true in there instead of messing with that, that's fine too. But I would remove all the commented out code if it's not needed anymore.

jgabry commented 7 years ago

Looks like for a model like this without a z component you will still have z as a matrix with a single column of 1s.

fit <- betareg(y ~ x, data = fake_dat, link = "logit")
Z <- model.matrix(fit, model = "precision")
head(Z)
  (Intercept)
1           1
2           1
3           1
4           1
5           1
6           1

so if you pass z=Z to stan_betareg.fit and then check if z exists using exists("z") you will get a false positive. That's why exists isn't good here.
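
One way to sketch that check (a hypothetical helper, not rstanarm code): treat the precision part as present only when the matrix returned by model.matrix(fit, model = "precision") has something beyond a single intercept column.

has_precision_part <- function(Z) {
  !is.null(Z) && !(NCOL(Z) == 1 && all(Z[, 1] == 1))
}
has_precision_part(Z)  # FALSE for the y ~ x fit above; TRUE once the formula contains | z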

jgabry commented 7 years ago

Also, once these last few things are cleared up I think it's time to go ahead and make a pull request. If you can make the PR this afternoon/early evening then I think I'll have time to review the rest of it tonight and there's a chance we can get it into the release this weekend. Otherwise it can go in the next one.

imadmali commented 7 years ago

Yeah I just checked too. I think grabbing the vertical bar might be the best way (at least that I can think of at the moment). I'll clean up the comments.

Also, if you look at the first example in the vignette, for some reason the posterior predictive (grid) plot is off compared to running the code chunks independently (i.e. not processed through knitr). The whole point of the example is to show that the true model captures more of the distribution of the outcome. Any idea what's going on here? The plots seem to be fine in the second example.

[screenshot: posterior predictive grid plot from the vignette]

Also, did you still want me to merge the use-bayesplot-package PR into feature/betareg before I make a PR?

jgabry commented 7 years ago

After you merge the use-bayesplot branch into yours the pp_check plots will look different. I think merging the bayesplot branch into yours will be very quick (probably no conflicts except the DESCRIPTION file). So can you try that and see if the vignette is still weird? If so I'll figure out what's going wrong and fix it.

Oh, and you'll need to install bayesplot like you did with rstantools.

jgabry commented 7 years ago

If there's a conflict in DESCRIPTION keep whatever is on the use-bayesplot branch (at a minimum that's dropping arm from suggests, adding bayesplot to imports) instead of what's on your branch, and then just add betareg to the packages after the fact. Otherwise there shouldn't be conflicts because I've kept the use-bayesplot branch up to date with master.

imadmali commented 7 years ago

The merge had the DESCRIPTION conflict you mentioned, which was easy to resolve, but now when stan_betareg() finishes running the model it comes up with the following error:

[screenshot of the error message]

The function that's getting flagged is in stan_betareg.R. Has get_prior_info() been replaced/removed?

jgabry commented 7 years ago

Ah, yes. Go ahead and get rid of the prior.info = get_prior_info(...) line. There's now a prior_summary() function, but we can worry about that later. For now just remove the prior.info part.

imadmali commented 7 years ago

Got it. I pushed everything. Now I also get weird grid plots when I run the code chunks independently, but it doesn't look as bad when I process through knitr.

[screenshot of the posterior predictive grid plot]

jgabry commented 7 years ago

Ok I'll take a look at that. I just noticed that a bug was introduced into log_lik on the betareg branch that affects other models, not just beta regressions. For example,

roaches$roach100 <- roaches$roach1 / 100
fit <- stan_glm(
  y ~ roach100 + treatment + senior,
  offset = log(exposure2),
  data = roaches,
  family = poisson(link = "log"),
  prior = normal(0, 2.5),
  prior_intercept = normal(0, 10),
  iter = 500 # to speed up example
)
ll <- log_lik(fit) # shouldn't error

imadmali commented 7 years ago

7db572f should fix the bug you just mentioned. Let me know if it doesn't. It looks like I left out the brackets for the betareg if statement in loo.R (lines 413-430).
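
For context, that kind of missing-braces slip changes which statements the if governs, which is exactly how betareg-specific lines can end up running for every model (a generic R illustration with made-up variables, not the actual loo.R code):

is_betareg <- FALSE
x <- 1
# without braces, only the first statement is conditional
if (is_betareg)
  x <- x + 1
x <- x * 10  # runs unconditionally despite the indentation
x            # 10

x <- 1
# with braces, both statements are conditional
if (is_betareg) {
  x <- x + 1
  x <- x * 10
}
x            # still 1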

jgabry commented 7 years ago

Looks like it works now. Thanks. Weird though. Was that not caught by any tests?

imadmali commented 7 years ago

It probably would have been if I had bothered running R CMD check more than once since I started working on stan_betareg. Is that how you picked it up?

jgabry commented 7 years ago

Actually I found it just by running it on an example and seeing that it didn't work. But yeah, definitely a good idea to run the tests every so often, especially with Travis being relatively unpredictable.

Travis is actually passing now for the use-bayesplot-pkg branch, but I think it's timing out on your branch because of the time spent building the additional vignette. It would probably have failed because of log_lik otherwise.

jgabry commented 7 years ago

I would run all the tests to make sure there isn't anything else like this due to changes to any other functions. Even if Travis isn't working all tests should still pass locally.
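
For reference, the local equivalent of what Travis runs is roughly the following (standard devtools calls, as used elsewhere in this thread):

devtools::test()   # run just the testthat suite
devtools::check()  # full R CMD check, including tests and vignettes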

imadmali commented 7 years ago

What's weird is that Travis build #667 (the push after I merged use-bayesplot-package into feature/betareg) passed, but subsequent builds are erroring.

After running devtools::check() the only test that doesn't pass is "stan_gamm4 returns expected result for sleepstudy example" in test_posterior_predict.R. Although it passes when I run the block of code in isolation. I'm not sure what's going on there.

bgoodri commented 7 years ago

Don't worry about stan_gamm4.

jgabry commented 7 years ago

As long as the problem only affects stan_gamm4 because of something wrong with stan_gamm4 itself (likely), and not something accidentally altered elsewhere, like before with log_lik.

jgabry commented 7 years ago

@imadmali can you submit a PR whenever you think it's ready?

imadmali commented 7 years ago

Will do.

The tests "stan_glm throws appropriate errors, warnings, and messages" and "gaussian returns expected result for trees example" in test_stan_glm.R fail when I run them separately, but they don't get picked up in devtools::check(). The former is just a matter of changing quotes in the expected string and is resolved in a52291d, but I'm not sure why the latter is failing. Also, they fail on master so I'm guessing (hoping) it has nothing to do with the stan_betareg code.

When I ran the most recent test it was just stan_gamm4 that failed.

Some of the less serious things that get flagged when I run the tests are:

checking Rd \usage sections ... WARNING
Undocumented arguments in documentation object 'stan_betareg'
  ‘Z_true’

and

checking R code for possible problems ... NOTE
stan_betareg.fit: no visible binding for global variable ‘xtemp’
stan_betareg.fit: no visible binding for global variable ‘xbar’
stan_betareg.fit: no visible binding for global variable ‘prior_scale’
stan_betareg.fit: no visible binding for global variable ‘prior_scale_for_intercept’
stan_betareg.fit: no visible binding for global variable ‘prior_dist_z’
stan_betareg.fit: no visible binding for global variable ‘prior_dist_for_intercept_z’
stan_betareg.fit: no visible binding for global variable ‘prior_mean_for_intercept_z’
stan_betareg.fit: no visible binding for global variable ‘prior_scale_for_intercept_z’
stan_betareg.fit: no visible binding for global variable ‘prior_df_for_intercept_z’
Undefined global functions or variables:
  prior_df_for_intercept_z prior_dist_for_intercept_z prior_dist_z
  prior_mean_for_intercept_z prior_scale prior_scale_for_intercept
  prior_scale_for_intercept_z xbar xtemp

Z_true shouldn't be in the documentation since it's not a user-defined argument. I'm not sure what to do about the "no visible binding for global variable" note.

bgoodri commented 7 years ago

All that undefined global variable stuff is due to those objects not existing before they are used. They actually do get created, but in the loops that call assign(). In stan_glm.fit() there are lines setting all of them to NULL before the assign() loops.
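
Something like this idiom, sketched with a couple of hypothetical defaults (not the actual stan_glm.fit code): bind the names to NULL up front so R CMD check sees a definition, then let the assign() loop overwrite them.

# useless assignments to pass R CMD check
xtemp <- xbar <- prior_scale <- prior_dist_z <- NULL
defaults <- list(xtemp = 0, xbar = 0, prior_scale = 2.5, prior_dist_z = 1)
for (nm in names(defaults)) assign(nm, defaults[[nm]])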

jgabry commented 7 years ago

Regarding the failure of the test "gaussian returns expected result for trees example", that might not be a false positive. It does look like something is messed up for gaussian(link = "log"). The results aren't even close to the results from regular glm, which makes me think something might have changed that shouldn't have changed.

imadmali commented 7 years ago

@bgoodri I see it; under # useless assignments to pass R CMD check. Resolved in 3c7c0bc.

@jgabry I think it's a problem with something in the stan_glm code since it's failing on the master branch, which has no stan_betareg code. Or, at the very least, I don't think the problem is associated with stan_betareg.

jgabry commented 7 years ago

I don't think the problem is with stan_betareg, but I do think the problem is with the feature/betareg branch. I think it's something in continuous.stan that changed when adding stan_betareg. And I don't think the master branch has the same problem. For example, try simulating this data

set.seed(1010)
a <- 0.5
b <- 0.25
x <- rnorm(100)
y <- exp(a + b * x) + rnorm(100,0,.1)
dat <- data.frame(y, x)

and then running this using the feature/betareg branch:

fitglm <- glm(y ~ x, data = dat, family = gaussian(link = "log"))
fitopt <- stan_glm(y ~ x, data = dat, family = gaussian(link = "log"), 
                   algorithm = "optimizing")
fitmcmc <- stan_glm(y ~ x, data = dat, family = gaussian(link = "log"), 
                    algorithm = "sampling") # fails to converge

rbind(glm = coef(fitglm), # recovers a,b
      stan_opt = coef(fitopt), # not even close
      stan_mcmc = coef(fitmcmc)) # not even close 

You will get nonsense for the two stan models.

And then install from the master branch and run it again and (at least for me) I get the right answers from all models:

          (Intercept)         x
glm         0.4980279 0.2525274
stan_opt    0.4979202 0.2521449
stan_mcmc   0.4982996 0.2521855

jgabry commented 7 years ago

I also get the right answer (in the simulation exercise above) when using the use-bayesplot-package branch. So it's really gotta be changes coming uniquely from the feature/betareg branch.

imadmali commented 7 years ago

Yeah, I wasn't rebuilding the Stan files when I switched to master, so the problem might be in continuous.stan on the feature/betareg branch like you mentioned.

jgabry commented 7 years ago

Here's what I get when running it on the feature/betareg branch:

          (Intercept)         x
glm         0.5080219  0.248754
stan_opt    4.8514976 -2.409278  
stan_mcmc   0.6155101 -1.721813

So something is definitely wrong.

jgabry commented 7 years ago

The results for models with gaussian(link="identity") look fine, so that should narrow down where to look in the Stan code a bit.

bgoodri commented 7 years ago

You can do in a shell

git diff --compaction-heuristic master origin/feature/betareg -- exec/continuous.stan

bgoodri commented 7 years ago

I would guess it has to do with the restriction on the intercept

jgabry commented 7 years ago

In the future, this sort of thing (breaking existing code when working on something seemingly unrelated) will be easiest to fix if detected right away instead of after a lot of time has passed and many other changes are made to the same files.

But that means that going forward we need to either

(a) make sure Travis is always working (not passing, just actually running the tests so we can see failures)

or

(b) run tests locally more often when working on a branch

Option (b) is actually probably less work but requires running stuff locally. Option (a) is more convenient but Travis can sometimes be a real pain to get working.


— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/stan-dev/rstanarm/issues/124#issuecomment-261742889, or mute the thread https://github.com/notifications/unsubscribe-auth/AHb4Q25KXUm7K2b3MC2QtOLUuqhW7D6aks5q_3MJgaJpZM4KbmzI .