jgabry closed this issue 8 years ago.
ok
On Wed, Jan 6, 2016 at 2:35 PM, Jonah Gabry notifications@github.com wrote:
@bgoodri https://github.com/bgoodri I've been thinking recently that it might avoid a great deal of confusion if we create a separate method for obtaining so-called credible intervals rather than using R's standard confint. Perhaps an S3 generic credint and a credint.stanreg method? (Or just a credint function, skipping the S3 method stuff?) If we want to do this then I can make the change quite easily this afternoon.
Here are some reasons why I think it's worth doing this:
- The documentation for stats::confint, which we can't change, clearly says it's for confidence intervals.
- It would give us the opportunity to document credint, where we can make it clear what the differences are between frequentist/Bayesian confidence/credible intervals. We don't really do this anywhere at the moment but I think the differences in interpretation are important.
- The cost is small. The only downside I can think of is that some users will try to call confint for a model not estimated by optimization, but we can either issue a warning to use credint (and allow confint to still be usable) or an error that forces the use of credint.
- If we wanted we could provide credible intervals based on posterior quantiles (as we do now) but also give the option to compute HPD intervals (that would be valid assuming there isn't severe multimodality).
Do you think it's useful to do this using a credint generic and a credint.stanreg method, or should I just make a credint function (or ci, or credible_interval, or something else)?
I think a plain function that verifies the input is a stanreg is fine.
Ok
I prefer the term “uncertainty interval” rather than “credibility interval” or “confidence interval.” For one thing, when the interval is wider, it’s more uncertainty! When the interval has zero width, it’s zero uncertainty! Etc.
Ok, that seems reasonable. Unless @bgoodri objects, I can use "uncertainty" instead of "credible." (If we do that, then to avoid confusion I'll also mention in the documentation that they are what people will see referred to as credible intervals elsewhere.)
Now that I think about it, we should probably not use the function name credint if we're not calling them credible intervals. Do you have a function name you'd like to use if we call them "uncertainty" intervals?
Just in case you need a reference... http://andrewgelman.com/2010/12/21/lets_say_uncert/
I’m not so steeped in R naming conventions. I guess “uncertainty_interval” is too long? I seem to recall that R uses “ci” so we could just use that to be consistent?
R uses confint(), which we've agreed we want to avoid.
Yeah I'd like to avoid confint() unless the model was estimated using MLE
If you don’t like uncertainty_interval(), we could do bayesint(), which gets the point across.
Maybe something like central_interval() or central_int(). It's not just any interval containing 95% of the draws.
If we restrict ourselves to just using posterior quantiles then that could be ok, but if we offer HPD intervals (not saying we definitely should, but I have the code to do it if we want) then central interval isn't quite right.
True, it’s not the shortest probability interval (spin), which is what Tian and I recommend in a recent paper.
Our messages overlapped . . . I actually prefer shortest probability interval (spin) rather than HPD for the reason that if the density is multimodal we’re still returning just a simple interval. A
The spin is fine with me, although for these models multimodality shouldn't be an issue.
I don’t know how easy it is to implement spin in practice. In our paper, Tian and I have an algorithm that has good statistical properties but it’s a bit complicated, actually I don’t really understand exactly what we did!
I'm also fine with doing the spin intervals, but if it's not trivial to compute them then maybe it's not worth it. I haven't seen the method described in the paper yet.
With 1000 draws, it is not going to be that intensive to brute force it relative to estimating the model. If we sort the draws, then there is a finite number of intervals that contain at least 95% of the points and then it is just a which.min on the diff of the admissible intervals.
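For concreteness, here is a minimal sketch of that brute-force idea for a single parameter's draws (the function and variable names are just illustrative, not anything in rstanarm):
# Shortest interval covering at least `prob` of the draws, by brute force:
# sort once, slide a window of fixed size, keep the narrowest window.
spin_brute <- function(draws, prob = 0.95) {
  S <- length(draws)
  w <- ceiling(S * prob)              # number of sorted draws the interval must span
  x <- sort(draws)
  starts <- seq_len(S - w + 1)        # admissible lower-endpoint positions
  widths <- x[starts + w - 1] - x[starts]
  i <- which.min(widths)              # narrowest admissible interval
  c(lower = x[i], upper = x[i + w - 1])
}
With, say, spin_brute(rnorm(1000)), the result should land near (-1.96, 1.96), up to Monte Carlo noise.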
Here’s the paper: http://www.stat.columbia.edu/~gelman/research/published/spin.pdf Short answer is that brute force intervals are a bit noisy; some smoothing helps but then things get complicated. The method in our paper works well but it’s a bit too complicated for my taste. I’ve wanted to go back and do more but haven’t done so. Noisy intervals will be a problem, I think, especially given that (a) in some settings we don’t recommend taking a huge number of draws, and (b) often people want 95% intervals (which I think is generally a mistake, but that’s another story). So it could make sense to just implement central intervals for now? A
Ben, if S is the posterior sample size, prob in (0,1) is the desired probability mass in the interval, and mat is a matrix of posterior draws, would this do it:
w <- S * prob # I guess this assumes w will be an integer, so we could round or take the ceiling
mark1 <- seq_len(S - w)                   # candidate lower-endpoint positions
mark2 <- mark1 + w                        # matching upper-endpoint positions
sorted <- apply(mat, 2L, sort, decreasing = FALSE)
diffs <- sorted[mark2, , drop = FALSE] - sorted[mark1, , drop = FALSE]  # widths of candidate intervals
min_ids <- apply(diffs, 2L, which.min)    # narrowest candidate per column
lo <- hi <- cbind(min_ids, seq_len(ncol(mat)))
hi[, 1L] <- hi[, 1L] + w
spin <- cbind(sorted[lo], sorted[hi])     # lower and upper endpoints, one row per parameter
Andrew, regardless of which intervals we end up going with, would you prefer setting the default to something other than 95% or should we go with 95% since it's what people will be expecting? We might not have too many opportunities to mess with the defaults because if we change them many users' R scripts won't reproduce the original output. Of course, using the default arguments in a script is not advisable for this reason, but a lot of people do it.
Ben, I updated the code above to fix a typo (sel1 --> mark1) and added a note about w not being an integer, but now I'm thinking that the code above is just computing an HPD for a unimodal distribution and not necessarily the SPIn.
What is sel1?
I dunno, what do you think? I could see making 50% the default, or I could see 95% as the default. There’s also the idea of using the half-width of the 68% interval as a quantile-based standard error. Right now we’re producing a median-based standard error: I think it’s the width of the central 50% interval, appropriately scaled? Or something like that? I guess we should make all these decisions now!
sel1 should be mark1
Yeah we should make all of these sorts of decisions now because we want to release the first version and then shouldn't really change these defaults without a really good reason
+1 to that!
We decided the standard error thing a while ago. It is what help(mad) describes.
We're using the median absolute deviation from the posterior median, scaled by 1.5 (or 1.4 something I guess)
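Concretely, that's just stats::mad applied to the draws; a small sketch with simulated draws standing in for a posterior (nothing here is rstanarm-specific):
draws <- rnorm(4000, mean = 1, sd = 2)     # stand-in for posterior draws of one parameter
mad(draws, center = median(draws))         # default constant is 1.4826, so this comes out close to 2
# equivalent to: 1.4826 * median(abs(draws - median(draws)))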
I guess we need bayesian_int() with an argument indicating whether to do central, spin, HPD, etc. But that doesn't solve the question of defaults.
Maybe the first author, Ling Yiu, understands the paper, not to mention how they got "SPIn" past Andrew as the name.
It was my idea to apply the bootstrap in the first place (I had just been reading Efron and Tibshirani). Then Matt expounded on the idea, then it got passed off to Tian and Ling Yiu. But then I've never used "quadratic programming", which is just engineer speak for optimization.
I've never seen the point of SPIn or HPD. Yes, I read that paper. Its arguments revolve around a pointer to Box and Tiao (which I didn't follow), a graphical illustration in figure 1 that doesn't tickle my intuitions, and a comment that the HPD contains the mode whereas the central interval can cut off the mode, saying that the high density intervals are "strongly supported by inference". But aren't all 95% intervals equally well supported?
The paper also points out the elephant in the room, which is that "...highest posterior density intervals is that they depend on parameterization." After that, there's an appeal to "routine data analysis", but I'm not even sure what that means.
Also, I think you have to be very careful when mixing medians and means. You can't get the usual MCMC std errors on medians, for instance.
True, although we are not showing the rstanarm user the MCMC SE by default. It can be seen if you call summary on the underlying stanfit object, in which case it is based on the mean and variance.
Regarding the default value for the interval width (regardless of interval type), we should definitely decide now whether it's going to be 95%, 50%, or something else. If we use a value other than 95% then we should mention in the documentation why we're not using the traditional value. In addition to not using it as the default we could also take the extra step of explicitly discouraging the use of 95%, although then we really need to explain why or point to a reference.
Sounds like Andrew would probably prefer 50%. Anyone else have a strong preference? Andrew, do you have particular blog posts or papers on why you think 95% is a mistake? I'm ok with 50% or 95%, I just want to make sure that whatever we decide we document clearly for the user.
I'm fine with 50%
I don't think 95% is a "mistake" so much as it's unstable because each end only relies on 2.5% of the draws.
And it makes it hard to do things like check coverage compared to a 50% interval.
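A rough illustration of that instability (simulated, not from the package): with 1000 draws, the 2.5% endpoint is roughly twice as noisy across repeated runs as the 25% endpoint.
set.seed(42)
S <- 1000
endpoints <- replicate(2000, quantile(rnorm(S), probs = c(0.025, 0.25)))
apply(endpoints, 1, sd)   # Monte Carlo sd of each endpoint; the 2.5% one is noticeably larger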
I think we can just import SPIn::SPIn and use that.
- I don’t think we’d ever be doing HPD; I think we’d only be doing central or spin. (I’m using “spin” not as shorthand for the particular method in the paper with Tian, but rather to stand for any shortest probability interval. The difference is that in a bimodal distribution, HPD can be two intervals, while spin will just be one.)
- But for now I’d favor just implementing central_int to keep things simple.
A
I never wanted to call it SPIn; that got in by accident, and then there was a problem where CRAN doesn’t let you change the name of a package. If we put it into rstan we can call it whatever we want.
My original motivation for spin or HPD is the scale parameter in the 8-schools problem. I don’t think the central interval makes sense at all for that example.
Yes, but I really don’t feel I trust that algorithm enough to use in Stan. Also, it’s not clear we’re doing people any favors by adding one more option; that’s why I was thinking that maybe we’re better off sticking with the central interval for now?
Just so we're 100% on the same page, for the "central interval" we're just using posterior quantiles, right? Right now I have it so that given a user-specified prob argument, we return the interval
a <- (1 - prob) / 2
quantile(theta, c(a, 1 - a))
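As a usage note, doing that column-wise gives the full table of central intervals; a sketch where mat is any matrix of posterior draws with one column per parameter (the names are made up for illustration):
prob <- 0.5
a <- (1 - prob) / 2
mat <- matrix(rnorm(4000), ncol = 4,
              dimnames = list(NULL, paste0("beta", 1:4)))   # stand-in for posterior draws
t(apply(mat, 2L, quantile, probs = c(a, 1 - a)))            # one row per parameter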
Well, since the SPIn package exists, there is basically no effort needed to call it from bayesian_int(). Or are you saying you would prefer bayesian_int() to output a central interval rather than a spin?
If both are useful we could allow them both (defaulting to central) by providing a "type" argument for specifying the type of interval to compute.
Regarding the name, I guess we should vote. We could go with an "uncertainty"-based name, e.g.,
uncertainty_interval()
uncertainty_int()
uint()
or a Bayes-based name, e.g.,
bayesian_interval()
bayesian_int()
bayes_interval()
bayes_int()
bayesint()
or we could just call it intervals(). If we call it central_interval() or something like that then we're restricting it to always only offer just a central interval.
I would say use a general function name and a type argument, but I don't have a strong preference about the function name. Maybe some variation on prob_int()?
I just pushed a prob_int function that defaults to central but has a spin option. But if there are any doubts about SPIn, the safest thing to do would be to hold off on enabling that option when we release. We can always add an option later, but removing one is suboptimal.
Also I don't mind the name prob_int, but it's a bit weird if we're going to call them uncertainty intervals rather than probability intervals.
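For the record, the shape of the interface being described is roughly the following; this is only an illustrative sketch, and the argument names, the default prob, and the as.matrix step are assumptions rather than what was actually pushed:
prob_int <- function(object, prob = 0.5, type = c("central", "spin")) {
  type <- match.arg(type)
  if (type == "spin")
    stop("'spin' intervals are not enabled yet")   # hold off until SPIn is trusted
  mat <- as.matrix(object)                         # assumes draws are extractable as a matrix
  a <- (1 - prob) / 2
  t(apply(mat, 2L, quantile, probs = c(a, 1 - a)))
}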
I don’t see prob_int as a name. Why not uncertainty_int or simply bayes_int?
Also, yeah, I don’t really trust Spin so much so I’m loath to include it yet.
I guess bayes_int is ok (although int makes me think of integration rather than interval for whatever reason). I don't love any of these names, but if there's no better option then it's not a huge deal as long as we're ok with not changing it. Ben, do you have a preference between those names, or another suggestion (besides prob_int, I guess)?
So I guess then we'll have a type argument with only central as a possibility for now? That's a bit strange but it does leave us the option of adding other possibilities at some future date.
We could do bayes_interval.
Don't ever take a vote. Solicit advice and make your best call.