mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
718 stars 59 forks source link

point_interval with hdi(density(x), credmass=.width) #165

Closed Ben-Cox closed 5 years ago

Ben-Cox commented 5 years ago

Hey I love this package, its been making dealing with mcmc output really easy, so thanks!

I'm curious why you call density() inside hdi within the point_hdi functions? I have a model currently with some wonky posteriors (but all positive values), median_hdi() is returning two intervals, with the lower limit of the lower interval being negative. When I manually extract the draws and run HDInterval::hdi() I get a sensible HDI with a positive lower bound. When I pass the extracted samples to density() it gives a negative min value, I'm guessing this is why median_hdi() is giving a negative lower limit, is this the desired effect? Is there a reason you don't just call hdi on the raw posterior draws?

mjskay commented 5 years ago

Ah, yeah I split these into two different functions depending on the type of interval you want:

I set up these functions this way because given the tidy data format it is relatively straightforward to return multiple intervals. In this case, I decided that the potential increase in complexity/decrease in usability is worth it, as the alternative is silently returning something that is not an HDI in the case of multi-modality (which is what the default settings of HDInterval does, which one would not know is happening unless one read the documentation of that function). Thus, I set it up so you have to explicitly request a procedure that is not named the same thing as "hdi", since it isn't guaranteed to be the same thing.

mjskay commented 5 years ago

I'm closing this for now, but let me know if that didn't answer your question / if you have other issues.

Ben-Cox commented 5 years ago

Hi thanks, your answer cleared it up for me. I also looked into the HDI package documentation, which made sense. But I still don't understand how a proper HDI for a parameter that is strictly non-negative can include negative values. After a bit of reading it seems the problem is due to how density() works. Answers on stack overflow suggested setting the 'from' argument to 0. When I set from=0 and allowSplit=TRUE inside density, hdi then gives me two intervals that are non-negative (with 0 as the lower bound of the lower interval) with the aforementioned samples. That seems like the proper HDI for a strictly non-negative parameter, but correct me if I'm wrong. Could the ellipses be included in the arguments to point_hdi() to let users pass arguments (like from=0) to density in this situation?

mjskay commented 5 years ago

Ah yeah that makes sense.

I think it might be useful to think of some kind of long-term solution to this that doesn't require fiddling with arguments. I wonder if a reasonable solution is simply to restrict the density estimator to the bounds of the sample by default using to / from. That would automatically solve other cases where parameters have bounds without requiring user intervention.

mjskay commented 5 years ago

This is now fixed on the meta_geom branch. I changed the density call in hdi to use densities trimmed to the range of the sample.

It will be a little bit yet before it filters down to master but it should make the next CRAN release.