Closed Ben-Cox closed 5 years ago
Ah, yeah I split these into two different functions depending on the type of interval you want:
hdi
("highest density interval") will always return a proper highest-density interval, in the sense that if the distribution is multi-modal it can give you multiple non-contiguous intervals. To do this, HDInterval() is called after passing through density(). If the result of that call is multimodal, it is used, if the result is unimodal, then HDInterval() is called again without passing through density() to get a more precise unimodal interval.
hdci
("highest density continuous interval") will always return only one interval, though in the case of multimodal distributions this is not guaranteed to be a proper HDI (it will still be an interval containing the requested probability mass, just not an HDI). This function corresponds to running HDInterval() directly.
I set up these functions this way because given the tidy data format it is relatively straightforward to return multiple intervals. In this case, I decided that the potential increase in complexity/decrease in usability is worth it, as the alternative is silently returning something that is not an HDI in the case of multi-modality (which is what the default settings of HDInterval does, which one would not know is happening unless one read the documentation of that function). Thus, I set it up so you have to explicitly request a procedure that is not named the same thing as "hdi", since it isn't guaranteed to be the same thing.
I'm closing this for now, but let me know if that didn't answer your question / if you have other issues.
Hi thanks, your answer cleared it up for me. I also looked into the HDI package documentation, which made sense. But I still don't understand how a proper HDI for a parameter that is strictly non-negative can include negative values. After a bit of reading it seems the problem is due to how density() works.
Answers on stack overflow suggested setting the 'from' argument to 0. When I set from=0
and allowSplit=TRUE
inside density, hdi then gives me two intervals that are non-negative (with 0 as the lower bound of the lower interval) with the aforementioned samples. That seems like the proper HDI for a strictly non-negative parameter, but correct me if I'm wrong.
Could the ellipses be included in the arguments to point_hdi()
to let users pass arguments (like from=0
) to density in this situation?
Ah yeah that makes sense.
I think it might be useful to think of some kind of long-term solution to this that doesn't require fiddling with arguments. I wonder if a reasonable solution is simply to restrict the density estimator to the bounds of the sample by default using to
/ from
. That would automatically solve other cases where parameters have bounds without requiring user intervention.
This is now fixed on the meta_geom
branch. I changed the density call in hdi
to use densities trimmed to the range of the sample.
It will be a little bit yet before it filters down to master
but it should make the next CRAN release.
Hey I love this package, its been making dealing with mcmc output really easy, so thanks!
I'm curious why you call density() inside hdi within the point_hdi functions? I have a model currently with some wonky posteriors (but all positive values), median_hdi() is returning two intervals, with the lower limit of the lower interval being negative. When I manually extract the draws and run HDInterval::hdi() I get a sensible HDI with a positive lower bound. When I pass the extracted samples to density() it gives a negative min value, I'm guessing this is why median_hdi() is giving a negative lower limit, is this the desired effect? Is there a reason you don't just call hdi on the raw posterior draws?