mitchelloharawild / distributional

Vectorised distributions for R
https://pkg.mitchelloharawild.com/distributional
GNU General Public License v3.0
94 stars 15 forks source link

implement hdr.dist_default() #59

Closed mjskay closed 3 years ago

mjskay commented 3 years ago

This is in a similar vein to #58. It moves the main implementation of hdr() from hdr.distribution() to hdr.dist_default(), and uses dist_apply() in hdr.distribution() in a similar manner to how hilo.distribution() works.

Previous behavior:

> x = dist_mixture(dist_normal(-3,1), dist_normal(3,1), weights = c(0.5, 0.5))
> xs = c(x, dist_normal(0,1))
> hdr(x)
<list_of<hilo>[1]>
[[1]]
<hilo[2]>
[1] [-3.063688, -2.936312]95 [ 2.936312,  3.063688]95

> hdr(x[[1]])
Error: Objects of type `dist_mixture` are not supported by `hdr()`, you can create a custom `hdr` with `new_hdr()`
* Objects of type `dist_default` are not supported by `hdr()`, you can create a custom `hdr` with `new_hdr()`
Run `rlang::last_error()` to see where the error occurred.
> hdr(xs)
Error in vapply(seq(0.5/n, 1 - 0.5/n, length.out = n), quantile, numeric(1L),  : 
  values must be length 1,
 but FUN(X[[1]]) result is length 2

New behavior:

> x = dist_mixture(dist_normal(-3,1), dist_normal(3,1), weights = c(0.5, 0.5))
> xs = c(x, dist_normal(0,1))
> hdr(x)
<list_of<hilo>[1]>
[[1]]
<hilo[2]>
[1] [-3.063688, -2.936312]95 [ 2.936312,  3.063688]95

> hdr(x[[1]])
<list_of<hilo>[1]>
[[1]]
<hilo[2]>
[1] [-3.063688, -2.936312]95 [ 2.936312,  3.063688]95

> hdr(xs)
<list_of<hilo>[2]>
[[1]]
<hilo[2]>
[1] [-3.063688, -2.936312]95 [ 2.936312,  3.063688]95

[[2]]
<hilo[1]>
[1] [-0.06393336, 0.06393336]95
mjskay commented 3 years ago

Thanks! Makes total sense. I'm not doing much with it at the moment, just setting up some infrastructure in ggdist that will at some point be used with it, but likely not in the next release. I'll hold off on doing anything substantial user-facing with it until you've nailed down a desired format.

mitchelloharawild commented 3 years ago

Great. I expect that the format will be a generalisation of <hilo>, which allows multiple upper<dbl> and lower<dbl> values for a single confidence level<dbl>. I think this will not change the <hilo> type, even though <hilo> could be represented using <hdr> to keep the storage format of <hilo> efficient.

mitchelloharawild commented 3 years ago

You may also be interested in the {gghdr} dev package, which was hacked together at the most recent OzUnconf. The package was developed as a {ggplot2} extension for Rob's {hdrcde} package.

It doesn't use {distributional} for the {hdr} code, but it was planned to use the {hdr} object type once finalised. The code contained in the package is less interesting than the types of high density region visualisations that it provides.

mjskay commented 3 years ago

Cool! Yeah, ggdist currently supports that style of plot with its non-dist stats via the point_interval() function family, which produces tidy data frames of intervals and point estimates, including hdis (though not as a nice vector format like hilo).

e.g. this doesn't give multiple intervals because the default point_interval is median_qi:

data.frame(x = rnorm(4000, c(1,6), c(1,2))) %>%
  ggplot(aes(y = 0, x = x)) + 
  stat_halfeye() + 
  geom_rug(alpha = .02) + 
  scale_color_brewer() + 
  theme_ggdist()

image

But this does:

data.frame(x = rnorm(4000, c(1,6), c(1,2))) %>%
  ggplot(aes(y = 0, x = x)) + 
  stat_halfeye(point_interval = mode_hdi) + 
  geom_rug(alpha = .02) + 
  scale_color_brewer() + 
  theme_ggdist()

image

Something similar to the gghdr plots would be to use mode_hdi with stat_interval (though unfortunately there isn't a great way for this to put points at multiple modes the way gghdr does):

data.frame(x = rnorm(4000, c(1,6), c(1,2))) %>% 
  ggplot(aes(y = 0, x = x)) + 
  stat_interval(point_interval = mode_hdi) + 
  geom_rug(alpha = .02) + 
  scale_color_brewer() + 
  theme_ggdist()

image

While this works with sample data, at the moment with distributional objects and the stat_dist_... family, only medians and quantile intervals are supported. What I have changed recently is making it so that the ggdist::point_interval() functions can take distributional objects. That said, I haven't updated stat_dist_... to actually use the point_interval functions yet, which is when that change will really become visible and useful to folks using distributional objects; I can wait on that until the hdr format is finalized.