mjskay / ggdist

Visualizations of distributions and uncertainty
https://mjskay.github.io/ggdist/
GNU General Public License v3.0

curve_interval: a numeric/dataframe method? #95

Closed DominiqueMakowski closed 1 year ago

DominiqueMakowski commented 3 years ago

Hi! We're currently trying to provide support for curve_interval in easystats (https://github.com/easystats/bayestestR/issues/455). Because of how our API is organized, it would be helpful to have a method that works on "regular" draws matrices (e.g., the output of as.data.frame(some_rstanarm_model)) rather than on grouped data frames, since we don't have dplyr/tibble as dependencies.

Is this something that would be possible to implement? Here's a reproduction of the first example in https://mjskay.github.io/ggdist/reference/curve_interval.html, but using bayestestR's input/output formats:

library(bayestestR)
library(ggplot2)
library(ggdist)
#> Attaching package: 'ggdist'
#> The following object is masked from 'package:bayestestR':
#> 
#>     hdi

# Generate data =============================================
k = 11 # number of curves (iterations)
n = 201 # number of rows
data <- data.frame(x = seq(-15,15,length.out = n))

# Simulate iterations as new columns
for(i in 1:k) {
  data[paste0("iter_", i)] <- dnorm(data$x, seq(-5,5, length.out = k)[i], 3)
}

# Note: first, we need to transpose the data to have iters as rows
iters <- datawizard::data_transpose(data[paste0("iter_", 1:k)])

# Compute Median
data$Median <- point_estimate(iters)[["Median"]]

# Compute Credible Intervals ================================

# Compute ETI (default type of CI)
data[c("ETI_low", "ETI_high")] <- eti(iters, ci = 0.5)[c("CI_low", "CI_high")]

# Compute CWI
# ggdist::curve_interval(reshape_iterations(data), iter_value, .width = c(.5))

# Visualization =============================================
ggplot(data, aes(x = x, y = Median)) +
  geom_ribbon(aes(ymin = ETI_low, ymax = ETI_high), fill = "red", alpha = 0.3) +
  geom_line(size = 1) +
  geom_line(data = reshape_iterations(data),
            aes(y = iter_value, group = iter_group),
            alpha = 0.3)

Created on 2021-08-18 by the reprex package (v2.0.1)

I would like to compute the curvewise interval without, if possible, reshaping the iterations data frame to long format and grouping it. As always, thanks :)
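To make the request concrete, here is a minimal base-R sketch of what a matrix-based curvewise band could look like, using the mean halfspace depth idea described in the curve_interval documentation (compute a pointwise depth for every curve, average it per curve, keep the deepest curves, take their envelope). The function name `curvewise_band_mhd` and its arguments are hypothetical, and this is a simplified illustration under those assumptions, not ggdist's actual implementation.

```r
# Simplified curvewise band via mean halfspace depth (illustrative only).
# `draws` is a matrix with one row per draw (curve) and one column per x value.
curvewise_band_mhd <- function(draws, width = 0.5) {
  stopifnot(is.matrix(draws), nrow(draws) >= 2, width > 0, width <= 1)
  n_draws <- nrow(draws)

  # Pointwise halfspace depth: at each x (column), the smaller of the fraction
  # of draws at or below a value and the fraction of draws at or above it.
  depth <- apply(draws, 2, function(col) {
    below <- rank(col, ties.method = "max") / n_draws
    above <- rank(-col, ties.method = "max") / n_draws
    pmin(below, above)
  })

  # Average the depth along each curve, then keep the deepest `width` fraction.
  mean_depth <- rowMeans(depth)
  n_keep <- max(1, ceiling(width * n_draws))
  keep <- order(mean_depth, decreasing = TRUE)[seq_len(n_keep)]

  # The band is the pointwise envelope of the retained curves.
  kept <- draws[keep, , drop = FALSE]
  data.frame(lower = apply(kept, 2, min), upper = apply(kept, 2, max))
}

# Same simulated curves as in the reprex above, stored as a draws x points matrix.
k <- 11
n <- 201
x <- seq(-15, 15, length.out = n)
means <- seq(-5, 5, length.out = k)
draws <- t(sapply(means, function(m) dnorm(x, m, 3)))

band <- curvewise_band_mhd(draws, width = 0.5)
```

Because the band is an envelope of whole curves, each retained curve lies entirely inside it, which is what distinguishes it from the pointwise ETI computed above.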

mjskay commented 3 years ago

Yeah, I think something like this is definitely doable. I was already thinking about making curve_interval generic so that posterior::rvar objects could be passed to it. Internally, since rvars just wrap an array where the first dimension is draws, any changes to support that would also make it easy to have an implementation of the generic for matrices.

mjskay commented 1 year ago

I added a pretty simple implementation of curve_interval() for rvars and for matrices (for the latter, draws must be the first dimension). That implementation assumes you want intervals for all variables jointly (i.e. you can't set .along), since if you want to set .along you might as well put the data into a data frame format anyway. Not sure if you're still wanting to use this somewhere, but if you are and it doesn't work for you, let me know.
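For anyone landing here later, a minimal sketch of calling the matrix method described above. It assumes an installed ggdist version that includes this method (and the posterior package, on the assumption that the matrix method relies on rvars internally). The simulated curves mirror the reprex from the original post; the exact columns of the result are not shown here, so inspect the output rather than relying on specific names.

```r
library(ggdist)

# Simulated curves: one row per draw, one column per x value
# (draws must be the first dimension).
k <- 11
n <- 201
x <- seq(-15, 15, length.out = n)
means <- seq(-5, 5, length.out = k)
draws <- t(sapply(means, function(m) dnorm(x, m, 3)))

# Joint (curvewise) 50% interval across all columns; .along is not set,
# since the matrix method does not support it.
band <- curve_interval(draws, .width = 0.5)
str(band)
```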