mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0
712 stars 59 forks source link

`stat_steplineribbon()` for step curves? #249

Closed ASKurz closed 4 years ago

ASKurz commented 4 years ago

Would you consider support for step functions as in a Kaplan-Meier (KM) survival curve? Say I fit a KM-type model with brms::brm() on continuous-time-to-event data with t cases. The output from posterior_samples() would look something like this.

library(tidyverse)
library(brms)
library(tidybayes)
library(pammtools)  # for geom_stepribbon()

t <- 30

# make the samples
posterior_samples <-
  tibble(parameter = str_c("b_time", 1:t),
         Estimate = seq(from = -4, to = 0, length.out = t),
         Est.Error = seq(from = 0.9, to = 1.8, length.out = t),
         samples = map2(Estimate, Est.Error, rnorm, n = 4000)) %>% 
  unnest(samples) %>% 
  mutate(iter = rep(1:4000, times = t)) %>% 
  select(iter, parameter, samples) %>% 
  pivot_wider(names_from = parameter,
              values_from = samples)

Using posterior_samples() as the starting point, the workflow to get those samples in a format ready for plotting a survival curve with interval bands is something like this.

# wrangle
posterior_samples %>% 
  mutate(b_time0 = -Inf) %>% 
  select(b_time0, b_time1:str_c("b_time", t), iter) %>% 
  set_names(c(0:t, "iter")) %>% 
  pivot_longer(-iter,
               names_to = "interval",
               names_ptypes = list(interval = double())) %>% 
  group_by(iter) %>% 
  mutate(survival = cumprod(1 - inv_logit_scaled(value))) %>% 
  group_by(interval) %>% 
  # I can finally summarize!
  median_qi(survival) %>% 

  # now plot
  ggplot(aes(x = interval)) +
  geom_stepribbon(aes(ymin = .lower, ymax = .upper),
                  alpha = 1/3) +
  geom_step(aes(y = survival))

It’d be sweet if tidybayes offered something that allowed me to skip the median_qi() step and pump those MCMC draws into a geom that combined the sensibilities of my geom_stepribbon() + geom_step() combo. Maybe call it stat_steplineribbon().

mjskay commented 4 years ago

cool yeah, good idea! I think I would probably make it an option to the existing lineribbon geom (like step = TRUE). Would be an easy thing to add... possibly I don't have time until further into May when I switch over to summer mode, but I'll add this issue to the next release milestone so I remember to look at it this summer :)

(also would totally accept a PR that does this)

ASKurz commented 4 years ago

A simple step = TRUE option would be a dream! As to the PR, I'd love to but I don't think my programming skills are ready to help package development, yet. When I look at this, I'm baffled.

mjskay commented 4 years ago

No worries! Yeah, the internals for geoms are especially intimidating, I don't blame you --- it took me a long time to work up the courage to try building one.

I needed a break from some other stuff and was feeling code-y so I put together a solution. Basically there's now a step argument that's default is FALSE, but can be TRUE, "mid", "hv", or "vh" (where the last three work the same as the direction argument to geom_step). TRUE is just an alias for "mid", as when you use "hv" or "vh" either the leftmost or rightmost interval on the ribbon is not visible (it ends up with a width of 0). This departs from the geom_step default (which is "hv"), but it seems to me that a default that removes an uncertainty interval is not great. As someone actually using these things I would be curious your thoughts though.

You can see the difference here (points shown for reference):

step = TRUE / step = "mid" image

step = "hv" (what geom_step does by default) image

step = "vh" image

Anyway, this change means all the lineribbon geoms have a "step" argument now (geom_lineribbon, stat_lineribbon, and stat_dist_lineribbon). It's on the dev branch if you wanna check it out.

ASKurz commented 4 years ago

Dude. This looks sweet.

I see what you mean about not wanting to lose an uncertainty interval with hv or vh. I’ve become accustomed to using default hv and tacking on an extra level for x that duplicates the corresponding y-values.

Hey, do I execute something other than devtools::install_github("mjskay/tidybayes") to get the dev version? I’ve reinstalled tidybayes using that code and am getting the warning: “Ignoring unknown parameters: step” when I try to play with step = TRUE.

mjskay commented 4 years ago

Yeah it's on the dev branch currently (haven't merged to master because of Travis issues but it's the only change on there and the tests pass, so we're not talking bleeding edge really). I think the syntax is install_github("mjskay/tidybayes@dev")?

ASKurz commented 4 years ago

Works like a dream!

mjskay commented 4 years ago

Yay! 🙂

ASKurz commented 4 years ago

First use out in the wild: https://twitter.com/SolomonKurz/status/1256236803952164864