mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)
http://mjskay.github.io/tidybayes
GNU General Public License v3.0

spread_draws converts characters to int #270

Closed elisafilevich closed 3 years ago

elisafilevich commented 4 years ago

Based on one of the vignettes, I'm trying to plot posterior draws using stat_slab. I was getting the following error, and I couldn't figure it out:

Warning message: Computation failed in stat_slab(): need at least 2 points to select a bandwidth automatically

I think the problem is that spread_draws automatically converts numeric strings to integers whenever it can. See the two datasets below, which differ only in whether subject is coded as "1"/"2" or "X"/"Y":

library(dplyr)      # tibble(), mutate(), %>%
library(ggplot2)
library(brms)
library(tidybayes)

n = 10
n_condition = 5
ABC_letters =
  tibble(
    condition = rep(c("A","B","C","D","E"), n),
    response = rnorm(n * 5, c(0,1,2,1,-1), 0.5),
    treatment = rnorm(n * 5, c(0,1,2,1,-1), 0.5),
    subject = c(rep("X",(n_condition*n)/2),rep("Y",(n_condition*n)/2))
  )

ABC_numbers =
  tibble(
    condition = rep(c("A","B","C","D","E"), n),
    response = rnorm(n * 5, c(0,1,2,1,-1), 0.5),
    treatment = rnorm(n * 5, c(0,1,2,1,-1), 0.5),
    subject = c(rep("1",(n_condition*n)/2),rep("2",(n_condition*n)/2))
  )
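To see the suspected conversion in isolation, here is a minimal sketch using base R's type.convert() as a stand-in. This is an assumption for illustration only: it shows the kind of string-to-integer conversion described above, not necessarily how spread_draws implements it internally. The variable names are hypothetical.

```r
# Base-R stand-in for the conversion (assumption: tidybayes may do this differently)
subject_letters <- c("X", "Y")
subject_numbers <- c("1", "2")

# as.is = TRUE keeps non-numeric strings as character instead of making factors
converted_letters <- type.convert(subject_letters, as.is = TRUE)
converted_numbers <- type.convert(subject_numbers, as.is = TRUE)

class(converted_letters)  # "character"
class(converted_numbers)  # "integer"
```

Labels that look like numbers come back as integers, while "X"/"Y" stay as character, which matches the difference between the two datasets.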

I run the two models, in exactly the same way:

m_letters = brm(
  response ~ treatment + (1|condition) + (1|subject),
  data = ABC_letters,
  cores = 4, chains = 1,
  iter = 500, warmup = 50
)

m_numbers = brm(
  response ~ treatment + (1|condition) + (1|subject),
  data = ABC_numbers,
  cores = 4, chains = 1,
  iter = 500, warmup = 50
)

I can plot the first model just fine:

# This works
m_letters %>%
  spread_draws(b_treatment, r_subject[subject,]) %>% 
  mutate(subject_estimate = b_treatment + r_subject) %>% #print()
  ggplot(aes(y = subject, x = subject_estimate)) +
  stat_slab()

But not the second. It does work, however, when I uncomment the commented-out line below, which converts subject back to character.

# does not work
m_numbers %>%
  spread_draws(b_treatment, r_subject[subject,]) %>% 
  mutate(subject_estimate = b_treatment + r_subject) %>% 
  #mutate(subject = as.character(subject)) %>%  
  ggplot(aes(y = subject, x = subject_estimate)) +
  stat_slab()

I had this problem in my code even though in my (real) data I had specified subjectNumber as both a character and a factor. If this behaviour is intended, I'm missing something (which might very well be the case; some parts of R remain a mystery to me). If it is intended, it would be a good idea to include a warning and/or improve the error message.

mjskay commented 3 years ago

Sorry for the delay! Ah yes, the issue is with orientation detection.

stat_slab() attempts to figure out whether you want to plot the densities horizontally or vertically based on the data types of the variables mapped onto the x and y aesthetics. In your second example it incorrectly thinks you want vertical slabs because y is numeric and not a factor. Then it attempts to calculate densities with too few observations and breaks.

This is why converting back to a character (or a factor) fixes the problem, as then y is discrete and it correctly determines that the orientation should be horizontal.
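A minimal sketch of that fix, without fitting a model: the data frame below is a hypothetical stand-in for the output of spread_draws (simulated draws, not real posterior samples), with subject stored as an integer index. It assumes the ggplot2 and ggdist packages are installed; stat_slab() is taken from ggdist here.

```r
library(ggplot2)
library(ggdist)

set.seed(1)

# Hypothetical stand-in for spread_draws() output: 200 draws per subject,
# with subject stored as an integer index.
draws <- data.frame(
  subject = rep(1:2, each = 200),
  subject_estimate = rnorm(400, mean = rep(c(0, 1), each = 200), sd = 0.5)
)

is.numeric(draws$subject)  # TRUE: y is numeric, so orientation may be misdetected

# Convert the index to a factor so that y is discrete
draws_fixed <- transform(draws, subject = factor(subject))
is.factor(draws_fixed$subject)  # TRUE

# With a discrete y and a continuous x, the slabs should be drawn horizontally
p <- ggplot(draws_fixed, aes(y = subject, x = subject_estimate)) +
  stat_slab()
```

Printing p draws one horizontal density per subject. Converting with as.character() instead of factor() should have the same effect, since ggplot2 treats character vectors as discrete.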

You can force the orientation to be horizontal by passing orientation = "horizontal" manually:

m_numbers %>%
  spread_draws(b_treatment, r_subject[subject,]) %>% 
  mutate(subject_estimate = b_treatment + r_subject) %>% 
  #mutate(subject = as.character(subject)) %>%  
  ggplot(aes(y = subject, x = subject_estimate)) +
  stat_slab(orientation = "horizontal")


Hope that helps!