NA handling in point_interval

mjskay / tidybayes

Bayesian analysis + tidy data + geoms (R package)

http://mjskay.github.io/tidybayes

GNU General Public License v3.0


NA handling in point_interval #123

Closed awellis closed 6 years ago

awellis commented 6 years ago

In the following example, point_interval() fails because it calls the quantile function without the argument na.rm = TRUE:

> sample(x = c(rnorm(100), NA), size = 100, replace = TRUE) %>% point_interval()

Error in quantile.default(x, c(lower_prob, upper_prob)) : missing values and NaN's not allowed if 'na.rm' is FALSE
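The underlying base R behavior can be reproduced without tidybayes. The following is a minimal sketch; the vector `x` and the probabilities are illustrative assumptions, not taken from the report above:

```r
# Illustrative draws containing one missing value (assumed data)
x <- c(0:10, NA)

# quantile() refuses missing values unless na.rm = TRUE is passed;
# capture the error message instead of stopping
res <- tryCatch(
  quantile(x, c(0.025, 0.975)),
  error = function(e) conditionMessage(e)
)
print(res)

# With na.rm = TRUE the NA is dropped before the quantiles are computed
q <- quantile(x, c(0.025, 0.975), na.rm = TRUE)
print(q)
```

The first call produces the same error message as in the report; the second returns the quantiles of the non-missing values.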

mjskay commented 6 years ago

Ah, good point. Can you say a little more about your use case for passing NAs to point_interval, so that I can get a sense of what a good solution would be?

I ask because when using point_interval on a posterior sample I would not have expected NAs to ever happen, so failing loudly for that use case might be desirable. On the other hand, there might be good rationale for doing something like adding an na.rm argument to point_interval to allow people to circumvent the default behavior when needed, rather than having the default be to ignore NAs. The right solution depends on the expected frequency of different use cases.

mjskay commented 6 years ago

Oh, I should add: another solution, either instead of (or in addition to) an na.rm argument, would be to have the result of point_interval on anything containing NAs to also be NAs for the point and the interval endpoints. Hence the question about use cases: understanding your use case would help me understand what output would make sense.

awellis commented 6 years ago

Hi,

I am using point_interval() to summarise posterior predictions of a rather exotic drift diffusion model (Wiener distribution in Stan) to fit reaction times and choices. I am basically trying to rewrite this approach using tidybayes.

Some subjects made very few errors, and in some conditions, there were no errors (making this a very poor data set for a DDM). Therefore, there are sometimes no reaction times for error responses.

Nevertheless, I am attempting to fit the model, and my model predicts that these subjects make zero errors in some conditions. That is the source of the problem.

It would be rather useful if I could pass the na.rm argument to the quantile function.
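To illustrate where the NAs come from, here is a small sketch in base R. The values of `rt` and `decision` are made up for illustration; they stand in for one subject, condition, and draw in which no error response was predicted:

```r
# Hypothetical reaction times and decisions for one subject/condition/draw
# in which every predicted response is correct (decision == 1)
rt <- c(0.61, 0.48, 0.75)
decision <- c(1, 1, 1)

# Subsetting to error responses yields an empty numeric vector
error_rt <- rt[decision == 0]
print(length(error_rt))

# The median of an empty vector is NA, so this group contributes an NA
median_error <- median(error_rt)
print(median_error)
```

Any summary column built this way will contain NAs for such groups, and those NAs are then passed on to point_interval().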

awellis commented 6 years ago

Here is a code example:

preds <- fit$data %>% 
    select(id, condition) %>% 
    add_predicted_draws(fit, negative_rt = TRUE, n = 200) 

# the sign of .prediction encodes the decision; its magnitude is the reaction time
preds <- preds %>%
    mutate(decision = ifelse(.prediction > 0, 1, 0),
           rt = abs(.prediction))

preds %>% 
    group_by(id, condition, .draw) %>% 
    summarise(prob_correct = mean(decision == 1),
              median_correct = median(rt[decision == 1]),
              median_error = median(rt[decision == 0])) %>% 
    group_by(id, condition) %>% 
    median_qi(.width = c(.50, .80, .95))

mjskay commented 6 years ago

Cool, that makes sense. I've updated point_interval such that it has the following behavior:

By default, if NAs are present, the point summary and interval endpoints are all NA.
If you pass na.rm = TRUE, NAs are stripped before points and intervals are calculated.

For example:

> mean_qi(data.frame(x = c(0:10, NA)))
##    x .lower .upper .width .point .interval
## 1 NA     NA     NA   0.95   mean        qi
> mean_qi(data.frame(x = c(0:10, NA)), na.rm = TRUE)
##   x .lower .upper .width .point .interval
## 1 5   0.25   9.75   0.95   mean        qi

That update is now on the master branch on GitHub.
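The behavior described in this comment can be sketched in base R for a single numeric vector. This is not the tidybayes implementation; `mean_qi_sketch` and its return format are hypothetical and only mirror the two cases (propagate NA by default, strip NAs when na.rm = TRUE):

```r
# Hypothetical helper, NOT the tidybayes implementation: mean plus an
# equal-tailed quantile interval, with the NA handling described above
mean_qi_sketch <- function(x, .width = 0.95, na.rm = FALSE) {
  if (na.rm) {
    # strip NAs before computing the point and the interval
    x <- x[!is.na(x)]
  } else if (anyNA(x)) {
    # default: NAs in the input give NA for the point and both endpoints
    return(c(point = NA_real_, lower = NA_real_, upper = NA_real_))
  }
  lower_prob <- (1 - .width) / 2
  upper_prob <- (1 + .width) / 2
  q <- quantile(x, c(lower_prob, upper_prob), names = FALSE)
  c(point = mean(x), lower = q[1], upper = q[2])
}

x <- c(0:10, NA)
default_result <- mean_qi_sketch(x)                  # all NA
stripped_result <- mean_qi_sketch(x, na.rm = TRUE)   # computed on 0:10
print(default_result)
print(stripped_result)
```

With the same input as the example above, the na.rm = TRUE case should agree with the mean_qi() output (point 5, interval 0.25 to 9.75).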

awellis commented 6 years ago

Thanks, works perfectly