tidyverse / ggplot2

An implementation of the Grammar of Graphics in R
https://ggplot2.tidyverse.org

Adding bootstrap customization for stat_summary(fun.data="mean_cl_boot") #5885

Closed: erinnacland closed this issue 2 months ago

erinnacland commented 2 months ago

When using stat_summary(fun.data = "mean_cl_boot"), the means are plotted with bootstrapped confidence intervals. The number of bootstrap resamples is 1000 by default, so the intervals change slightly between runs. Below are two runs of the same code, each with slightly different estimates:

[Plots: Rplot05 and Rplot06, output of two runs of the same code]

Reproducible code:

library(ggplot2)  # mean_cl_boot() also needs the Hmisc package installed
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  stat_summary(fun.data = "mean_cl_boot")

Depending on the data, the estimates can change meaningfully from one run to the next.
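A quick way to see where the run-to-run differences come from is to call mean_cl_boot() directly on a single group. This is a sketch assuming ggplot2 and Hmisc are installed, with the setosa subset chosen only as an example. In Hmisc::smean.cl.boot() the point estimate is the ordinary sample mean, so between unseeded calls only the interval limits, which are percentiles of the bootstrap resamples, should differ.

```r
library(ggplot2)  # mean_cl_boot() calls Hmisc::smean.cl.boot(), so Hmisc must be installed

# Sepal lengths for a single species
x <- iris$Sepal.Length[iris$Species == "setosa"]

# Two unseeded calls with the default number of resamples (B = 1000)
run1 <- mean_cl_boot(x)
run2 <- mean_cl_boot(x)

# y is the sample mean; ymin and ymax are bootstrap percentile limits
print(rbind(run1, run2))
```

The y column is the same in both rows, while ymin and ymax typically differ in the later decimal places.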

Could a feature be added to set the number of bootstrap resamples, so that it can be increased to make the results more stable?

teunbrand commented 2 months ago

You can already pass additional arguments to the summary function by supplying a list to the fun.args argument. As mean_cl_boot() is just a wrapper for Hmisc::smean.cl.boot(), you can increase its B argument (the number of bootstrap resamples) to get a more stable estimate:

library(ggplot2)
ggplot(iris, aes(x = Species, y = Sepal.Length, fill = Species)) +
  # B = 10e3 requests 10,000 bootstrap resamples (the default is 1000)
  stat_summary(fun.data = "mean_cl_boot", fun.args = list(B = 10e3))
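To check that a larger B actually makes the interval more stable, one can repeat the bootstrap many times and compare how much the lower limit moves. This is an illustrative sketch, not part of ggplot2: the helper lower_spread(), the 50 repetitions, and the setosa subset are hypothetical choices. Because the Monte Carlo error of a bootstrap quantile shrinks roughly like 1/sqrt(B), the spread for B = 10e3 should be noticeably smaller than for the default B = 1000.

```r
library(ggplot2)  # requires Hmisc for mean_cl_boot()

x <- iris$Sepal.Length[iris$Species == "setosa"]

# Hypothetical helper: standard deviation of the lower CI limit
# across `reps` independent bootstrap runs, each using B resamples
lower_spread <- function(B, reps = 50) {
  sd(replicate(reps, mean_cl_boot(x, B = B)$ymin))
}

set.seed(1)
spread_1k  <- lower_spread(B = 1000)
spread_10k <- lower_spread(B = 10e3)
c(B_1000 = spread_1k, B_10000 = spread_10k)
```

The larger B costs proportionally more computation per group, which is usually negligible for a plot of this size.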

Alternatively, fun.data = ~mean_cl_boot(.x, B = 10e3) achieves the same outcome. You can also call set.seed() before drawing the plot to make the bootstrap resampling reproducible between runs.
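Both ways of passing B, and the effect of set.seed(), can be checked with layer_data(), which builds a plot and returns the computed summary. This is a sketch assuming ggplot2 and Hmisc are installed; the seed value 2024 is arbitrary. The resampling happens when the plot is built (for example when it is printed), not when the ggplot object is created, so set.seed() has to be called just before building or printing.

```r
library(ggplot2)  # requires Hmisc for mean_cl_boot()

# B supplied through fun.args
p_args <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun.data = "mean_cl_boot", fun.args = list(B = 10e3))

# B supplied through a formula lambda
p_lambda <- ggplot(iris, aes(x = Species, y = Sepal.Length)) +
  stat_summary(fun.data = ~mean_cl_boot(.x, B = 10e3))

# Resampling happens at build time, so seed immediately before each build
set.seed(2024)
d_args <- layer_data(p_args)
set.seed(2024)
d_lambda <- layer_data(p_lambda)

# Same seed and same B: the interval limits should match exactly
identical(d_args$ymin, d_lambda$ymin) && identical(d_args$ymax, d_lambda$ymax)
```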

As there are already methods to change the number of bootstraps, I'll close this issue.