richarddmorey / BayesFactor

BayesFactor R package for Bayesian data analysis with common statistical models.
https://richarddmorey.github.io/BayesFactor/

feature request: allow data with `NA`s present #143

Closed by IndrajeetPatil 4 years ago

IndrajeetPatil commented 4 years ago

Currently, none of the functions work if the data contain NAs, and I was wondering whether BayesFactor could handle missing values internally. Often the solution is as simple as stats::na.omit, but removing NAs becomes tricky with repeated measures designs, so it would be nice if the package handled this internally instead of expecting users to do it.

library(jmv)
data("bugs")
library(BayesFactor)
#> Loading required package: coda
#> Loading required package: Matrix
#> ************
#> Welcome to BayesFactor 0.9.12-4.2. If you have questions, please contact Richard Morey (richarddmorey@gmail.com).
#> 
#> Type BFManual() to open the manual.
#> ************

ttestBF(bugs$LDLF, bugs$LDHF, paired = TRUE)
#> Error in ttestBF(bugs$LDLF, bugs$LDHF, paired = TRUE): x or y must not contain missing or infinite values.

library(ggplot2)
data("msleep")

anovaBF(formula = brainwt ~ vore, data = as.data.frame(msleep))
#> Error in checkFormula(formula, data, analysis = "anova"): Dependent variable must not contain missing or infinite values.

Just a thought. If you think that how to remove NAs is a judgment call users should make, and that the package shouldn't have to shoulder this responsibility, I completely understand.

richarddmorey commented 4 years ago

We discussed this extensively early on, and decided that we didn't want the package to do anything regarding missing data because the potential for unexpected behavior was too great. We could imagine things to do in specific situations, but nothing general enough.

We figured the best approach was to leave it up to the user.
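For the paired case raised above, a minimal sketch of what "leaving it up to the user" might look like is shown here. It uses only base R to drop incomplete pairs before the test; the vectors are made-up toy values (not the `bugs` data), and the names `keep`, `x_ok`, and `y_ok` are illustrative only.

```r
# Toy paired measurements with missing values (hypothetical numbers)
x <- c(5.1, NA, 6.3, 7.0, 4.8, NA)
y <- c(4.9, 5.5, NA, 6.1, 5.2, 6.0)

# Keep only the pairs where BOTH measurements are present and finite,
# so the pairing between x and y is preserved
keep <- is.finite(x) & is.finite(y)
x_ok <- x[keep]
y_ok <- y[keep]

# Report how many pairs were dropped so the removal is not silent
message(sum(!keep), " of ", length(keep), " pairs dropped")

# With BayesFactor installed, the cleaned vectors can then be passed on:
# BayesFactor::ttestBF(x_ok, y_ok, paired = TRUE)
```

Filtering each vector separately (for example with `na.omit(x)` and `na.omit(y)`) would break the pairing, which is why the mask is built from both vectors at once.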

IndrajeetPatil commented 3 years ago

Interestingly, BayesFactor does seem to do this for some tests:

set.seed(123)
library(magrittr)
library(BayesFactor)
library(ggplot2)

# with the raw dataframe
correlationBF(msleep$brainwt, msleep$sleep_rem)
#> Ignored 35 rows containing missing observations.
#> Bayes factor analysis
#> --------------
#> [1] Alt., r=0.333 : 0.9056885 ±0%
#> 
#> Against denominator:
#>   Null, rho = 0 
#> ---
#> Bayes factor type: BFcorrelation, Jeffreys-beta*

# create a dataframe with NAs omitted
df <- 
  dplyr::select(msleep, brainwt, sleep_rem) %>%
  tidyr::drop_na()

# are the results same?
correlationBF(df$brainwt, df$sleep_rem)
#> Bayes factor analysis
#> --------------
#> [1] Alt., r=0.333 : 0.9056885 ±0%
#> 
#> Against denominator:
#>   Null, rho = 0 
#> ---
#> Bayes factor type: BFcorrelation, Jeffreys-beta*

Created on 2020-12-04 by the reprex package (v0.3.0.9001)

richarddmorey commented 3 years ago

Yes, this comes down to whether R has a default way of handling missing values when computing the underlying test statistic, in cases where that test statistic is what goes into the analysis.
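As a rough illustration of that point, base R's `cor()` exposes its missing-data handling through the `use` argument. The sketch below uses made-up values and only shows how base R behaves; whether `correlationBF` relies on this mechanism internally is not stated in this thread.

```r
# Toy data with missing values (hypothetical numbers)
x <- c(1.2, 2.4, NA, 4.1, 5.3, 6.0)
y <- c(2.0, NA, 3.1, 4.5, 4.9, 6.2)

# Default (use = "everything"): any NA propagates into the result
r_default <- cor(x, y)

# use = "complete.obs": rows with a missing value in x or y are dropped
r_complete <- cor(x, y, use = "complete.obs")

# The same result obtained by removing incomplete rows by hand
ok <- complete.cases(x, y)
r_manual <- cor(x[ok], y[ok])
```

Here `r_default` is `NA`, while `r_complete` and `r_manual` agree, mirroring the matching results from the raw and NA-omitted `msleep` data above.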