richarddmorey / BayesFactor

BayesFactor R package for Bayesian data analysis with common statistical models.
https://richarddmorey.github.io/BayesFactor/
132 stars 49 forks source link

Discrepancy between ttestBF and generalTestBF with unused factor levels #126

Open adamliter opened 5 years ago

adamliter commented 5 years ago

It seems that you get different Bayes Factor values for ttestBF() and generalTestBF() when using a data set with an unused factor level whose underlying integer value is in between two of the actual used factor levels. I think this discrepancy is ultimately due to the fact that generalTestBF() calls lmBF(), which calls reFactorData() on the data set; however, this does not happen for ttestBF().

Here's a way to reproduce this:

my_data <- data.frame(group = c(rep("a", 50), rep("b", 50), rep("c", 50)),
                      score = rnorm(150))
my_data <- my_data[my_data$group != "b", ]
str(my_data)
#> 'data.frame':    100 obs. of  2 variables:
#>  $ group: Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ score: num  -1.153 -0.0543 0.9312 -1.3062 0.7689 ...

Note that group is a factor with three levels, since it hasn't been releveled after dropping "b".

Now, when you call ttestBF() vs. generalTestBF(), you get:

ttestBF(formula = score ~ group, data = my_data)
#> Bayes factor analysis
#> --------------
#> [1] Alt., r=0.707 : 0.1833829 ±0%
#> 
#> Against denominator:
#>   Null, mu1-mu2 = 0 
#> ---
#> Bayes factor type: BFindepSample, JZS

generalTestBF(formula = score ~ group, data = my_data)
#> |================================================================================================================| 100%
#> Bayes factor analysis
#> --------------
#> [1] group : 0.2485882 ±0.03%
#> 
#> Against denominator:
#>   Intercept only 
#> ---
#> Bayes factor type: BFlinearModel, JZS

0.1833829 != 0.2485882 👎

Compare this to when you drop "c" instead of "b":

my_data <- data.frame(group = c(rep("a", 50), rep("b", 50), rep("c", 50)),
                      score = rnorm(150))
my_data <- my_data[my_data$group != "c", ]
str(my_data)
#> 'data.frame':    100 obs. of  2 variables:
#>  $ group: Factor w/ 3 levels "a","b","c": 1 1 1 1 1 1 1 1 1 1 ...
#>  $ score: num  -0.613 -1.569 -0.367 1.704 -1.508 ...

You still have an unused factor level, but now you get equivalent results with ttestBF() and generalTestBF():

ttestBF(formula = score ~ group, data = my_data)
#>  Bayes factor analysis
#>  --------------
#>  [1] Alt., r=0.707 : 0.2177451 ±0.03%
#>  
#>  Against denominator:
#>    Null, mu1-mu2 = 0 
#>  ---
#>  Bayes factor type: BFindepSample, JZS
generalTestBF(formula = score ~ group, data = my_data)
#> |================================================================================================================| 100%
#> Bayes factor analysis
#> --------------
#> [1] group : 0.2177451 ±0.03%
#> 
#> Against denominator:
#>   Intercept only 
#> ---
#> Bayes factor type: BFlinearModel, JZS

0.2177451 == 0.2177451 👍

richarddmorey commented 5 years ago

Interesting, I'll check on this. It appears to be a bug at first glance.