richarddmorey / BayesFactor

BayesFactor R package for Bayesian data analysis with common statistical models.
https://richarddmorey.github.io/BayesFactor/

Big number when running Bayesian analysis of contingency table #144

Closed ivan-marroquin closed 3 years ago

ivan-marroquin commented 3 years ago

Hi all,

I am using R 3.6.3 64 bits and BayesFactor 0.9.12-4.2 on a windows machine.

I collected data and built a contingency table, and I would like to assess how strongly the two variables are associated. Below is a copy of the script:

library(BayesFactor)

labels_neurons <- c('N16', 'N17', 'N21', 'N22', 'N23')
no_reservoir <- c(34, 215, 39, 57, 52)
reservoir <- c(43, 49, 258, 155, 44)

# combine arrays into a data frame:
neurons <- rep(labels_neurons, each = 2)
rocks <- rep(c('no_reservoir', 'reservoir'), times = length(labels_neurons))
freq <- c(rbind(no_reservoir, reservoir))

# generate data frame
input_data <- data.frame(neurons, rocks, freq)

# convert data frame into a contingency table: the two variables are
# neurons and rock types (i.e., no_reservoir and reservoir)
input_table <- xtabs(freq ~ neurons + rocks, data = input_data)

# in this example, the number of data samples is fixed and everything else is random
bayes.result <- contingencyTableBF(input_table, sampleType = "jointMulti")

Note that the output is:

Bayes factor analysis

[1] Non-indep. (a=1) : 2.299261e+65 ±0%

Against denominator: Null, independence, a = 1

Bayes factor type: BFcontingencyTable, joint multinomial

My question is: does it make sense to obtain such an astronomical result as 2.299261e+65? What does it mean?

Many thanks for your help,

Ivan

richarddmorey commented 3 years ago

Hi Ivan,

In the background, the BayesFactor package estimates the logarithm of the Bayes factor and then, when displaying it, shows the exponentiated version. This is analogous to finding an extremely low p value, and should be interpreted with the same caution. I note that the p value here is reported as less than 2.2e-16, so the chi-squared test gives you a correspondingly tiny p value.

You can use R to see how using the logarithm of the p value instead can give you a ridiculously low number:

X2   = chisq.test(input_table)
logp = pchisq(X2$statistic, X2$parameter, lower.tail = FALSE, log.p = TRUE)
BayesFactor:::expString(logp)

(expString prints the value in scientific notation, given the logarithm of a value)
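For readers who would rather not call an unexported function, here is a minimal base-R sketch of the same idea. The helper `sci_from_log` is hypothetical (it is not part of BayesFactor), and the table is rebuilt from the counts posted at the top of the thread. The last part only runs if BayesFactor is installed; it uses `extractBF(..., logbf = TRUE)` to read the Bayes factor on the log scale.

```r
# Counts copied from the script earlier in this thread.
no_reservoir <- c(34, 215, 39, 57, 52)
reservoir    <- c(43, 49, 258, 155, 44)
input_table  <- cbind(no_reservoir, reservoir)
rownames(input_table) <- c("N16", "N17", "N21", "N22", "N23")

# Chi-squared test of independence; keep the p value on the natural-log
# scale so it is not truncated to "< 2.2e-16" when printed.
X2   <- chisq.test(input_table)
logp <- as.numeric(pchisq(X2$statistic, X2$parameter,
                          lower.tail = FALSE, log.p = TRUE))

# Hypothetical helper (not a BayesFactor function): format exp(logx)
# in scientific notation, given the natural logarithm logx.
sci_from_log <- function(logx) {
  log10x   <- logx / log(10)
  exponent <- floor(log10x)
  mantissa <- 10^(log10x - exponent)
  sprintf("%.3fe%+03d", mantissa, as.integer(exponent))
}

print(logp)                # natural log of the p value
print(sci_from_log(logp))  # the same p value in scientific notation

# The same treatment for the Bayes factor, only if the package is available.
if (requireNamespace("BayesFactor", quietly = TRUE)) {
  bf     <- BayesFactor::contingencyTableBF(input_table, sampleType = "jointMulti")
  log_bf <- BayesFactor::extractBF(bf, logbf = TRUE)$bf
  print(log_bf)
  print(sci_from_log(log_bf))
}
```

On the log scale both numbers are unremarkable in size; they only look astronomical once exponentiated.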

ivan-marroquin commented 3 years ago

Hi Richard,

Many thanks for your prompt answer. I would like some clarification. Since the Bayes factor is 2.299261e+65, does this mean the measurement should be considered "overly optimistic" about the variables in the contingency table being associated? Am I correct?

I tried a different data set, in which I analyzed the results of different classifications against the presence of reservoir/non-reservoir rocks. Here are the results:

First classification --> Bayes factor 7184.99
Second classification --> Bayes factor 3838.2
Third classification --> Bayes factor 3271574
Fourth classification --> Bayes factor 3.7076e+13

How can these computed factors be interpreted? Is it normal to expect very large values? From these 5 results, which one would make more sense (the lowest or the highest one)?

Many thanks for your collaboration,

Ivan

richarddmorey commented 3 years ago

I'm not sure what you mean about "overly optimistic" or "classification" without knowing more about your particular problem. These Bayes factors are all very high, suggesting clear evidence any way you look at it.
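As a rough illustration of what "very high" means here, the sketch below converts the four reported Bayes factors into posterior probabilities of non-independence using Bayes' rule (posterior odds = Bayes factor × prior odds). The prior odds of 1 are an assumption made purely for illustration; they do not come from this thread or from the package.

```r
# Bayes factors reported in this thread (non-independence vs. independence).
bf <- c(first = 7184.99, second = 3838.2, third = 3271574, fourth = 3.7076e13)

# ASSUMPTION: both hypotheses are equally plausible a priori (prior odds = 1).
prior_odds <- 1

# Bayes' rule on the odds scale, then convert odds to a probability.
posterior_odds <- bf * prior_odds
posterior_prob <- posterior_odds / (1 + posterior_odds)

print(data.frame(bayes_factor = bf, posterior_prob = posterior_prob))
```

Under that assumption every classification ends up with a posterior probability of non-independence above 0.999, so the differences between the four Bayes factors change very little in practice.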

ivan-marroquin commented 3 years ago

Hi Richard,

Sorry for the confusion. Let me first explain what I meant by "overly optimistic". I understood that the Bayes factor captures the odds against the null hypothesis. In my case, I assume the null hypothesis is that the classification result is independent of the presence of reservoir/non-reservoir rocks. Because I got such an astronomical number for the Bayes factor, I thought I could describe it as "overly optimistic". If you have a better way to explain such a big value for the Bayes factor, please let me know.

As for how I am using the Bayes factor, here is a short explanation. On a different data set, I performed several machine learning classification analyses, and I want to understand how these classification results are associated with the presence of reservoir and non-reservoir rocks. Thus, a contingency table was generated for each classification result:

First classification & reservoir/non-reservoir rocks --> Bayes factor 7184.99
Second classification & reservoir/non-reservoir rocks --> Bayes factor 3838.2
Third classification & reservoir/non-reservoir rocks --> Bayes factor 3271574
Fourth classification & reservoir/non-reservoir rocks --> Bayes factor 3.7076e+13

Now I would like to know how to interpret high Bayes factor values. Is it normal to expect such large values? If so, which Bayes factor points to the best result? Should I choose the result with the lowest Bayes factor, or the highest? What would you suggest?

Many thanks,

Ivan

richarddmorey commented 3 years ago

In that case, I would report that for all classifications, the Bayes factor is greater than 3800. The exact numbers don't matter much, particularly when they get this large. A test statistic (whether it is a p value or Bayes factor) is just a check on whether the results you're showing in your sample (which seem to indicate a lack of independence) could be due to chance. Mere chance variability seems like only a remote possibility here.
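One possible way to produce that kind of summary in R is sketched below. The vector simply restates the four values posted earlier in the thread, and the bound of 3800 is the one suggested above; the variable names are illustrative only.

```r
# The four Bayes factors posted earlier in this thread.
bf <- c(first = 7184.99, second = 3838.2, third = 3271574, fourth = 3.7076e13)

# Lower bound suggested for reporting.
lower_bound <- 3800

# On the log10 scale the values are easier to compare at a glance.
print(round(log10(bf), 2))

# Report a single conservative statement instead of four exact numbers.
smallest <- names(bf)[which.min(bf)]
cat(sprintf("All %d Bayes factors exceed %d (smallest: %.1f, %s classification)\n",
            length(bf), as.integer(lower_bound), min(bf), smallest))
```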

ivan-marroquin commented 3 years ago

Hi Richard,

Many thanks for your help!

Ivan