jasp-stats / jasp-issues

This repository is solely meant for reporting of bugs, feature requests and other issues in JASP.

[Feature Request]: Fisher's exact test in Contingency Table #2362

Closed PeterKlaren closed 6 months ago

PeterKlaren commented 10 months ago

Description

Include Fisher's exact test as an alternative to chi-squared

Purpose

Prevent confusion with Fisher's test on odds ratios

Use-case

No response

Is your feature request related to a problem?

Not applicable

Is your feature request related to a JASP module?

Frequencies

Describe the solution you would like

An option to perform Fisher's exact test (as the ratio of binomial coefficients) in Contingency Tables

Describe alternatives that you have considered

No response

Additional context

The Odds Ratio table containing the odds ratio and "Fisher's exact test" can be confusing to students who are looking for Fisher's test to analyse 2x2 table with low counts.

EJWagenmakers commented 10 months ago

The reason why Fisher's exact test is under 2x2 is that it is only defined for the 2x2 case, whereas chi^2 is available for m by k tables more generally. In my opinion, providing two options side by side, one of which works only sometimes, has a larger potential for confusion. Now I do see that the principle of Fisher's exact test can be extended to m by k tables (see references here: https://en.wikipedia.org/wiki/Fisher%27s_exact_test), so if we were to implement that generalization (would be worthwhile, I think) then it would make sense to make it available next to the chi^2 option.

PeterKlaren commented 10 months ago

I have my first-year students in mind: they are classically being taught (and by me as well) that at low counts per cell the chi squared sampling distribution deviates from the theoretical one (same source: https://en.wikipedia.org/wiki/Fisher%27s_exact_test). From an educational point of view, and consistent with JASP's options of alternative tests in other modules (e.g. WMW U-test in the T-Test module), I would be in favour of adding Fisher's exact test to the Frequencies module.

EJWagenmakers commented 10 months ago

But it is present in the module, right? It is just underneath the section specific for 2x2 tables. Maybe I am missing something

PeterKlaren commented 10 months ago

It is in the module, but in the context of odds ratios, not really as an alternative to chi-squared. It also calculates a different p-value than p = (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!). For the 2x2 table analysis in the screenshot below I calculate p = 0.053522..., not p = 0.128. But it could well be that I am missing something.

[screenshot]

JTPetter commented 10 months ago

@PeterKlaren You are right, the p-value in JASP is different from the one calculated with Fisher's formula -- but it is the same as the one I get in other statistical software packages (SPSS, R, Minitab). This has to do with the fact that it is not computationally feasible to implement the formula as is, because the cost of computing the factorials can become excessive with large counts in the table.

Wikipedia states that "The actual computations as performed by statistical software packages will as a rule differ from those described above, because numerical difficulties may result from the large values taken by the factorials. A simple, somewhat better computational approach relies on a gamma function or log-gamma function, but methods for accurate computation of hypergeometric and binomial probabilities remains an active research area." (https://en.wikipedia.org/wiki/Fisher%27s_exact_test).

PeterKlaren commented 10 months ago

Fisher's exact test is appropriate for small sample sizes/low counts per cell, and with that constraint an exact p-value is quickly calculated. I programmed my TI-84 Plus graphical calculator to calculate p = 0.0535226066 for the 2x2 table in the screenshot above, no problem.

Perhaps you can let JASP calculate Fisher's exact p-value for 2x2 tables in which at least 2 cells have a count <=5, and no more than 2 cells have a count of at most 15? For a 2x2 matrix {5, 15, 15, 5} my TI-84 calculates p = 0.00174, so computationally it is not a demanding procedure.

Again I am thinking of preventing confusion in students, as Fisher's exact test is mentioned and taught in statistics textbooks and other sources.
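For reference, a minimal Python sketch of the formula quoted earlier in this thread. It rewrites the ratio of factorials as a ratio of binomial coefficients and uses Python's arbitrary-precision integers, so nothing overflows. The function name `table_probability` is made up for illustration, and this is not how JASP computes anything. Note that it returns the probability of one specific table given the margins, nothing more.

```python
from fractions import Fraction
from math import comb


def table_probability(a, b, c, d):
    """Probability of one specific 2x2 table [[a, b], [c, d]] with fixed margins.

    Algebraically equal to (a+b)!(c+d)!(a+c)!(b+d)! / (a! b! c! d! n!),
    written as C(a+b, a) * C(c+d, c) / C(n, a+c).
    """
    n = a + b + c + d
    # Exact rational arithmetic: no rounding until the caller converts to float.
    return Fraction(comb(a + b, a) * comb(c + d, c), comb(n, a + c))
```

Calling `float(table_probability(5, 15, 15, 5))` evaluates the {5, 15, 15, 5} example from the comment above.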

EJWagenmakers commented 10 months ago

You would think that the existing software packages would have done something like this, right? I would use logs first, and then exponentiate at the end...and ideally there would be an overflow error that could be caught, after which the approximation could be executed.

EJWagenmakers commented 10 months ago

I would be tempted to take the logs of the factors (a+b)!/a! etc. (so not evaluate the entire numerator and denominator in one go), and then switch to the approximation once this number becomes huge. That leaves a 1/n! term, which could be evaluated using approximations if n is larger than a particular value. But it could happen that the factors (a+b)!/a! etc. can still be evaluated exactly, whereas 1/n! needs an approximation. This would still be better than approximating everything. By the way, "lfactorial" in R keeps computing even for massive n, but maybe this basic function is based on an approximation? It would be good to check with your example.
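A rough sketch of the "logs first, exponentiate at the end" idea, in Python rather than R. It assumes `math.lgamma(k + 1)` as the log-factorial, which plays the role that `lfactorial` plays in R. The helper names are hypothetical, and this only illustrates the idea; it is not a claim about what JASP or R do internally.

```python
from math import exp, lgamma


def log_factorial(k):
    """log(k!) via the log-gamma function, since k! = gamma(k + 1)."""
    return lgamma(k + 1)


def table_probability_log(a, b, c, d):
    """Probability of the 2x2 table [[a, b], [c, d]] given its margins.

    Every factorial is handled on the log scale, so no intermediate
    value overflows; the only exponentiation happens at the very end.
    """
    n = a + b + c + d
    log_p = (
        log_factorial(a + b) + log_factorial(c + d)
        + log_factorial(a + c) + log_factorial(b + d)
        - log_factorial(a) - log_factorial(b)
        - log_factorial(c) - log_factorial(d)
        - log_factorial(n)
    )
    return exp(log_p)
```

With very large counts the final `exp` may underflow to 0.0 for extremely unlikely tables, but it does not raise an overflow error, because the log-probability is never positive.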

Kucharssim commented 10 months ago

Sorry that it took me a while to get to this.

I am still a little bit confused. @PeterKlaren the formula you wrote gives the exact hypergeometric probability of observing this exact joint distribution of the data with the observed marginal counts fixed, assuming the null hypothesis is true. However, that is not a p-value: to obtain the p-value we need to sum the probabilities of the observed joint distribution and of all more extreme ones.

When I use the example from Wikipedia for calculating the p-value (https://en.wikipedia.org/wiki/Fisher%27s_exact_test#p-value_tests), I get the same answer in JASP.
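To make the difference concrete, here is a small Python sketch that enumerates every 2x2 table with the same margins and sums the relevant probabilities. The function name `fisher_exact_2x2` is hypothetical. The two-sided rule used here (sum over all tables that are at most as probable as the observed one) is one common convention and is an assumption on my part; doubling the one-sided value is another convention and can give a slightly different number.

```python
from math import comb


def fisher_exact_2x2(a, b, c, d):
    """Return (p_less, p_greater, p_two_sided) for the table [[a, b], [c, d]].

    With the margins fixed, the top-left cell x determines the whole table,
    and its null distribution is hypergeometric.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    lowest = max(0, row1 + col1 - n)   # smallest feasible top-left count
    highest = min(row1, col1)          # largest feasible top-left count
    total = comb(n, col1)              # common denominator of all probabilities

    # Integer numerators of P(top-left cell = x); comparisons stay exact.
    weights = {
        x: comb(row1, x) * comb(n - row1, col1 - x)
        for x in range(lowest, highest + 1)
    }
    observed = weights[a]

    p_less = sum(w for x, w in weights.items() if x <= a) / total
    p_greater = sum(w for x, w in weights.items() if x >= a) / total
    # Two-sided: all tables at most as probable as the observed one.
    p_two_sided = sum(w for w in weights.values() if w <= observed) / total
    return p_less, p_greater, p_two_sided
```

For the Wikipedia example, the call would be `fisher_exact_2x2(1, 9, 11, 3)`.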

Is there a use-case for reporting not the p-values but only the probability of observing the exact data? I think we could add it but I am not sure if that would not introduce even more confusion.

I agree that adding Fisher's exact test under Odds ratio is a bit odd though. Perhaps we can rename that section to 2x2 tables and have Odds ratio and Fisher's exact test as clearly distinct options within that. Do you think that would make it easier for students?

PeterKlaren commented 10 months ago

You are correct, Simon. My formula and procedure (as it occurs in some textbooks, hence my and students' confusion) indeed gives an exact probability of a particular distribution of counts with fixed marginals. This was my oversight.

Calculating and summing the probabilities of the observed table and the hypothetical more extreme 2x2 tables gives me a (one-sided?) p-value. For the example in my screenshot I calculate p = 0.06457..., which would give a two-sided p-value of p = 0.129..., consistent with JASP's output.

Your suggestion of distinguishing between odds ratio and Fisher's is logical and a good one. I think the most confusing thing in the table Odds Ratio is to have a column and a row labeled "odds ratio". This also is not explained in the help screen under the blue information button.

tomtomme commented 7 months ago

@PeterKlaren So the separation of odds ratio and Fisher's exact test could be solved in one go if we implement https://github.com/jasp-stats/jasp-issues/issues/99. Would you agree? I added #99 to this list: https://github.com/jasp-stats/jasp-issues/issues/1364

And the confusion about the double "odds ratio" in the table must certainly be addressed via the help docs, tracked here: https://github.com/jasp-stats/jasp-issues/issues/2529

Would you agree then, that this one could be closed as duplicate, and that we track progress in the "bigger picture" issues?

(related issue: https://github.com/jasp-stats/jasp-issues/issues/946)

PeterKlaren commented 6 months ago

Agree! Thanks, Thomas.

PerPalmgren commented 1 month ago

I hope this will be resolved in the future! All the best Per