karoliskoncevicius / matrixTests

R package for computing multiple hypothesis tests on rows/columns of a matrix or a data.frame
https://cran.r-project.org/web/packages/matrixTests/index.html

Add Fisher's exact test #11

Open karoliskoncevicius opened 4 years ago

karoliskoncevicius commented 4 years ago

Implement Fisher's exact test for a 2x2 case.

Input can be a pair of matrices. For a row-wise case each row of both matrices will have 2 unique levels (logical, factor, character or even numeric).

mdshw5 commented 4 years ago

I'd just like to add my support for this feature. This is something I could use in a project I'm currently working on.

karoliskoncevicius commented 4 years ago

Thanks for adding your voice here.

Can you maybe elaborate a bit about the structure of your project?

This Fisher's test got delayed a bit mostly because I cannot decide on the interface that would be suitable for every potential use case.

For my case I would make a function that gets two matrices as an input and both matrices would:

  1. Have the same number of columns
  2. Each row would have at most 2 unique levels in it (i.e. TRUE/FALSE)

Then the test row_fisher_exact(mat1, mat2) would do a Fisher's exact test on each row between those two matrices. Would that cover your needs?
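
To make the proposed interface concrete, here is a minimal reference sketch that simply calls base R's `fisher.test()` once per row. The name `row_fisher_exact_ref` and the choice to return `NA` for rows with fewer than 2 levels are assumptions made for illustration, not the package's implementation. It also assumes both matrices have identical dimensions and contain no missing values.

```r
# Hypothetical reference version: slow, but easy to check against.
# Returns one two-sided p-value per row.
row_fisher_exact_ref <- function(mat1, mat2) {
  stopifnot(is.matrix(mat1), is.matrix(mat2),
            identical(dim(mat1), dim(mat2)))
  vapply(seq_len(nrow(mat1)), function(i) {
    x <- mat1[i, ]
    y <- mat2[i, ]
    # Enforce the "at most 2 unique levels per row" rule.
    if (length(unique(x)) > 2 || length(unique(y)) > 2)
      stop("row ", i, " has more than 2 unique levels")
    tab <- table(x, y)
    # A row with a single level does not form a 2x2 table (assumed: NA).
    if (any(dim(tab) < 2)) return(NA_real_)
    fisher.test(tab)$p.value
  }, numeric(1))
}

m1 <- rbind(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
m2 <- rbind(c(TRUE, TRUE, TRUE, FALSE, FALSE, FALSE))
row_fisher_exact_ref(m1, m2)
```

Because `table()` works on logical, factor, character, and numeric vectors alike, this sketch accepts any of those types as long as each row has at most 2 distinct values.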

mdshw5 commented 4 years ago

@KKPMW What you described would fit my needs exactly. I think that's the most general way to set up a matrix-oriented application of the 2x2 Fisher's test. I like the idea of enforcing at most 2 levels; the only tricky part is which data types you accept. Logical types definitely make sense, and I think factors could make sense as well. Anything else could be a lower priority, as the test is strictly categorical and it's not clear whether character and numeric values represent categorical or continuous measures.

karoliskoncevicius commented 4 years ago

@mdshw5 I took a look at Fisher's test again and I remember now why I postponed it at that time.

Short story: the function uses maximum likelihood estimates for the odds ratio and confidence intervals. To compute these, R calls the uniroot() function, which searches numerically for a root (a zero) of a given function. At this stage it seems that it would be very hard to speed up this approach...

But there are some things that can be computed very quickly. One such thing is the p-value under the OR = 1 null hypothesis. If those would be enough for your project, I could create a dev branch with this kind of limited Fisher's test for now.
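
As an illustration of why the OR = 1 p-values are cheap, here is a minimal sketch that computes them for all rows at once from the hypergeometric density, with no root finding. The function name `row_fisher_pvalue_sketch` is hypothetical, and the sketch assumes logical matrices of identical dimensions without missing values. It uses the same two-sided rule as `fisher.test()`: sum the probabilities of all tables that are no more likely than the observed one, with a relative tolerance of 1e-7.

```r
# Hypothetical vectorised sketch: two-sided Fisher p-values under OR = 1.
row_fisher_pvalue_sketch <- function(mat1, mat2) {
  stopifnot(is.logical(mat1), is.logical(mat2),
            identical(dim(mat1), dim(mat2)))
  a <- rowSums(mat1 & mat2)   # both TRUE
  m <- rowSums(mat1)          # TRUE in mat1
  n <- ncol(mat1) - m         # FALSE in mat1
  k <- rowSums(mat2)          # TRUE in mat2
  q <- 0:ncol(mat1)           # every conceivable count for cell `a`
  # dens[i, j] = P(a = q[j]) for row i; zero outside the row's support.
  dens <- matrix(dhyper(rep(q, each = nrow(mat1)), m, n, k),
                 nrow = nrow(mat1))
  obs <- dhyper(a, m, n, k)   # probability of the observed table
  # Sum the probabilities of tables no more likely than the observed one.
  rowSums(dens * (dens <= obs * (1 + 1e-7)))
}

set.seed(1)
mat1 <- matrix(runif(50) < 0.5, nrow = 5)
mat2 <- matrix(runif(50) < 0.5, nrow = 5)
row_fisher_pvalue_sketch(mat1, mat2)
```

Note that a row with only one level in either matrix gets a p-value of 1 here, since only one table is possible given its margins. Odds ratio estimates and confidence intervals are deliberately left out, as those are the parts that need `uniroot()`.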

mdshw5 commented 4 years ago

Thanks for looking into this! In fact p-values under the OR=1 null hypothesis would be useful for my project. I’d be glad to test out any implementation.
