Open marco-mariotti opened 3 weeks ago
I suggest removing the fisher dependency and turning off the doctests for now. Or lock an earlier version of fisher
weirdly enough, earlier versions of fisher have the same problem. It must be because some of its dependencies latests versions are causing the problem...? Because fisher is a fringe add-on, used only in one function, I wonder if it can be removed permanently. The performance trade-off, even when used, is likely minimal. It should be easy to replace with a python core/numpy function, something already present as pyranges dep. But I can't look at this for several days
It is OK with me. It is already an optional add-on. I could not find an alternative that was not silly slow
When you have a lot of regions to test the other alternatives are very slow. I guess I could write something myself in C/Cython, but it might not be an optimal use of my time.
If you find something that is fast please tell me
SciPy: 266 µs ± 7.13 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each) Fisher: 280 ns ± 0.721 ns per loop (mean ± std. dev. of 7 runs, 1,000,000 loops each)
Another problem is that SciPy only works on one table at a time but Fisher is vectorized. So the SciPy version is orders of magnitudes slower. I use Fisher exact a lot in my thesis so the difference matters.
I suggest we implement it with scipy as the default and use the fisher library only if it is installed.
I do not agree that fisher is niche, it is a common statistical test and needed for functional enrichment which is ubiquitous in bioinformatics.
The performance difference was seconds compared to tens of minutes IIRC due to SciPy needing Python looping and reading data from random memory addresses.
OK, then fisher package stays. It's already optional, but optional add-ons are required for testing and thus to proceed with any workflows. I think it's healthy that it stays that way. It seems an easy fix to the fisher package. I submitted a PR, should be solved soon.
I am not going to insist. I could make statistics an add-on package that needs to be imported separately. That is not a bad idea actually.
See https://github.com/pyranges/pyranges_1.x/actions/runs/9593389507 This is due to fisher, see https://github.com/brentp/fishers_exact_test/issues/48
Must re-run / adapt eventually, depending on whether the issue is resolved rapidly in fisher