Suor opened 8 years ago
I also explored opportunities to optimize the meta-analysis. Here are the options I see:
Compute only `TE_fixed` and `TE_random` for permutation analysis; other fields, like confidence intervals, are never used there. This is easy to implement but cuts the time only by a third.

I would like to delay 3 and 4 as long as possible, since both make future modifications harder.
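A minimal sketch of computing only the two pooled estimates, assuming `TE_fixed` is the inverse-variance fixed-effect estimate and `TE_random` is the DerSimonian-Laird random-effects estimate. Whether this matches the existing implementation is an assumption. `pooled_effects` is a hypothetical helper that takes per-study effects and standard errors as (genes, studies) arrays with no missing values and skips confidence intervals, z-scores and p-values entirely:

```python
import numpy as np


def pooled_effects(te, se):
    """Return (TE_fixed, TE_random) for each row.

    te, se: arrays of shape (genes, studies) with per-study effect sizes
    and their standard errors (assumed complete, no NaNs).
    """
    te = np.asarray(te, dtype=float)
    se = np.asarray(se, dtype=float)
    w = 1.0 / se**2                              # inverse-variance weights
    sum_w = w.sum(axis=1)
    te_fixed = (w * te).sum(axis=1) / sum_w

    # Cochran's Q and the DerSimonian-Laird between-study variance tau^2
    q = (w * (te - te_fixed[:, None]) ** 2).sum(axis=1)
    df = te.shape[1] - 1
    tau2 = np.zeros_like(q)
    if df > 0:  # with a single study there is no heterogeneity to estimate
        c = sum_w - (w**2).sum(axis=1) / sum_w
        tau2 = np.maximum((q - df) / c, 0.0)     # truncate negative estimates

    w_random = 1.0 / (se**2 + tau2[:, None])
    te_random = (w_random * te).sum(axis=1) / w_random.sum(axis=1)
    return te_fixed, te_random
```

Because everything is vectorized over genes, one call covers all genes for a single permutation.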
Also consider the Numba just-in-time compiler. It is very easy to adopt: just add a single decorator, though it may not be able to JIT-compile all of the code. There is also a `nogil=True` option for `numba.jit`, which releases the GIL so you can use threads for concurrency.
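To illustrate what the `nogil=True` suggestion implies, here is a sketch of a fixed-effect pooling kernel written as explicit loops over NumPy arrays (the style Numba's nopython mode compiles; pandas objects are not supported there), with row chunks dispatched to a `ThreadPoolExecutor`. `fixed_effect_block` and `fixed_effect_threaded` are hypothetical names. If Numba is not installed, the decorator falls back to a no-op so the code still runs, just without releasing the GIL:

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np

try:
    from numba import njit
except ImportError:  # fallback: plain Python, so the sketch still runs
    def njit(*args, **kwargs):
        return lambda func: func


@njit(nogil=True)
def fixed_effect_block(te, se):
    """Inverse-variance fixed-effect estimate for each row of a 2-D block."""
    out = np.empty(te.shape[0])
    for i in range(te.shape[0]):
        num = 0.0
        den = 0.0
        for j in range(te.shape[1]):
            w = 1.0 / (se[i, j] * se[i, j])
            num += w * te[i, j]
            den += w
        out[i] = num / den
    return out


def fixed_effect_threaded(te, se, n_threads=4):
    """Split rows into chunks; with nogil=True the chunks can run in parallel."""
    te = np.ascontiguousarray(te, dtype=float)
    se = np.ascontiguousarray(se, dtype=float)
    te_chunks = np.array_split(te, n_threads)
    se_chunks = np.array_split(se, n_threads)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        return np.concatenate(list(pool.map(fixed_effect_block, te_chunks, se_chunks)))
```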
Another option is to reduce the number of permutations and then use an extreme value distribution to calculate the p-value. See this paper
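A minimal sketch of the tail-approximation idea, assuming a peaks-over-threshold approach: fit a generalized Pareto distribution (GPD) to the largest null statistics and use it only when too few permutations exceed the observed value. Whether this matches the linked paper is not confirmed. The helper `gpd_tail_pvalue`, its defaults `n_exceed=250` and `min_hits=10`, and the method-of-moments fit (used instead of maximum likelihood to keep the sketch NumPy-only) are all assumptions:

```python
import numpy as np


def gpd_tail_pvalue(observed, null_stats, n_exceed=250, min_hits=10):
    """One-sided permutation p-value with a generalized Pareto tail fit.

    Hypothetical helper; requires len(null_stats) > n_exceed > min_hits.
    """
    null = np.sort(np.asarray(null_stats, dtype=float))
    n = null.size
    hits = np.count_nonzero(null >= observed)
    if hits >= min_hits:
        # Enough exceedances: the ordinary empirical estimate is adequate
        return (hits + 1) / (n + 1)

    # Threshold halfway between the n_exceed-th and (n_exceed+1)-th largest values
    threshold = 0.5 * (null[-n_exceed] + null[-n_exceed - 1])
    excess = null[-n_exceed:] - threshold

    # Method-of-moments fit from mean = s/(1-xi), var = s^2/((1-xi)^2 (1-2xi))
    m, v = excess.mean(), excess.var(ddof=1)
    xi = 0.5 * (1.0 - m * m / v)
    sigma = 0.5 * m * (1.0 + m * m / v)

    y = observed - threshold
    if abs(xi) < 1e-12:
        tail = np.exp(-y / sigma)          # exponential limit of the GPD
    else:
        base = 1.0 + xi * y / sigma
        # With xi < 0 the GPD has a finite upper end point; beyond it the tail is 0
        tail = base ** (-1.0 / xi) if base > 0 else 0.0
    # P(T >= observed) ~ P(T > threshold) * P(excess > y)
    return (n_exceed / n) * float(tail)
```

A production version would likely also check the goodness of fit and move the threshold when the fit is poor.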
I already tried Numba; so far it only makes things slower. `nogil=True` doesn't work with high-level code that uses pandas and NumPy. I tried it, and the code became even uglier than in Cython. Also, the GIL actually makes things faster, not slower, so there is no point. The GIL prevents using threads for computation, but processes can be used just fine.
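A minimal sketch of the process-based alternative, assuming the permutation work splits into independent batches. The difference-in-means statistic is a hypothetical stand-in for the real per-permutation meta-analysis, and `_perm_batch` and `permutation_null` are illustrative names. Each worker gets its own `SeedSequence` child, so the result is reproducible regardless of scheduling:

```python
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

import numpy as np


def _perm_batch(args):
    """Worker: one batch of permuted statistics (top-level so it pickles)."""
    values, labels, n_perm, seed = args
    rng = np.random.default_rng(seed)
    out = np.empty(n_perm)
    for k in range(n_perm):
        perm = rng.permutation(labels)
        # Placeholder statistic: difference in group means
        out[k] = values[perm].mean() - values[~perm].mean()
    return out


def permutation_null(values, labels, n_perm=10_000, n_workers=4, seed=0,
                     executor_cls=ProcessPoolExecutor):
    """Build a permutation null distribution in parallel batches.

    Passing executor_cls=ThreadPoolExecutor runs the same code without subprocesses.
    """
    values = np.asarray(values, dtype=float)
    labels = np.asarray(labels, dtype=bool)
    seeds = np.random.SeedSequence(seed).spawn(n_workers)  # independent streams
    sizes = [len(chunk) for chunk in np.array_split(np.arange(n_perm), n_workers)]
    jobs = [(values, labels, size, s) for size, s in zip(sizes, seeds)]
    with executor_cls(max_workers=n_workers) as pool:
        return np.concatenate(list(pool.map(_perm_batch, jobs)))
```

When using processes, call it under an `if __name__ == "__main__":` guard, since platforms that spawn workers re-import the main module. The thread variant is handy for debugging but gives little or no speedup for this loop.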
Analysis with permutations has become extremely slow, and Dexter asked me to optimize it. The most visible issue was the slow fold change calculation.
I optimized fold changes here and here. I believe the meta-analysis is now the bottleneck; @idrdex, please confirm or deny.
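The linked fold-change changes are not reproduced here. As one illustration of vectorizing this step, assuming log2 expression values in a (genes, samples) array and a boolean case/control label per sample, the fold changes for many permutations can be computed at once with two matrix products instead of a Python loop. `permuted_log_fold_changes` is a hypothetical helper:

```python
import numpy as np


def permuted_log_fold_changes(log_expr, is_case, n_perm, seed=0):
    """Fold changes for n_perm label permutations at once.

    log_expr: (genes, samples) array of log2 expression values.
    is_case:  boolean case/control label per sample.
    Returns a (genes, n_perm) array of mean(case) - mean(control).
    """
    log_expr = np.asarray(log_expr, dtype=float)
    is_case = np.asarray(is_case, dtype=bool)
    rng = np.random.default_rng(seed)
    # Each column is one shuffled copy of the labels, as 0.0/1.0 weights
    masks = np.column_stack([rng.permutation(is_case) for _ in range(n_perm)]).astype(float)
    n_case = is_case.sum()
    n_ctrl = is_case.size - n_case
    case_means = log_expr @ masks / n_case
    ctrl_means = log_expr @ (1.0 - masks) / n_ctrl
    return case_means - ctrl_means
```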
The other issue is that I can't really test this with `mygene_filter`, as I don't have the CSV file you are using in the analysis (`dengue_perm_analysis.csv`). Please supply it or give me the code to generate it.