vnmabus / dcor

Distance correlation and related E-statistics in Python
https://dcor.readthedocs.io
MIT License

Numba energy permutation test #28

Closed · multimeric closed this 3 years ago

multimeric commented 3 years ago

Note: this builds on #27, and changes from that branch will appear here until that is merged.

This re-implements most of the energy distance functions and permutation tests using numba, which provides significant performance improvements. I have some benchmarks below, which compare numba to pure Python (note: this isn't comparing numba to the original numpy-based code; it's comparing my changes with and without the JIT). The results suggest that numba improves performance for any number of permutations above about 250. I expect the same to hold when several different permutation tests run in the same program, since compilation only happens once.

[benchmark plot: numba vs. pure Python across permutation counts]

And with slightly higher limits: [the same benchmark plot with larger axis limits]
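For context, a minimal sketch of this kind of timing harness (not the exact script used for the plots above), assuming the branch's numba-backed implementation sits behind the existing `dcor.homogeneity.energy_test` API; setting `NUMBA_DISABLE_JIT=1` in the environment before importing approximates the "without JIT" curve:

```python
# Sketch only: assumes the numba-backed code is reachable through the
# existing dcor.homogeneity.energy_test entry point.
import time

import numpy as np

import dcor

rng = np.random.default_rng(0)
x = rng.normal(size=(100, 2))
y = rng.normal(loc=0.5, size=(100, 2))

for num_resamples in (100, 250, 500, 1000):
    start = time.perf_counter()
    dcor.homogeneity.energy_test(x, y, num_resamples=num_resamples)
    elapsed = time.perf_counter() - start
    print(f"{num_resamples:>5} permutations: {elapsed:.3f} s")
```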

However, the costs are:

multimeric commented 3 years ago

I would appreciate some help with these doctests: I can't work out how to run them individually, nor why the results are changing from -8.0 to -7.999999999999999.
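In case it helps, here is a sketch of running the doctests of a single object with the standard library's `doctest` module (`dcor.energy_distance` is just an example target), plus a small demonstration of how a reordered floating-point sum can drift in the last bits, which is one possible cause of the -8.0 vs -7.999999999999999 change:

```python
import doctest

import dcor

# Run only the doctests attached to one object instead of the whole module
# (dcor.energy_distance is used purely as an example target here).
doctest.run_docstring_examples(
    dcor.energy_distance, {"dcor": dcor}, verbose=True
)

# Floating-point addition is not associative, so a plain loop and a
# vectorised reduction can accumulate in different orders and round
# differently in the last bits.
print(sum([1e16, 1.0, -1e16, 1.0]))   # 1.0
print(sum([1e16, -1e16, 1.0, 1.0]))   # 2.0
```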

vnmabus commented 3 years ago

I think we should test the performance against the numpy version. Numpy is VERY fast, and the numba rewrite loses some functionality and is harder to maintain, so I would want to see a significant improvement before accepting the rewrite (also, we may even want to keep BOTH versions, to preserve the old functionality).

multimeric commented 3 years ago

Good point:

[benchmark plot: numba vs. numpy]

What's weird is that my downstream application definitely sped up with numba, but what I'm doing there is slightly different from (and slower than) a homogeneity test. Anyway, I'll just port my numba code over to that.

vnmabus commented 3 years ago

Check that you are compiling the numba functions in nopython mode. Also, what happens with numpy at 0? That behaviour looks like numba JIT compilation rather than numpy.
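As a sketch of what forcing nopython mode looks like (the kernel below is a stand-in, not the code from this branch):

```python
import numba
import numpy as np


# numba.njit is shorthand for numba.jit(nopython=True): compilation fails
# loudly instead of silently falling back to the slow object mode.
@numba.njit(cache=True)
def pairwise_distance_sum(x, y):
    # Stand-in kernel for illustration only.
    total = 0.0
    for i in range(x.shape[0]):
        for j in range(y.shape[0]):
            total += np.sqrt(np.sum((x[i] - y[j]) ** 2))
    return total
```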

multimeric commented 3 years ago

I was using njit in all cases. The point here is that numba is fast; it's just that numpy is equally fast, without the compilation step. The first point must be an error of some kind: each timepoint was run from scratch, so it can't be a compilation step, or the overhead would show up in every timepoint.

vnmabus commented 3 years ago

Have you excluded the compilation step from the numba timings?

multimeric commented 3 years ago

No, I haven't, but that's kind of the point, isn't it? We want to find the problem size at which the numba speedup (if it exists) wins against numpy despite the flagfall cost of compilation. And it seems like that never happens.

vnmabus commented 3 years ago

No, because the compilation only happens once, while the function may be called multiple times.
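A sketch of that point, with an illustrative jitted kernel (not the branch's code): trigger compilation with one warm-up call, then time the already-compiled calls, which is the cost a permutation test with hundreds of resamples actually pays:

```python
import timeit

import numba
import numpy as np


@numba.njit
def total_distance(x):
    # Illustrative kernel only.
    total = 0.0
    for i in range(x.shape[0]):
        for j in range(x.shape[0]):
            total += abs(x[i] - x[j])
    return total


x = np.random.default_rng(0).normal(size=1000)

# The first call pays the one-off compilation cost...
compile_time = timeit.timeit(lambda: total_distance(x), number=1)
# ...subsequent calls reuse the compiled code, which is what hundreds of
# permutation resamples amortise the compilation over.
warm_time = timeit.timeit(lambda: total_distance(x), number=100) / 100

print(f"first call (incl. compile): {compile_time:.4f} s")
print(f"warm call (amortised):      {warm_time:.6f} s")
```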