Nice work. The 15X increase in speed is great. The implementation seems maintainable. I think you should keep going.
It would be nice to have a target in the makefile for performance profiling.
I prefer early return from functions. Does numba require a single return statement for jitting? Or are these changes just your preference?
Have you verified that all the tests pass?
Sorry for the slow reply! Thanks, I'll keep working on this whenever I have time.
Numba doesn't require a single return, but I have found that a single exit point makes debugging easier because numba can determine the return signature more readily. I was planning to stick with single returns until as much of the code as possible compiles in nopython mode, and then try to re-introduce early returns. That might require declaring a certain number of explicit signatures, which will probably increase the maintenance overhead.
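To illustrate what I mean, here is a toy sketch (not actual miepython code). Numba happily compiles both forms, but with a single exit point the inferred return type is obvious at a glance, which I find easier to reason about when nopython compilation fails:

```python
import numpy as np
from numba import njit

@njit
def small_x_limit(x):
    # Early returns compile fine as long as every branch returns the same type.
    if x < 0.1:
        return x * x                      # toy small-x approximation
    return np.sqrt(1.0 + x * x) - 1.0

@njit
def small_x_limit_single_return(x):
    # Same toy calculation with a single exit point; the inferred return
    # type (float64) is obvious at a glance.
    if x < 0.1:
        result = x * x
    else:
        result = np.sqrt(1.0 + x * x) - 1.0
    return result
```

The real trouble comes when different branches return different kinds of objects (say a scalar in one branch and an array in another), which nopython mode cannot unify.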
I'll check the tests once I have the chance to get back to this.
If there is any way to help move this along, let me know. I could pull your version @pscicluna and check the unit tests.
@jbecca If you could do that, it would be great. Ideally, you would check the Jupyter notebooks too. Basically just loading each notebook and running all cells. If everything still works then let me know and I will make a new release.
Confirmation of the speed increase would be nice too.
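If it helps, here is a minimal sketch of one way to run every notebook programmatically; nbformat/nbclient are just my suggestion, and `jupyter nbconvert --execute` on each file would work equally well:

```python
# Execute every notebook in docs/ and fail loudly if any cell raises.
# Requires nbformat and nbclient (pip install nbclient).
import glob

import nbformat
from nbclient import NotebookClient

for path in sorted(glob.glob("docs/*.ipynb")):
    nb = nbformat.read(path, as_version=4)
    NotebookClient(nb, timeout=600).execute()
    print(path, "OK")
```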
Tests 10 and 11 currently do not pass and there are a lot of numba warnings. I'll work on getting 10 and 11 to pass before addressing the warnings.
Should I also uncomment and test the low level functions?
@jbecca Nice progress! In my experience fixing the warnings is often the best way to get failing tests to pass. Ultimately I would like to merge a numba update that throws no warnings, but how we get there is up to you.
I do not think you need to test the low-level functions.
I have all of the test cases passing, but there are use cases in the Jupyter notebooks that do not run. The issues arise when passing in an array of sizes. Not a single unit test currently passes in an array of sizes, even though the code is supposed to handle `x` and `m` of arbitrary length.
@jbecca Did you need to make many changes?
The missing unit tests are a reflection of how the code was ported from C. With the C code, I was primarily worried about getting the right answer. Then, when developing the `miepython` code, I realized it would be super convenient to be able to pass both scalars and arrays. So I implemented this feature and, evidently, only tested it in the Jupyter notebooks.

If you could put together a few unit tests covering the basic functionality of `miepython` with arrays of `x` and/or `m`, that would be much appreciated.
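Something along these lines is what I have in mind, a minimal pytest sketch. The tolerances and the scalar-vs-array cross-check are just suggestions, and the `mie(m, x)` call returning `qext, qsca, qback, g` is assumed:

```python
# Sketch of unit tests exercising array-valued x and m.  Assumes the
# mie(m, x) entry point returning (qext, qsca, qback, g); tolerances are
# arbitrary illustrative choices.
import numpy as np
import miepython

def test_array_x_matches_scalar_calls():
    m = 1.5 - 0.1j
    x = np.array([0.5, 1.0, 5.0, 50.0])
    qext, qsca, qback, g = miepython.mie(m, x)
    assert qext.shape == x.shape
    for i, xi in enumerate(x):
        qe, qs, qb, gg = miepython.mie(m, xi)
        np.testing.assert_allclose(qext[i], qe, rtol=1e-9)
        np.testing.assert_allclose(qsca[i], qs, rtol=1e-9)
        np.testing.assert_allclose(g[i], gg, rtol=1e-9)

def test_array_m_and_x():
    # Arrays of both m and x, matched element by element.
    m = np.array([1.5 - 0.1j, 1.33 - 0.0j, 2.0 - 0.5j])
    x = np.array([1.0, 2.0, 3.0])
    qext, qsca, qback, g = miepython.mie(m, x)
    assert qext.shape == x.shape
```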
@pscicluna @jbecca I have created a new branch, `numba`. It jits cleanly and without warnings, and the code passes all tests. Unfortunately, it is 5X slower than regular Python. Just type `make jittest`: the elapsed time for the unit tests is 2 seconds for the non-jitted code and 10 seconds for the jitted code on my machine.
I have a branch that has a 5 times speed up, but I still have a few warnings with the scattering functions. I could remove the jit from the scattering code and just push the part that works for now.
@jbecca That is great! Please rebase your changes against `miepython_jit.py` in my `numba` branch.
After messing around with function signatures, I got rid of all of them because they slowed things down by about 5%.

To keep things simple, I only used `@njit` on the hot code paths, and I slightly restructured the code so numba would be able to jit without warnings or errors.

The jitted code is in `miepython_jit.py` to make performance testing easier. (If we get a 5X speedup, then I will reverse the names so that the original code ends up in `miepython_unjit.py`.)
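For anyone following along, the shape of the restructuring is roughly the following; the helper name and the placeholder arithmetic are made up for illustration and are not the actual contents of `miepython_jit.py`:

```python
# Illustrative structure only: @njit on the numerical hot path, with a thin
# pure-Python wrapper handling scalar/array dispatch.  No explicit
# signatures, so numba infers the types on the first call.
import numpy as np
from numba import njit

@njit(cache=True)
def _hot_loop(m, x, n_terms):
    # Placeholder for the Mie-coefficient recurrence -- the part worth jitting.
    total = 0.0
    for n in range(1, n_terms + 1):
        total += (2 * n + 1) * abs(m) / (x + n)
    return total

def mie(m, x):
    # Thin wrapper: accept scalars or arrays, call the jitted core per element.
    x = np.atleast_1d(np.asarray(x, dtype=np.float64))
    out = np.empty_like(x)
    for i, xi in enumerate(x):
        n_terms = int(xi + 4.05 * xi ** 0.33333 + 2.0)  # Wiscombe-style term count
        out[i] = _hot_loop(complex(m), xi, n_terms)
    return out if out.size > 1 else out[0]
```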
@jbecca @pscicluna This morning I spent some time creating a performance notebook to test jitted speeds, using the `%timeit` magic. The result is that the numba version is 10 to 700X faster. See
https://github.com/scottprahl/miepython/blob/numba/docs/11_performance.ipynb

Please pull the `numba` branch and see if you get similar results.
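For a quick check outside that notebook, a cell along these lines should reproduce the comparison; the import paths for the jitted and unjitted modules are my guess based on the file names in this thread, so adjust to taste:

```python
# Rough %timeit comparison between the unjitted and jitted implementations.
# Module paths below are assumptions based on the file names mentioned above.
import numpy as np
import miepython.miepython as mie_plain      # original pure-Python version (assumed path)
import miepython.miepython_jit as mie_jit    # jitted version (assumed path)

m = 1.5 - 0.1j
x = np.linspace(0.1, 100, 5000)

mie_jit.mie(m, x)            # warm-up call so compilation time is excluded

%timeit mie_plain.mie(m, x)
%timeit mie_jit.mie(m, x)
```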
I get roughly a 100x speedup running `test_jit.py` versus `test.py`. I also found, as you did, that jitting routines like `mie` and `ez_mie` did not speed anything up and only led to warnings. Your `numba` branch did what I was working on in a much cleaner fashion. I did not realize that `@njit` decorators could take datatype arguments to declare the types that should be returned; that is far clearer and easier to maintain than the code I had written. Good work!
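For reference, I mean declarations like the following; the types here are just an example, not the ones in the actual branch:

```python
# Example of an @njit decorator with an explicit signature: the return type
# (float64[:]) and argument types are declared up front.  Illustrative only.
from numba import njit, complex128, float64

@njit(float64[:](complex128, float64[:]), cache=True)
def toy_scaled_x(m, x):
    # Hypothetical helper: numba checks the declared types at compile time.
    return x * abs(m)
```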
Just ran your notebook on my 2019 MBP (macOS Big Sur 11.2.3, 2.3 GHz 8-core i9, 16 GB RAM, in a clean anaconda python=3.8.8 environment) and here are the results for the notebook speed test:

- Size parameters: 9 to 87x speedup
- Embedded spheres: 31 to 73x speedup
- ez Mie: 23 to 72x speedup
- Scattering phase function: 53 to 675x speedup (holy shit)
- As a function of sphere size: 458 to 710x (!!!!)
Thanks everyone! Released v2.0.0.
I've been looking at miepython and thinking it looks really useful, but of course Mie calculations are time-consuming, and having everything in pure Python instead of a compiled language exacerbates the issue. I thought it might be nice to get somewhat the best of both worlds using numba, since you've only used Python and numpy so far. This PR is not complete, but I thought I would open it and see what you think about the idea before I get too far into the work.

I managed to achieve roughly a factor of 15 improvement in speed with the changes so far, and I believe more is possible with further optimisation. I hope this is useful to you; please let me know what you think. I'm happy to keep working on it if you think it's worthwhile. Whatever questions you have, just let me know.
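To make the idea concrete, the pattern is essentially the following toy before/after (a placeholder recurrence, not one of miepython's actual routines):

```python
# Toy illustration of the approach: take a pure-Python loop that numpy
# cannot vectorize easily and compile it with numba's @njit.
from numba import njit

def log_derivative(z, n_max):
    # Downward recurrence of the kind Mie codes rely on:
    # D_{n-1} = n/z - 1/(D_n + n/z)
    d = 0.0 + 0.0j
    for n in range(n_max, 0, -1):
        d = n / z - 1.0 / (d + n / z)
    return d

# The jitted version is the same source wrapped in a decorator call; numba
# compiles it to machine code the first time it is invoked.
log_derivative_jit = njit(log_derivative)
```

Sequential recurrences like this are exactly where numba tends to help most, since they can't be expressed as vectorized numpy operations.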