moble / spherical_functions

Python/numba package for evaluating Wigner D matrices and spin-weighted spherical harmonics
MIT License

License information for code taken from sympy #20

Closed: chatcannon closed this issue 4 years ago

chatcannon commented 4 years ago

The spherical_functions library includes code taken from sympy (e.g. the Wigner3j function). The sympy license (3-clause BSD) is permissive, but still includes some requirements. In particular:

> a. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

https://github.com/sympy/sympy/blob/master/LICENSE

Please add the correct license information for the sympy-derived code.

chatcannon commented 4 years ago

P.S. I am not affiliated with sympy. Rather, I want the correct license information because I am interested in reusing your Wigner 3j code and want to ensure that I comply with all license requirements myself.

moble commented 4 years ago

Just on the basis of good numerics, I should never have included that code; I was mostly using it for quick-and-dirty tests. For example, I hadn't realized that it returned inf for a lot of values once j goes above 19 (and at even smaller j in some asymmetric cases).

However, I do actually have a new use for them in another project, where I need good numerical behavior up to higher j values. So I've written new code (from scratch) that does things more carefully, and should be accurate up to much higher j values (probably over 100). The new Wigner3j function itself is actually a bit slower than the old one. However, there's a very useful Wigner3jCalculator that you can use to calculate a bunch of values at a time. So if you can use that, it's significantly faster, while also behaving better numerically. Using it looks like this:

```python
from spherical_functions import Wigner3jCalculator  # top-level import path assumed

calculator = Wigner3jCalculator(j2_max, j3_max)

# Presumably, the following is inside some loop over j2, j3, m2, m3:
w3j = calculator.calculate(j2, j3, m2, m3)
m1 = - m2 - m3
for j1 in range(max(abs(j2-j3), abs(m1)), j2+j3+1):
    w3j[j1]  # This equals Wigner3j(j1, j2, j3, m1, m2, m3)
```
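
For one-off coefficients, the scalar function is still available; a minimal usage sketch, with the top-level import path assumed:

```python
from spherical_functions import Wigner3j  # import path assumed

# A single coefficient; prefer the calculator above when a whole range of j1 is needed.
value = Wigner3j(2, 1, 1, 0, 0, 0)
```
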
chatcannon commented 4 years ago

Thanks for the note. After reading the journal article in which Rasch describes the code he wrote for SymPy, I realised that, since I only need values for the special case m3 == 0, it was better for me to write my own implementation.
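
For what it's worth, such m3 == 0 values can be cross-checked against SymPy's reference implementation (the very code this issue is about); a minimal sketch:

```python
# Cross-check a single m3 == 0 value against SymPy's reference implementation.
from sympy.physics.wigner import wigner_3j

# With m3 == 0, a non-zero symbol requires m1 == -m2.
value = wigner_3j(2, 1, 1, 1, -1, 0)  # exact symbolic result
print(float(value))
```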

phockett commented 4 years ago

As a probably pointless addendum to this, I will add that I did a quick-and-dirty parallelization of the old sympy-based Wigner 3j version (see https://epsproc.readthedocs.io/en/dev/tests/Low_level_bench_tests_Wigner3j_Feb2020.html), based on the way the other spherical_functions routines were parallelised with Numba for input arrays of quantum numbers (a hedged sketch of this pattern follows below). I don't know if this is required for the updated function (skimming the code, I'd guess not), or interesting generally, but my use case essentially involves generating multiple big tables of 3j values from sets of input quantum numbers for a given problem. (This is for photoionization calculations; for example, https://epsproc.readthedocs.io/en/dev/demos/ePSproc_BLM_calc_demo_Sept2019.html#Formalism.)
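
The sketch below assumes sf.Wigner3j is itself numba-compiled (as the other spherical_functions routines are), so it can be called from inside an njit'd function; the quantum-number arrays are just placeholders:

```python
import numpy as np
import spherical_functions as sf
from numba import njit, prange

@njit(parallel=True)
def wigner3j_table(j1, j2, j3, m1, m2, m3):
    # Evaluate one 3j symbol per row of input quantum numbers, in parallel.
    out = np.empty(j1.size)
    for i in prange(j1.size):
        out[i] = sf.Wigner3j(j1[i], j2[i], j3[i], m1[i], m2[i], m3[i])
    return out
```
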

As a side note, I had previously done all of this with my own functions in C and, later, in Matlab; neither was especially elegant or fast in general (while my newer Python code is at least reasonably fast...), although both were at least robust. So a big thanks for writing these beautiful Python versions, which have also helped with my limited attempts to learn the language and become a better coder, rather than just keep writing poor scientist-hack code!

moble commented 4 years ago

@phockett Interesting; I don't make much use of numba's parallelization myself, so that's good to see.

As noted above, the new approach with Wigner3jCalculator would probably be significantly faster on a single thread for your use case, since you evidently need a range of these values. However, I should point out that this object is presumably not thread-safe. I don't know how clever numba is about this, but you would probably need to manually create separate instances of the object for each thread if you decide to switch over.

Looking at the code again, I think maybe I should have made the workspace external to the object, with just a convenience method to create it, as I did with the WignerHCalculator object. That would make it much easier to use in parallel. I may do that, while keeping backwards compatibility with reasonable default values.
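
In the meantime, a minimal sketch of the workaround, assuming threads via concurrent.futures and one calculator instance per task (the import path is assumed):

```python
import concurrent.futures
from spherical_functions import Wigner3jCalculator  # import path assumed

def w3j_block(args):
    j2, j3, m2, m3 = args
    # A private instance per task, so no workspace is shared between threads.
    calculator = Wigner3jCalculator(j2, j3)
    return calculator.calculate(j2, j3, m2, m3)

tasks = [(2, 3, 0, 1), (4, 5, -1, 1), (6, 7, 2, -2)]
with concurrent.futures.ThreadPoolExecutor() as pool:
    blocks = list(pool.map(w3j_block, tasks))
```

Re-creating the calculator per task forfeits workspace reuse, which is exactly what the external-workspace design would avoid.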

phockett commented 4 years ago

Yes, good points - I haven't really tried "proper" parallel coding in python, so haven't had to worry about thread-safety (and related) issues thus far, but this will likely become an issue for me at some point, especially since I plan to migrate some sections of my code to a class-based framework.

As far as Numba goes, my (limited) understanding is that it only really works automagically with low-level arrays & NumPy, and certainly not high-level objects, so alternative parallelization strategies/libraries/methods/models (== more effort) will be needed in these types of cases anyway. Actually, this is one reason I thought about parallelizing the core numerics before moving up to more complex objects (as well as for the general learning experience), since the latter will likely involve more work on the coding side (speaking for myself, as a Python novice, at least!), even if they are conceptually neater.

All that said, for the photoionization calculations I mentioned before, there are essentially two types of usage:

1. Working from computational results, where low-level speed is not much of an issue, since things only need to be calculated once for a given problem.
2. Fitting parameters to experimental data, where speed is definitely an issue (many function repetitions), but (in most cases) all the angular momentum stuff can be precalculated for the problem at hand. Here most of the magic will be in the efficiency of the fitting routine and the memory handling of possibly large arrays, rather than in low-level numerical function speed/parallelization.

I haven't yet tackled the latter case in Python, but have previously used some (relatively) carefully vectorized & threaded Matlab + C fitting routines.

As a side note, I've also been using Xarray as a general tensor library and data class; it's very nice for handling ND datasets and making tensor computations easy, so it might be of interest to you if you haven't come across it already. I haven't tried any parallelisation with these objects yet, but it natively supports Dask on the back-end, and also wrapping NumPy ufuncs (hence Numba).
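
For anyone unfamiliar with that pattern, a minimal illustration of applying a NumPy-level function across a labelled xarray object (the data here is just random placeholder values):

```python
import numpy as np
import xarray as xr

# A toy table of values indexed by labelled quantum numbers.
w3j = xr.DataArray(
    np.random.rand(4, 5),
    dims=("j1", "m"),
    coords={"j1": range(4), "m": range(-2, 3)},
)

# apply_ufunc broadcasts a NumPy-level function over the labelled dimensions;
# with dask-chunked data, dask="parallelized" distributes the work.
squared = xr.apply_ufunc(np.square, w3j)
```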

phockett commented 4 years ago

Also on Numba, there's a nice performance comparison from Charles Jekel:

> TL;DR: Python Numba acceleration is really fast. Like faster than Fortran fast.

Charles Jekel, "Comparison of performance: Python NumPy and Numba, MATLAB, and Fortran".

moble commented 4 years ago

> As far as Numba goes, my (limited) understanding is that it only really works automagically with low-level arrays & NumPy, and certainly not high-level objects, so alternative parallelization strategies/libraries/methods/models (== more effort) will be needed in these types of cases anyway.

There's also jitclass, which is what Wigner3jCalculator and WignerHCalculator use. Basically, as long as a class doesn't use too much of python's dynamic fanciness, you can tell numba what the class contains. It then compiles the methods in the class, and you can pass instances of such a class into numba functions. So I see no additional obstacle to parallelizing code using these objects beyond the simple fact that simultaneous calls to a function with the same instance would be trying to use the same workspaces — which is not thread safe, and would need a little extra attention no matter what library/method you want to use for parallelization.
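
For illustration, a toy jitclass along those lines (not the library's actual class) might look like this:

```python
import numpy as np
from numba import float64, int64, njit
from numba.experimental import jitclass

spec = [
    ("size", int64),
    ("workspace", float64[:]),  # reused scratch space -- the thread-safety concern
]

@jitclass(spec)
class Workspace:
    def __init__(self, size):
        self.size = size
        self.workspace = np.zeros(size)

    def fill(self, scale):
        # Two threads calling this on the same instance would clobber
        # each other's values in self.workspace.
        for i in range(self.size):
            self.workspace[i] = scale * i
        return self.workspace

@njit
def last_entry(ws, scale):
    # jitclass instances can be passed straight into njit'd functions.
    return ws.fill(scale)[-1]

print(last_entry(Workspace(5), 2.0))  # 8.0
```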

phockett commented 4 years ago

Interesting, thanks - I must have missed or forgotten that Numba functionality (my early attempts here never got much past the automagic methods!).

I'll be sure to let you know as and when I try out the new classes, along with any further Numba-based benchmarks (or related). I'm deep in formalism development at the moment, though, so it might be a while before I circle back to this.