elastufka opened this issue 2 years ago
Short update:
Great, I think the main thing from my side is to not tie the package down to any one of these, but instead allow them all through an API. Also, adding new dependencies needs careful consideration in terms of future support etc.
So I'd suggest focusing on refactoring the current code into a function with an API similar to scipy's, e.g. GaussLegendre(func, **kwargs),
before starting to implement new methods.
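Something along these lines, perhaps (a minimal sketch only; the name, signature and keyword handling are illustrative rather than an agreed interface):

```python
# Illustrative sketch of a scipy-style wrapper around fixed-order Gauss-Legendre
# quadrature, so that alternative integrators could later sit behind the same
# call signature. Names and keyword arguments are hypothetical.
import numpy as np

def gauss_legendre(func, a, b, npoints=12, **func_kwargs):
    """Integrate `func` over [a, b] with fixed-order Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(npoints)   # nodes/weights on [-1, 1]
    nodes = 0.5 * (b - a) * x + 0.5 * (b + a)         # affine map onto [a, b]
    return 0.5 * (b - a) * np.sum(w * func(nodes, **func_kwargs))

# e.g. gauss_legendre(np.sin, 0.0, np.pi) ~ 2.0
```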
Is the speed-up here being tested just in calculating the model, or also as part of the fitting?
Some of these approaches (multi-threading and maybe PyTorch?) might speed up the integrals in isolation, but when included as part of the fitting they might not actually help, since other parts of the code are already using the same resources (i.e. multi-threading is already done by default in the numpy matrix multiply with the SRM, and in the MCMC runs?).
Right now just in calculating the model.
The slowest part of the model evaluation is the bremsstrahlung cross-section calculation:
1601 function calls (1573 primitive calls) in 0.033 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
38 0.000 0.000 0.000 0.000 constants.py:56(get_constant)
6 0.003 0.000 0.004 0.001 emission.py:128(density)
6 0.001 0.000 0.001 0.000 emission.py:166(collisional_loss)
6 0.012 0.002 0.012 0.002 emission.py:197(bremsstrahlung_cross_section)
6 0.003 0.000 0.032 0.005 emission.py:286(gauss_legendre_idl)
6 0.004 0.001 0.005 0.001 emission.py:379(points_and_weights_idl)
2 0.001 0.000 0.032 0.016 emission.py:438(integrate_part)
6 0.005 0.001 0.023 0.004 emission.py:506(model_func)
1 0.000 0.000 0.033 0.033 emission.py:548(split_and_integrate)
6 0.000 0.000 0.000 0.000 emission.py:69(__init__)
Torch makes things slower due to overhead when initializing tensors. It could be useful for fitting with fixed-point quadrature rather than increasing the number of points until a relative error condition is met, because it wouldn't duplicate calculations. But otherwise it's way too slow even on a GPU unless you have 10^5 photon energies or so.
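For reference, the fixed-point idea looks roughly like this (a sketch only; `integrand` and the per-bin limits are toy placeholders, not the package's actual functions):

```python
# Sketch of fixed-point Gauss-Legendre quadrature on torch tensors: the nodes
# and weights are computed once, and every photon-energy bin is evaluated in a
# single broadcasted call, so nothing is recomputed as npoints grows.
import numpy as np
import torch

def fixed_point_gauss_legendre(integrand, a, b, npoints=12):
    x, w = np.polynomial.legendre.leggauss(npoints)            # on [-1, 1]
    x, w = torch.as_tensor(x), torch.as_tensor(w)
    a = torch.as_tensor(a, dtype=torch.float64)[:, None]
    b = torch.as_tensor(b, dtype=torch.float64)[:, None]
    nodes = 0.5 * (b - a) * x + 0.5 * (b + a)                  # shape (nbins, npoints)
    return 0.5 * (b - a)[:, 0] * torch.sum(w * integrand(nodes), dim=-1)

# toy check: integrals of sin over [0, 1] and [1, 2]
print(fixed_point_gauss_legendre(torch.sin, [0.0, 1.0], [1.0, 2.0]))
```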
I might check on the accuracy of using a different (non-iterative) way of doing the integral, such as Simpson's rule.
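The check could be something like this quick comparison against scipy's adaptive quad, with a stand-in integrand rather than the real cross-section:

```python
# Quick accuracy check of (non-iterative) Simpson's rule against scipy's
# adaptive quad; the real test would use the bremsstrahlung cross-section
# over the actual electron-energy limits.
import numpy as np
from scipy.integrate import quad, simpson

def integrand(x):
    return np.exp(-x) / x          # toy placeholder for the cross-section

a, b = 1.0, 50.0
reference, _ = quad(integrand, a, b)
for n in (11, 51, 201):            # number of sample points (odd for Simpson)
    x = np.linspace(a, b, n)
    approx = simpson(integrand(x), x=x)
    print(n, abs(approx - reference) / reference)
```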
Fair enough - sadly no magic solution. Probably needs C to do a fast and accurate integral.
Describe the performance issue
Making the functions emission.integrate_part and emission.split_and_integrate more pythonic only improves readability, not performance.
To Reproduce
Proposed fix
I'll order these roughly by the (increasing) amount of time it would take [me] to try each approach. For reference, the integral is here and the equation for the bremsstrahlung cross-section is here.
1. Do nothing. The only people who care about speed are those planning on automatically fitting thousands of spectra.
2. Talk to an expert, or the people who wrote the IDL versions. Please don't all volunteer at once ;)
3. Use multiprocessing to simultaneously calculate all orders of the integral from npoints=4 to npoints=2*12, then use matrix subtraction to calculate the error and return the appropriate solution (see the first sketch after this list).
4. Figure out how to get Gauss-Kronrod quadrature to work (implemented by quadpy.c1.adaptive).
5. Enable GPU support so that those who care about speed can have it if they have the hardware.
6. Implement with Cython.
7. Transform the limits of the integral to [-1,1] so that pre-computed Gauss-Legendre points and weights can be reused to do the integration (see the second sketch after this list).
8. Learn more math. Numerical integration probably didn't peak with Gauss-Legendre, and there may be newer methods better suited to this problem.
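For item 3, the shape of it might be something like this (a sketch with a toy integrand; whether the process overhead actually beats the current point-doubling loop would need profiling):

```python
# Sketch for item 3: evaluate every Gauss-Legendre order in parallel, then
# compare successive orders and keep the first result within tolerance.
# `gl_fixed` uses a toy integrand; real use would pass the cross-section.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def gl_fixed(order, a=1.0, b=10.0):
    x, w = np.polynomial.legendre.leggauss(order)
    nodes = 0.5 * (b - a) * x + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(w * np.log(nodes) / nodes)   # toy integrand

if __name__ == "__main__":
    orders = [4 * 2**k for k in range(6)]                      # 4, 8, ..., 128
    with ProcessPoolExecutor() as pool:
        results = np.array(list(pool.map(gl_fixed, orders)))
    errors = np.abs(np.diff(results)) / np.abs(results[1:])    # the "matrix subtraction" step
    converged = np.flatnonzero(errors < 1e-8)
    best = converged[0] + 1 if converged.size else len(orders) - 1
    print(orders[best], results[best])
```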
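And for item 7, the transform is just an affine map, so the expensive part (computing the points and weights) can be cached once per order and reused across calls rather than recomputed every time (a sketch, not the package's code):

```python
# Sketch for item 7: cache the Gauss-Legendre points/weights on [-1, 1] once
# per order, and use the map x = (b-a)/2 * t + (b+a)/2 to handle arbitrary
# integration limits.
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def cached_leggauss(npoints):
    # computed once per order, then reused on every subsequent call
    return np.polynomial.legendre.leggauss(npoints)

def integrate(func, a, b, npoints=12):
    t, w = cached_leggauss(npoints)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)    # map [-1, 1] -> [a, b]
    return 0.5 * (b - a) * np.sum(w * func(x))

# e.g. integrate(np.exp, 0.0, 1.0) ~ np.e - 1
```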