elastufka opened this issue 2 years ago
Short update:
Great, I think the main thing from my side is to not tie the package down to any one of these, but instead allow them all through an API. Also, adding new dependencies needs careful consideration in terms of future support etc.
So I'd suggest focusing on refactoring the current code into a function with an API similar to scipy's, e.g. GaussLegendre(func, **kwargs),
before starting to implement new methods.
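Something along these lines, perhaps (a minimal sketch only; the name, signature and keyword handling are illustrative rather than an agreed interface):

```python
# Illustrative sketch of a scipy-style wrapper around fixed-order Gauss-Legendre
# quadrature, so that alternative integrators could later sit behind the same
# call signature. Names and keyword arguments are hypothetical.
import numpy as np

def gauss_legendre(func, a, b, npoints=12, **func_kwargs):
    """Integrate `func` over [a, b] with fixed-order Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(npoints)   # nodes/weights on [-1, 1]
    nodes = 0.5 * (b - a) * x + 0.5 * (b + a)         # affine map onto [a, b]
    return 0.5 * (b - a) * np.sum(w * func(nodes, **func_kwargs))

# e.g. gauss_legendre(np.sin, 0.0, np.pi) ~ 2.0
```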
Is the speed-up here being tested just in calculating the model, or also as part of the fitting?
Some of these approaches (multi-threading and maybe PyTorch?) might speed up the integrals in isolation, but when included as part of the fitting they might not actually help, since other parts of the code are already using the same resources (i.e. multi-threading is already done by default in the numpy matrix multiply with the SRM, and in the MCMC runs?).
Right now just in calculating the model.
The slowest part of the model evaluation is the bremsstrahlung cross-section calculation:
1601 function calls (1573 primitive calls) in 0.033 seconds
Ordered by: standard name
ncalls tottime percall cumtime percall filename:lineno(function)
38 0.000 0.000 0.000 0.000 constants.py:56(get_constant)
6 0.003 0.000 0.004 0.001 emission.py:128(density)
6 0.001 0.000 0.001 0.000 emission.py:166(collisional_loss)
6 0.012 0.002 0.012 0.002 emission.py:197(bremsstrahlung_cross_section)
6 0.003 0.000 0.032 0.005 emission.py:286(gauss_legendre_idl)
6 0.004 0.001 0.005 0.001 emission.py:379(points_and_weights_idl)
2 0.001 0.000 0.032 0.016 emission.py:438(integrate_part)
6 0.005 0.001 0.023 0.004 emission.py:506(model_func)
1 0.000 0.000 0.033 0.033 emission.py:548(split_and_integrate)
6 0.000 0.000 0.000 0.000 emission.py:69(__init__)
Torch makes things slower due to overhead when initializing tensors. It could be useful for fitting with fixed-point quadrature rather than increasing the number of points until a relative error condition is met, because it wouldn't duplicate calculations. But otherwise it's way too slow even on a GPU unless you have 10^5 photon energies or so.
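For reference, the fixed-point idea looks roughly like this (a sketch only; `integrand` and the per-bin limits are toy placeholders, not the package's actual functions):

```python
# Sketch of fixed-point Gauss-Legendre quadrature on torch tensors: the nodes
# and weights are computed once, and every photon-energy bin is evaluated in a
# single broadcasted call, so nothing is recomputed as npoints grows.
import numpy as np
import torch

def fixed_point_gauss_legendre(integrand, a, b, npoints=12):
    x, w = np.polynomial.legendre.leggauss(npoints)            # on [-1, 1]
    x, w = torch.as_tensor(x), torch.as_tensor(w)
    a = torch.as_tensor(a, dtype=torch.float64)[:, None]
    b = torch.as_tensor(b, dtype=torch.float64)[:, None]
    nodes = 0.5 * (b - a) * x + 0.5 * (b + a)                  # shape (nbins, npoints)
    return 0.5 * (b - a)[:, 0] * torch.sum(w * integrand(nodes), dim=-1)

# toy check: integrals of sin over [0, 1] and [1, 2]
print(fixed_point_gauss_legendre(torch.sin, [0.0, 1.0], [1.0, 2.0]))
```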
I might check on the accuracy of using a different (non-iterative) way of doing the integral, such as Simpson's rule.
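The check could be something like this quick comparison against scipy's adaptive quad, with a stand-in integrand rather than the real cross-section:

```python
# Quick accuracy check of (non-iterative) Simpson's rule against scipy's
# adaptive quad; the real test would use the bremsstrahlung cross-section
# over the actual electron-energy limits.
import numpy as np
from scipy.integrate import quad, simpson

def integrand(x):
    return np.exp(-x) / x          # toy placeholder for the cross-section

a, b = 1.0, 50.0
reference, _ = quad(integrand, a, b)
for n in (11, 51, 201):            # number of sample points (odd for Simpson)
    x = np.linspace(a, b, n)
    approx = simpson(integrand(x), x=x)
    print(n, abs(approx - reference) / reference)
```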
Fair enough - sadly no magic solution. Probably needs C to do a fast and accurate integral.
Describe the performance issue
Making the functions emission.integrate_part and emission.split_and_integrate more pythonic only improves readability, not performance.
To Reproduce
Proposed fix
I'll order these roughly by the (increasing) amount of time it would take [me] to try each approach. For reference, the integral is here and the equation for the bremsstrahlung cross-section is here.
1. Do nothing. The only people who care about speed are those planning on automatically fitting thousands of spectra.
2. Talk to an expert, or the people who wrote the IDL versions. Please don't all volunteer at once ;)
3. Use multiprocessing to simultaneously calculate all orders of the integral from npoints=4 to npoints=2*12, then use matrix subtraction to calculate the error and return the appropriate solution (see the first sketch after this list).
4. Figure out how to get Gauss-Kronrod quadrature to work (implemented by quadpy.c1.adaptive).
5. Enable GPU support so that those who care about speed can have it if they have the hardware.
6. Implement with Cython.
7. Transform the limits of the integral to [-1,1] so that pre-computed Gauss-Legendre points and weights can be reused to do the integration (see the second sketch after this list).
8. Learn more math. Numerical integration probably didn't peak with Gauss-Legendre, and there may be newer methods better suited to this problem.
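For item 3, the shape of it might be something like this (a sketch with a toy integrand; whether the process overhead actually beats the current point-doubling loop would need profiling):

```python
# Sketch for item 3: evaluate every Gauss-Legendre order in parallel, then
# compare successive orders and keep the first result within tolerance.
# `gl_fixed` uses a toy integrand; real use would pass the cross-section.
from concurrent.futures import ProcessPoolExecutor
import numpy as np

def gl_fixed(order, a=1.0, b=10.0):
    x, w = np.polynomial.legendre.leggauss(order)
    nodes = 0.5 * (b - a) * x + 0.5 * (b + a)
    return 0.5 * (b - a) * np.sum(w * np.log(nodes) / nodes)   # toy integrand

if __name__ == "__main__":
    orders = [4 * 2**k for k in range(6)]                      # 4, 8, ..., 128
    with ProcessPoolExecutor() as pool:
        results = np.array(list(pool.map(gl_fixed, orders)))
    errors = np.abs(np.diff(results)) / np.abs(results[1:])    # the "matrix subtraction" step
    converged = np.flatnonzero(errors < 1e-8)
    best = converged[0] + 1 if converged.size else len(orders) - 1
    print(orders[best], results[best])
```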
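And for item 7, the transform is just an affine map, so the expensive part (computing the points and weights) can be cached once per order and reused across calls rather than recomputed every time (a sketch, not the package's code):

```python
# Sketch for item 7: cache the Gauss-Legendre points/weights on [-1, 1] once
# per order, and use the map x = (b-a)/2 * t + (b+a)/2 to handle arbitrary
# integration limits.
from functools import lru_cache
import numpy as np

@lru_cache(maxsize=None)
def cached_leggauss(npoints):
    # computed once per order, then reused on every subsequent call
    return np.polynomial.legendre.leggauss(npoints)

def integrate(func, a, b, npoints=12):
    t, w = cached_leggauss(npoints)
    x = 0.5 * (b - a) * t + 0.5 * (b + a)    # map [-1, 1] -> [a, b]
    return 0.5 * (b - a) * np.sum(w * func(x))

# e.g. integrate(np.exp, 0.0, 1.0) ~ np.e - 1
```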