Open dstansby opened 1 year ago
Base: 86.82% // Head: 86.88% // Increases project coverage by +0.05% :tada:
Coverage data is based on head (82d2d6a) compared to base (78c672f). Patch coverage: 100.00% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
Here's a quick plot of the speedups I get with this PR:
This is encouraging to see!
I think it would be nice to stick the matrix inversion logic into its own function (maybe not even as a method on `Ion`) to help keep the `level_populations` method clean, since that method is already becoming quite complex. This would also make future experiments with performance much easier.
Good shout - I can do that in another PR.
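As a purely illustrative sketch of what such a standalone function might look like (the function name, signature, and the null-eigenvector selection are assumptions for illustration, not fiasco's actual implementation):

```python
import numpy as np

def solve_level_populations(matrices):
    """Hypothetical helper: given a stack of rate matrices with shape
    (n_temperature, n_levels, n_levels), return the equilibrium level
    populations at each temperature."""
    # Eigendecompose every matrix along the leading axis in one call.
    eigvals, eigvecs = np.linalg.eig(matrices)
    # The equilibrium populations correspond to the eigenvector whose
    # eigenvalue is (numerically closest to) zero.
    idx = np.argmin(np.abs(eigvals), axis=-1)
    vecs = np.take_along_axis(eigvecs, idx[:, None, None], axis=-1)[..., 0]
    # Fix the overall sign/scale so populations are positive and sum to 1.
    vecs = np.abs(vecs)
    return vecs / vecs.sum(axis=-1, keepdims=True)
```

Keeping this out of `level_populations` would also make it trivial to swap in alternative backends (threaded, multiprocessing, etc.) later.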
Worth noting that in the current single (Python) thread case, `numpy` (at least on my computer) still uses a multi-threaded implementation for `eig`, but for some reason it's slower than running lots of single-threaded `eig`s in parallel.
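To make a "single-threaded" benchmark genuinely single-threaded, the BLAS threadpool can be pinned before numpy is first imported (a minimal sketch; which environment variable actually takes effect depends on the BLAS build numpy links against):

```python
import os

# These must be set before numpy (and its BLAS library) is first
# imported; after that the threadpool size is already fixed.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
# This eig now runs on a single BLAS thread.
w, v = np.linalg.eig(A)
```

Equivalently, `OMP_NUM_THREADS=1 python script.py` from the shell enforces the same thing for OpenMP-based BLAS builds.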
Ah ok, I was just about to ask whether the case in your plot labeled "single-threaded" was actually single-threaded (i.e. `OMP_NUM_THREADS=1` was actually being enforced) or whether numpy was doing some parallelization. Interesting that it seems to be slower.
Thinking more about your plot, I am surprised that the execution time scales so strongly with number of temperature points and that multiprocessing does so much better. I had (maybe naively) assumed that it would be hard to beat the vectorization over temperature provided by numpy.
My guess is that `numpy` doesn't parallelize across the temperature index, but that when computing the eigenvalues/vectors on a single square matrix it dispatches to a math library that does use multiple threads. Maybe the math library I'm using isn't well optimized for my CPU?
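For reference, `np.linalg.eig` does accept a stack of matrices and vectorises over the leading axes, but each individual decomposition is still a separate LAPACK call under the hood, which is where any math-library threading happens:

```python
import numpy as np

rng = np.random.default_rng(42)
# A stack of 8 independent 50x50 matrices, analogous to one rate
# matrix per temperature point.
stack = rng.standard_normal((8, 50, 50))

# numpy broadcasts eig over the leading axis, but each of the 8
# decompositions is dispatched to LAPACK one at a time.
w, v = np.linalg.eig(stack)
print(w.shape, v.shape)  # (8, 50) (8, 50, 50)
```

So the loop over temperature happens in compiled code, but it is still a serial loop, which is consistent with the execution time scaling strongly with the number of temperature points.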
If anyone else wants to do some testing here's the code I used:
```python
from datetime import datetime

import astropy.units as u
import numpy as np

from fiasco import Ion

if __name__ == '__main__':
    ns = 2**np.arange(1, 7)
    times = {}
    for n in ns:
        print(n)
        Te = np.geomspace(0.1, 100, n) * u.MK
        ne = 1e8 * u.cm**-3
        ion = Ion('Fe XII', Te)
        t = datetime.now()
        contribution_func = ion.contribution_function(ne)
        times[n] = (datetime.now() - t).total_seconds()
    print(times)
```
xref https://github.com/wtbarnes/fiasco/issues/26. I realised that it should be possible to parallelise the computation on many different matrices. I'm not sure this is the right approach, as in my experience a naïve `multiprocessing` implementation is rarely the best way of doing parallel work, but I'm opening this for discussion. With the following code, parallelising the eigen{vector, value} calculation speeds it up from ~26 secs to ~14 secs in total for me.