Open dstansby opened 1 year ago
Base: 86.82% // Head: 86.88% // Increases project coverage by +0.05% :tada:
Coverage data is based on head (82d2d6a) compared to base (78c672f). Patch coverage: 100.00% of modified lines in pull request are covered.
:umbrella: View full report at Codecov.
Here's a quick plot of the speedups I get with this PR:
This is encouraging to see!
I think it would be nice to stick the matrix inversion logic into its own function (maybe not even as a method on `Ion`) to help keep the `level_populations` method clean, since that method is already becoming quite complex. This would also make future experiments with performance much easier.
Good shout - I can do that in another PR.
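As a purely illustrative sketch of what such a standalone function might look like (the function name, signature, and the null-eigenvector selection are assumptions for illustration, not fiasco's actual implementation):

```python
import numpy as np

def solve_level_populations(matrices):
    """Hypothetical helper: given a stack of rate matrices with shape
    (n_temperature, n_levels, n_levels), return the equilibrium level
    populations at each temperature."""
    # Eigendecompose every matrix along the leading axis in one call.
    eigvals, eigvecs = np.linalg.eig(matrices)
    # The equilibrium populations correspond to the eigenvector whose
    # eigenvalue is (numerically closest to) zero.
    idx = np.argmin(np.abs(eigvals), axis=-1)
    vecs = np.take_along_axis(eigvecs, idx[:, None, None], axis=-1)[..., 0]
    # Fix the overall sign/scale so populations are positive and sum to 1.
    vecs = np.abs(vecs)
    return vecs / vecs.sum(axis=-1, keepdims=True)
```

Keeping this out of `level_populations` would also make it trivial to swap in alternative backends (threaded, multiprocessing, etc.) later.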
Worth noting that in the current single (Python) thread case, `numpy` (at least on my computer) still uses a multi-threaded implementation for `eig`, but for some reason it's slower than running lots of single-threaded `eig`s in parallel.
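To make a "single-threaded" benchmark genuinely single-threaded, the BLAS threadpool can be pinned before numpy is first imported (a minimal sketch; which environment variable actually takes effect depends on the BLAS build numpy links against):

```python
import os

# These must be set before numpy (and its BLAS library) is first
# imported; after that the threadpool size is already fixed.
os.environ["OMP_NUM_THREADS"] = "1"
os.environ["OPENBLAS_NUM_THREADS"] = "1"
os.environ["MKL_NUM_THREADS"] = "1"

import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((200, 200))
# This eig now runs on a single BLAS thread.
w, v = np.linalg.eig(A)
```

Equivalently, `OMP_NUM_THREADS=1 python script.py` from the shell enforces the same thing for OpenMP-based BLAS builds.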
Ah ok, I was just about to ask whether the case in your plot labeled "single-threaded" was actually single-threaded (i.e. `OMP_NUM_THREADS=1` was actually being enforced) or whether numpy was doing some parallelization. Interesting that it seems to be slower.
Thinking more about your plot, I am surprised that the execution time scales so strongly with number of temperature points and that multiprocessing does so much better. I had (maybe naively) assumed that it would be hard to beat the vectorization over temperature provided by numpy.
My guess is that `numpy` doesn't parallelize across the temperature index, but that when computing the eigenvalues/vectors on a single square matrix it dispatches to a math library that does use multiple threads. Maybe the math library I'm using isn't well optimized for my CPU?
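For reference, `np.linalg.eig` does accept a stack of matrices and vectorises over the leading axes, but each individual decomposition is still a separate LAPACK call under the hood, which is where any math-library threading happens:

```python
import numpy as np

rng = np.random.default_rng(42)
# A stack of 8 independent 50x50 matrices, analogous to one rate
# matrix per temperature point.
stack = rng.standard_normal((8, 50, 50))

# numpy broadcasts eig over the leading axis, but each of the 8
# decompositions is dispatched to LAPACK one at a time.
w, v = np.linalg.eig(stack)
print(w.shape, v.shape)  # (8, 50) (8, 50, 50)
```

So the loop over temperature happens in compiled code, but it is still a serial loop, which is consistent with the execution time scaling strongly with the number of temperature points.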
If anyone else wants to do some testing here's the code I used:
```python
from datetime import datetime

import astropy.units as u
import numpy as np

from fiasco import Ion

if __name__ == '__main__':
    ns = 2**np.arange(1, 7)
    times = {}
    for n in ns:
        print(n)
        Te = np.geomspace(0.1, 100, n) * u.MK
        ne = 1e8 * u.cm**-3
        ion = Ion('Fe XII', Te)
        t = datetime.now()
        contribution_func = ion.contribution_function(ne)
        times[n] = (datetime.now() - t).total_seconds()
    print(times)
```
xref https://github.com/wtbarnes/fiasco/issues/26. I realised that it should be possible to parallelise the computation on many different matrices. I'm not sure this is the right approach, as in my experience a naïve `multiprocessing` implementation is rarely the best way of doing parallel work, but I'm opening this for discussion. With the following code, parallelising the eigen{vector, value} calculation speeds it up from ~26 secs to ~14 secs in total for me.