cmbant closed this 2 months ago
All modified and coverable lines are covered by tests :white_check_mark:
Please upload report for BASE (master@660e0fa). Learn more about missing BASE report.
Really looks good to me. I did the time benchmark as suggested and I'm getting the same time factors.

One remark, a bit unrelated to this PR: I had a look into the _fast_chi_square function of cobaya, which most of the time ends up calling camb.mathutils.chi_squared (if camb is installed), but if camb is not installed the default way to compute $\chi^2$ is https://github.com/CobayaSampler/cobaya/blob/master/cobaya/likelihoods/base_classes/DataSetLikelihood.py#L27

Regarding the time benchmark, we noticed some time ago that the dot product was slightly slower than the @ product introduced in Python 3.5. So we might save a few microseconds in case someone uses mflike (or any likelihood relying on _fast_chi_square) without camb installed.
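For context, a minimal sketch of the fallback pattern being described (not the exact cobaya code; see the linked line for the real implementation), assuming the usual convention of passing the inverse covariance and the data-minus-theory residual:

```python
import numpy as np

# Sketch only: prefer camb's optimized routine when available,
# otherwise fall back to a plain numpy quadratic form.
try:
    from camb.mathutils import chi_squared  # exploits symmetry of covinv
except ImportError:
    def chi_squared(covinv, delta):
        # delta^T covinv delta
        return covinv.dot(delta).dot(delta)

# usage (hypothetical names): chi2 = chi_squared(inv_cov, data - theory)
```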
Interesting, I had assumed they mapped to identical code. If I do this in Jupyter (for n=9000), if anything the .dot form seems to be faster for me:
%timeit v @ cov @ v
%timeit cov.dot(v).dot(v)
%timeit v @ cov @ v
%timeit cov.dot(v).dot(v)
11.4 ms ± 222 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.4 ms ± 63.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
11.2 ms ± 154 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
10.6 ms ± 196 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
CAMB's version gets the extra factor of 2 from symmetry.
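For reference, a rough sketch of how the above comparison could be reproduced outside Jupyter, assuming a symmetric n=9000 covariance; the camb call is only there if camb happens to be installed:

```python
import numpy as np
from timeit import timeit

n = 9000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
cov = a @ a.T                      # symmetric test "covariance"
v = rng.standard_normal(n)

# the two quadratic-form variants timed above
print("@   :", timeit(lambda: v @ cov @ v, number=100))
print("dot :", timeit(lambda: cov.dot(v).dot(v), number=100))

# camb's routine, which exploits the symmetry of cov
try:
    from camb.mathutils import chi_squared
    print("camb:", timeit(lambda: chi_squared(cov, v), number=100))
except ImportError:
    pass
```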
I also thought they mapped to the same piece of code. In your test the cov matrix is very sparse (hence the purpose of this PR). If you do the test on the same 9000x9000 matrix with random numbers, then @ is much faster than dot. I'm not sure I understand the difference between dot and matmul, which corresponds to the @ implementation: https://numpy.org/doc/stable/reference/generated/numpy.matmul.html#numpy.matmul
Odd, I see the same with random numbers and matmul or @. (Maybe it changed in numpy 2, or it depends on threading?)
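To help pin down where the difference comes from, a sketch of the dense random-number check together with the environment details that could matter (numpy version and BLAS/threading configuration); the specific diagnostics are only my suggestion, not something from this PR:

```python
import numpy as np
from timeit import timeit

print(np.__version__)
np.show_config()                    # which BLAS backend (and threading) numpy uses

n = 9000
rng = np.random.default_rng(1)
cov = rng.standard_normal((n, n))   # dense random matrix, no sparsity
v = rng.standard_normal(n)

print("@     :", timeit(lambda: v @ cov @ v, number=100))
print("dot   :", timeit(lambda: cov.dot(v).dot(v), number=100))
print("matmul:", timeit(lambda: np.matmul(np.matmul(v, cov), v), number=100))
```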
At least for the moment, the weights are mostly zero since the bandpowers are localized, so all those elements of the matrix multiplication are not needed. This refactor speeds it up by a factor of about 8 on my laptop, hopefully enough to make dragging sampling useful. It could of course be further improved by putting the whole operation into numba with openmp.
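As an illustration of the kind of saving involved (a hypothetical sketch, not the actual mflike or refactored code): each bandpower window is nonzero only over a short ell range, so the product can be restricted to that slice instead of running over the full matrix row.

```python
import numpy as np

def band_average(windows, spectrum):
    """Hypothetical illustration: apply bandpower windows to a spectrum,
    skipping the zero part of each (mostly-zero, localized) window row
    instead of doing the full dense matrix-vector product."""
    out = np.empty(windows.shape[0])
    for i, w in enumerate(windows):
        nz = np.flatnonzero(w)
        if nz.size == 0:
            out[i] = 0.0
            continue
        lo, hi = nz[0], nz[-1] + 1   # nonzero support of this window
        out[i] = w[lo:hi] @ spectrum[lo:hi]
    return out
```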
Now the remaining computing time is split roughly equally between _get_foreground_model and the rest. To see where time was spent, I did