Fast Sherman Morrison for Kernel Ecorr

vhaasteren commented 1 year ago

This pull request allows Enterprise to use the package fastshermanmorrison-pulsar in EcorrKernelNoise, which contains a drop-in replacement for ShermanMorrison called FastShermanMorrison. It does the exact same thing, but now with Cython optimized code. Behind the scenes it just calls the Scipy BLAS functions, without the overhead of Python slicing operations.

To do:

[x] Provide some mechanism to only ONCE show a warning that fastshermanmorrison-pulsar should be installed
[x] Make fastshermanmorrison-pulsar install when running the tests in github
[x] Make sure that the tests for fast-sherman-morrison are run
[x] Package fastshermanmorrison-pulsar for Pypi
[x] Allow for index arrays to accomodate a solution for #319. This requires updates to fastshermanmorrison-pulsar as well
[x] Verify that the code works on real data

Packaging fastshermanmorrison-pulsar for conda seems to be impractical due to the necessity for binaries.

This package is an adjusted version of the "Jitter Extension" that I wrote for older PTA packages. The main reason I think this was not used in Enterprise is that the most-important component, the _solve_2D method, was not optimized at all. This version contains some new C code that optimizes it. A standalone package with this code is in my repository, and you can install it with:

pip install fastshermanmorrison-pulsar

The speed increase depends on the specific pulsar datasets. Rough idea: the _solve_1D is 20x faster, and the _solve_2D is 2x faster. So this significantly speeds up the calculation of all (T^T N^{-1} T) when kernel ecorr is used. Especially the Gibbs sampler will be affected by this upgrade, as that uses the _solve_1D functions.

The NANOGrav single pulsar noise analysis will get a speedup of roughly 2x

AaronDJohnson commented 1 year ago

Hey @vhaasteren, I think that adding C code will complicate the installation process especially for conda. Pure Python makes the installation extremely easy. Is there any chance that we can speed this up with a package like numba instead?

vhaasteren commented 1 year ago

@AaronDJohnson, is Numba really easier to install than Cython? I am not experienced with Numba, but as far as I have been able to read, it's actually harder to install than Cython. I have never encountered issues installing Cython. Note that libstempo is also Cython, and the difficulty there comes from the fact that it uses Tempo2

The external C code is in there (which is just a Cython extension), because the Kernel Ecorr stuff uses slicing operations that don't work well with BLAS if you mix row-major and column-major arrays. The speedup I have achieved is just that: get rid of slicing operations and memory management overhead.

No idea whether this is possible to do efficiently in Numba. In pure Cython (without C code extension) it was impossible due to Scipy BLAS memory management. And I did not want to use anything but the Scipy interface, because THEN you have to link against BLAS yourself and complicate the installation.

BTW: the tests all seem to fail right now because I am including numpy in setup.py, as far as I can judge.

AaronDJohnson commented 1 year ago

@vhaasteren numba is extremely easy to install via either pip or conda in the currently supported Python versions. I think the article that you posted is from 2015....

The benefit of numba would really only be that C code is not packaged in enterprise. If you were to compile this as a standalone package and then we import it and use it here, enterprise could retain its status as a "pure Python" package which makes everything very portable in the sense that we can use a "no-arch" tag in the conda installation, and it will cover all system architectures automatically.

I should mention that I am not a maintainer on the installation methods here.... The change is really up to the maintainers.

vhaasteren commented 1 year ago

Ah, @AaronDJohnson, I see. So even just the presence of a C file gives this kind of complications? I can look into making a separate package on Pypy/conda that provides FastShermanMorrison. That is not much work, besides the packaging overhead.

It would probably be good to modify EcorrKernelNoise to be able to use that external package. It can always fall back to ShermanMorrison if the package cannot be found. I could modify this PR to do that...

AaronDJohnson commented 1 year ago

If there is any compilation, it will be system architecture specific and use different compilers on different system architectures in conda (clang vs. gfortran).

vhaasteren commented 1 year ago

Well, we can talk about moving it back into Enterprise in the future. I had thought that architecture stuff wouldn't be an issue, because it's just the setup.py script that deals with Cython. Anyway, I had actually first already packaged it in it's own repository last, so I just linked to it again from the Enterprise code.

It seems to mostly work, although I also saw that there is an issue when combining it with MarginalizingTimingModel, so I'll look into that.

codecov[bot] commented 1 year ago

Codecov Report

Merging #343 (7f6795a) into dev (c49646c) will decrease coverage by 0.03%. Report is 5 commits behind head on dev. The diff coverage is 100.00%.

Additional details and impacted files

[![Impacted file tree graph](https://app.codecov.io/gh/nanograv/enterprise/pull/343/graphs/tree.svg?width=650&height=150&src=pr&token=7Sjk8cLA85&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv)](https://app.codecov.io/gh/nanograv/enterprise/pull/343?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv) ```diff @@ Coverage Diff @@ ## dev #343 +/- ## ========================================== - Coverage 88.37% 88.35% -0.03% ========================================== Files 13 13 Lines 3012 3023 +11 ========================================== + Hits 2662 2671 +9 - Misses 350 352 +2 ``` | [Files](https://app.codecov.io/gh/nanograv/enterprise/pull/343?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv) | Coverage Δ | | |---|---|---| | [enterprise/signals/white\_signals.py](https://app.codecov.io/gh/nanograv/enterprise/pull/343?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv#diff-ZW50ZXJwcmlzZS9zaWduYWxzL3doaXRlX3NpZ25hbHMucHk=) | `98.59% <100.00%> (+0.11%)` | :arrow_up: | ... and [1 file with indirect coverage changes](https://app.codecov.io/gh/nanograv/enterprise/pull/343/indirect-changes?src=pr&el=tree-more&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv) ------ [Continue to review full report in Codecov by Sentry](https://app.codecov.io/gh/nanograv/enterprise/pull/343?src=pr&el=continue&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://app.codecov.io/gh/nanograv/enterprise/pull/343?src=pr&el=footer&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv). Last update [c49646c...7f6795a](https://app.codecov.io/gh/nanograv/enterprise/pull/343?src=pr&el=lastupdated&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=nanograv).

AaronDJohnson commented 1 year ago

Have you tried putting the MarginalizingTimingModel first in the model? This fixes the division issue on the usual ShermanMorrison calculation. No guarantees that it's the same issue, but maybe it's worth a check?

vhaasteren commented 1 year ago

Oh, right, MTM also needs to be before the Ecorr in the model build. I remember that one. That's a small little bug actually.

Anyway, then it really works as a drop-in replacement. I'll write the tests this week. Perhaps I'm a bit too aggressive here to set it as the default, but I didn't see why not if it falls back anyway.

Getting fastshermanmorrison into Pypy and Conda may take me a bit longer.

AaronDJohnson commented 1 year ago

@vhaasteren you might try grayskull for the conda recipe. It has worked really well for me recently. There may be something similar for PyPI....

vhaasteren commented 1 year ago

@AaronDJohnson, I looked at the packaging today, and I agree that it's better to keep this in a separate package. Just getting it into Pypi requires a lot of work. You have to make it work for manylinux standards so that it works on different linux distros.

Using docker, I am sure I can make that work. But let's keep it out of Enterprise. This FastShermanMorrison code will be an optional "Enterprise extension". My suggestion would be to still make it the default whenever you add an EcorrKernelNoise, but once you add such a signal a check will be carried out whether it managed to import the library (like it does now in this PR~). If fastshermanmorrison is not available, it should give a suggestion to install the package using conda/pip (only once, not for every pulsar or something) and fall back to ShermanMorrison.

Does that sound more reasonable?

My fastshermanmorrison-pulsar package is ready for Pypi btw. I just need to compile it on the right host system(s) with docker.

AaronDJohnson commented 1 year ago

I think this sounds good. Maybe @vallis would like to comment on it?

vhaasteren commented 1 year ago

@AaronDJohnson, @vallis, I'll need some help with the tests. I now have a package called fastshermanmorrison-pulsar on Pypi, and it can be installed with pip install fastshermanmorrison-pulsar.

The idea is that Enterprise should not require the package, but it would make sense to make fast-sherman-morrison the default for EcorrKernelNoise, and then fall back to sherman-morrison if the package is not available. It's like that in the code now, I think that is fine.

However, I think that the tests here on github should check the output of fastshermanmorrison. After all, the tests here are just for the devs, and the user experience is not affected. So, I'd like the package fastshermanmorrison-pulsar to be available when running the test suite. I edited the Makefile to include requirements_tests.txt, but that doesn't seem to do install anything on the github tests: when I check the installs under 'details' above, I can't find fastshermanmorrison being installed. And the codecov/patch seems to think that the added code in EcorrKernelNoise is not being tested.

Am I taking the right approach, and if so, what do I change?

vhaasteren commented 1 year ago

I filed an issue under #344, because I think that issue would allow the natural way to show a warning message to the user only once (telling the user to install fastshermanmorrison with pip). Edit: PR #345 will allow such mechanism.

vhaasteren commented 11 months ago

The the readthedocs and the python errors need to be fixed in master first with PR #352 and by solving issue #357

This PR is ready to be merged

nanograv / enterprise

Fast Sherman Morrison for Kernel Ecorr #343

Codecov Report