nanograv / enterprise

ENTERPRISE (Enhanced Numerical Toolbox Enabling a Robust PulsaR Inference SuitE) is a pulsar timing analysis code, aimed at noise analysis, gravitational-wave searches, and timing model analysis.
https://enterprise.readthedocs.io
MIT License

Fastloglikelihood #286

Closed AaronDJohnson closed 3 years ago

AaronDJohnson commented 3 years ago

Contains a faster likelihood calculation, as presented by Xavi at the Spring NANOGrav meeting.

This faster likelihood calculation:

  1. Is 2-6 times faster in all gravitational-wave background searches.
  2. Leaves the wideband likelihood calculation unchanged.
  3. Is slightly slower in continuous-wave searches.
  4. Matches the current likelihood calculation to good accuracy in all tested cases.

This pull request adds the new class methods needed to support this calculation, renames the current likelihood class to OldLogLikelihood, and makes the new LogLikelihood class the default method of computation. A CompareLogLikelihood class is also added to allow quick comparison of the speed and accuracy of the two methods of likelihood calculation.
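For orientation, here is a minimal usage sketch of how selecting between the classes might look, assuming the names above live in enterprise.signals.signal_base and that PTA keeps its existing lnlikelihood keyword; treat it as an illustration of the described interface rather than the PR's exact API.

```python
# Illustrative sketch only: class names follow the PR description, and the
# PTA(..., lnlikelihood=...) keyword is assumed to work as in current enterprise.
from enterprise.signals import signal_base

def build_ptas(models):
    """models: list of per-pulsar signal models built in the usual enterprise way."""
    pta_new = signal_base.PTA(models)  # new LogLikelihood as the proposed default
    pta_old = signal_base.PTA(models, lnlikelihood=signal_base.OldLogLikelihood)
    pta_cmp = signal_base.PTA(models, lnlikelihood=signal_base.CompareLogLikelihood)
    return pta_new, pta_old, pta_cmp
```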

stevertaylor commented 3 years ago

This is a fundamental change to the likelihood implementation in enterprise. I think we need to have a broader discussion about incorporating this as the default behavior. I would prefer to retain the current behavior as the default for now, and have the new method be an optional switch, especially since the new method does not have documentation or a reference to explain its implementation (unless it really is equivalent to the old G-matrix approach).

kdolum commented 3 years ago

The mathematics of the new method are described in Xavi's talk at https://files.slack.com/files-pri/T02HY66UF-F022X0AQ7U4/download/nanogravspring2021likelihoodpresentation.pdf . The implementation of the new method is described in detail in a long comment in the code. I would be happy to merge these things into some other document if you would like to suggest a location.

This code has been extensively tested. For the GWB, I compared the old and new computations on 10,000 samples; the change in the log likelihood never exceeded 10^-3, i.e., the likelihoods agree to within 0.1%. Aaron has tested a broad range of other analyses with similarly good results. The case with the largest discrepancy is continuous wave (for which this new method is not helpful anyway). The largest discrepancies there are differences of a few in log likelihood, but they occur only in cases where the log likelihood is at least 100,000 or so below typical values, i.e., for parameter sets that are of no interest. Presumably that problem results from numerical instability, so we don't know which method is giving the better answer.
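A rough sketch of the kind of spot check described here, assuming pta_old and pta_new are PTA objects built with the old and new likelihood classes (as in the earlier sketch) and that parameter draws come from the priors via each Parameter's sample() method:

```python
# Sketch of a spot check between the two likelihood implementations; pta_old and
# pta_new are assumed PTA objects built with the old and new classes respectively.
def max_loglike_discrepancy(pta_old, pta_new, nsamples=10_000):
    worst = 0.0
    for _ in range(nsamples):
        # Draw a parameter set from the priors.
        x = {p.name: p.sample() for p in pta_new.params}
        diff = abs(pta_new.get_lnlikelihood(x) - pta_old.get_lnlikelihood(x))
        worst = max(worst, diff)
    return worst  # Ken reports this stayed below ~1e-3 for GWB runs
```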

As for the default, I'm afraid that without changing it there will be people who do runs lasting several weeks, never knowing that a one-line change would have saved them 60% or even 80% of their runtime.

stevertaylor commented 3 years ago

In my view it would not be appropriate to completely change the default behavior of our pipeline on the eve of an important flagship analysis. The method has not featured in a published article, and has not been used in any previously published PTA real-dataset analyses. I don't doubt the mathematics or the method at all; it's the process that concerns me, and how we will be presenting our pipeline and methods to the scientific community. At the very least, a technical memo could be produced that details the math and the various tests performed, along with the scenarios where it's not helpful. NANOGrav has the capacity to host such collaboration memos.

It's clear that this method is useful in speeding up spatially-correlated analyses by a factor of a few, so it's completely appropriate to have this as an optional switch from the current default behavior. Those who will be running spatial-correlation analyses for the forthcoming dataset will certainly be experienced and in-the-know enough to be aware of this new optional "speed switch", so I don't foresee any of us needlessly wasting cluster time. Let's have a discussion about this on Monday's DWG call, and we should certainly get views from enterprise architects like @vallis during the discussions.

scottransom commented 3 years ago

As someone who watches this development from the sidelines, I definitely agree with Steve that this should not be made the default calculation right now. Optional, yes, and it may well end up being used for much of our analyses. But we will need to convince ourselves and others that it is numerically equivalent in the long run (although the tests you have described certainly sound promising).

vallis commented 3 years ago

Thanks Ken and Aaron, this is very careful work and the performance is indeed impressive. Steve and Scott have commented already on whether this should be the new default, and I largely agree with them. This does not mean that your code cannot be used already for an important fraction of DWG runs!

Instead I'd like to address the structure of the computation itself, which creates a large number of new methods within SignalCollection, a new classification of Signals, a separate treatment of the "chi" matrix that largely duplicates the existing "phi", and probably other things I am forgetting. I'm concerned that all of these will weaken the modularity and extensibility of the core library.

Looking at your code, I think there is a way to leverage the existing infrastructure and hide the majority of new tensor objects inside a new Signal object, without burdening the standard classes. If you give me a few days I will sketch it out, and then we may be able to refactor your code quickly.
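Purely as an illustration of the kind of encapsulation being described here (all names are hypothetical, and this is not the design that eventually landed in #288), the extra tensors could be owned by a single dedicated object rather than by new SignalCollection methods:

```python
# Hypothetical illustration only; not the design adopted in #288.
import numpy as np

class FastLikelihoodTerms:
    """Owns the precomputed products the fast likelihood needs for one pulsar,
    keeping them out of SignalCollection itself."""

    def __init__(self, Fmat, residuals):
        self._F = Fmat       # red-noise / GW basis columns
        self._r = residuals  # timing residuals
        # the timing-model ("chi") pieces would live here as well

    def products(self, ndiag):
        """ndiag: white-noise variances; returns F^T D^-1 F and F^T D^-1 r."""
        DinvF = self._F / ndiag[:, None]
        return self._F.T @ DinvF, DinvF.T @ self._r
```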

Three more things:

I really don't mean to be negative, and I look forward to including your code in a way that gets us awesome performance and empowers future users and developers to build on Enterprise.

codecov[bot] commented 3 years ago

Codecov Report

Merging #286 (679ed33) into master (dfb6d88) will decrease coverage by 0.14%. The diff coverage is 86.19%.


@@            Coverage Diff             @@
##           master     #286      +/-   ##
==========================================
- Coverage   86.44%   86.29%   -0.15%     
==========================================
  Files          12       12              
  Lines        2730     3138     +408     
==========================================
+ Hits         2360     2708     +348     
- Misses        370      430      +60     
Impacted Files                        Coverage            Δ
enterprise/signals/signal_base.py     87.93% <86.16%>     -1.48% ↓
enterprise/signals/gp_signals.py      86.78% <100.00%>    +0.03% ↑


kdolum commented 3 years ago

Michele, the reason there's so much duplicated code in what I wrote is that I was trying to keep the existing code intact, so people can compare the old and new results in their applications. In the long run, I think we should get rid of the old LogLikelihood class and the SignalCollection methods that only it uses, and add an option to the new class to decide which procedure to use (perhaps we could call it "separate_timing_models"). The old algorithm is all there in the new code: if you don't treat any signals as timing model signals, you will get the old computation.
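A minimal sketch of what such an option might look like, with separate_timing_models as the hypothetical switch; the method names and structure are illustrative, not code from this PR:

```python
# Hypothetical sketch of Ken's suggested option; names are illustrative only.
class LogLikelihood:
    def __init__(self, pta, separate_timing_models=True):
        self.pta = pta
        self.separate_timing_models = separate_timing_models

    def __call__(self, xs):
        if self.separate_timing_models:
            # fast path: timing-model signals handled analytically
            return self._lnlike_separated(xs)
        # with no signals treated as timing-model signals, the same
        # machinery reproduces the old computation
        return self._lnlike_combined(xs)
```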

But if you have an idea for a cleaner organization, by all means let us talk about it.

The question of what people are going to use is much more important than the question of defaults. Our hope is that people will use the new code for analyses where it is useful. If people have concerns about reliability, we are happy to do further testing to assuage them.

I agree with you about caching mechanisms. If you know how to use the existing caching mechanism for caching FDr and FDF, I would be delighted.
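As a generic illustration only (this is not enterprise's existing caching machinery, and FDr/FDF simply name the matrix products being precomputed), one way to cache such products is to key them on the parameters they depend on:

```python
# Generic parameter-keyed cache, shown as an illustration only; it is not
# enterprise's caching mechanism. "FDr"/"FDF" name the precomputed products.
class ProductCache:
    def __init__(self):
        self._key = None
        self._products = None

    def get(self, params, compute):
        """params: dict of the parameters the products depend on;
        compute: zero-argument callable returning (FDF, FDr)."""
        key = tuple(sorted(params.items()))
        if key != self._key:            # recompute only when parameters change
            self._key = key
            self._products = compute()
        return self._products
```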

I don't have any particular opinion about which file should have CompareLogLikelihood, but I do think that it should be kept very straightforward to use it instead of one of the other classes, so that people are encouraged to do this comparison on a wide range of test cases.

I have never found a case where the new code benefits from sparse Cholesky decomposition. In the old way of doing things, I think you might get something like a 20% improvement, but I have not studied it. It probably depends on what processor you're using and how much optimization is available there.

But I don't want to change the old code because I don't want any uncertainty about whether comparisons are really being made to the old result. If we follow my suggestion above to use the new code to perform the old computation, this will happen automatically when you set cholesky_sparse.
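For anyone who wants to quantify the sparse-versus-dense Cholesky question outside enterprise, here is a rough standalone timing sketch; it assumes scipy and scikit-sparse are available, and the block-diagonal test matrix only loosely mimics a phi-like covariance:

```python
# Standalone timing sketch, not enterprise's code path. Requires scipy and
# scikit-sparse; the test matrix only loosely mimics a block-structured phi.
import time
import numpy as np
import scipy.linalg as sl
import scipy.sparse as sps
from sksparse.cholmod import cholesky as sparse_cholesky

rng = np.random.default_rng(0)
nblock, bsize = 30, 100
blocks = []
for _ in range(nblock):
    a = rng.standard_normal((bsize, bsize))
    blocks.append(a @ a.T + bsize * np.eye(bsize))   # positive-definite block

phi_dense = sl.block_diag(*blocks)
phi_sparse = sps.csc_matrix(phi_dense)
b = rng.standard_normal(nblock * bsize)

t0 = time.perf_counter()
x_dense = sl.cho_solve(sl.cho_factor(phi_dense), b)
t1 = time.perf_counter()
x_sparse = sparse_cholesky(phi_sparse)(b)            # CHOLMOD factor, then solve
t2 = time.perf_counter()

print(f"dense: {t1 - t0:.3f} s  sparse: {t2 - t1:.3f} s  "
      f"max |dx|: {np.abs(x_dense - x_sparse).max():.2e}")
```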

Ken
vallis commented 3 years ago

Subsumed by #288, which repackaged this algebra and code with a different class structure.