AaronDJohnson closed this pull request 3 years ago.
This is a fundamental change to the likelihood implementation in enterprise. I think we need to have a broader discussion about incorporating this as the default behavior. I would prefer to retain the current behavior as the default for now, and have the new method be an optional switch, especially since the new method does not have documentation or a reference to explain its implementation (unless it really is equivalent to the old G-matrix approach).
The mathematics of the new method are described in Xavi's talk at https://files.slack.com/files-pri/T02HY66UF-F022X0AQ7U4/download/nanogravspring2021likelihoodpresentation.pdf . The implementation of the new method is described in detail in a long comment in the code. I would be happy to merge these things into some other document if you would like to suggest a location.
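For anyone who cannot open the Slack link, the quantity both implementations evaluate is the standard marginalized Gaussian likelihood; a sketch in the usual enterprise notation (which may differ slightly from the slides) is:

```latex
% Standard marginalized PTA likelihood (notation may differ from the slides):
%   \delta t : timing residuals
%   N        : white-noise covariance
%   T        : concatenated basis of timing-model and Gaussian-process signals
%   B        : prior covariance of the basis coefficients
\begin{align*}
  \ln \mathcal{L} &= -\tfrac{1}{2}\,\delta t^{\mathsf T} C^{-1} \delta t
                     - \tfrac{1}{2}\ln\det(2\pi C),
  \qquad C = N + T B T^{\mathsf T}, \\
  C^{-1} &= N^{-1} - N^{-1} T \left(B^{-1} + T^{\mathsf T} N^{-1} T\right)^{-1}
            T^{\mathsf T} N^{-1} \quad \text{(Woodbury identity)}.
\end{align*}
```

The new method reorganizes the evaluation of this expression by treating the timing-model columns of T separately, as described in the talk and the code comment.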
This code has been extensively tested. For the GWB analysis, I compared the old and new computations at 10,000 parameter samples; the difference in log likelihood never exceeded 10^-3, i.e., the likelihoods agree to within 0.1%. Aaron has tested a broad range of other analyses with similarly good results. The case with the largest discrepancy is continuous wave (for which this new method is not helpful anyway). The largest discrepancies there are differences of a few in log likelihood, but they occur only where the log likelihood is lower than typical values by 100,000 or more, i.e., at parameter sets that are of no interest. Presumably that problem results from numerical instability, so we don't know which method is giving the better answer.
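Schematically (this is not the exact test script), the comparison amounts to the following, where `pta_old` and `pta_new` are assumed to be two enterprise PTA objects built from the same signal model but using the old and new likelihood classes:

```python
import numpy as np

# Schematic reproduction of the old-vs-new comparison described above;
# not the exact test script.
def max_loglike_difference(pta_old, pta_new, n_samples=10000):
    max_diff = 0.0
    for _ in range(n_samples):
        # Draw one random point from the prior of every model parameter.
        xs = np.hstack([p.sample() for p in pta_old.params])
        diff = abs(pta_old.get_lnlikelihood(xs) - pta_new.get_lnlikelihood(xs))
        max_diff = max(max_diff, diff)
    return max_diff  # observed to stay below ~1e-3 for the GWB runs above
```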
As for the default, I'm afraid that without changing it there will be people who do runs lasting several weeks, never knowing that a one-line change would have saved them 60% or even 80% of their runtime.
In my view it would not be appropriate to completely change the default behavior of our pipeline on the eve of an important flagship analysis. The method has not featured in a published article, and has not been used in previous published PTA real-dataset analyses of any kind. I don't doubt the mathematics or the method at all-- it's the process that concerns me, and how we will be presenting our pipeline and methods to the scientific community. At the very least a technical memo could be produced that details the math and the various tests performed, along with scenarios where it's not helpful. NANOGrav has the capacity to host such collaboration memos. It's clear that this method is useful in speeding up spatially-correlated analyses by a factor of a few. So it's completely appropriate to have this as an optional switch from the current default behavior. Those that will be running spatial-correlation analyses for the forthcoming dataset will certainly be experienced and in-the-know enough to be aware of this new optional "speed switch", so I don't foresee any of us needlessly wasting cluster time. Let's have a discussion about this on Monday's DWG call, and we should certainly get views from enterprise architects like @vallis during the discussions.
As someone who watches this development from the sidelines, I definitely agree with Steve that this should not be made the default calculation right now. Optional, yes, and it may well end up being used for much of our analyses. But we will need to convince ourselves and others that it is numerically equivalent in the long run (although the tests you have described certainly sound promising).
Thanks Ken and Aaron, this is very careful work and the performance is indeed impressive. Steve and Scott have commented already on whether this should be the new default, and I largely agree with them. This does not mean that your code cannot be used already for an important fraction of DWG runs!
Instead I'd like to address the structure of the computation itself, which creates a large number of new methods within SignalCollection, a new classification of Signals, a separate treatment of the "chi" matrix which largely duplicates the existing "phi", and I must be forgetting something else. I'm concerned that all these will weaken the modularity and extensibility of the core library.
Looking at your code, I think there is a way to leverage the existing infrastructure and hide the majority of new tensor objects inside a new Signal object, without burdening the standard classes. If you give me a few days I will sketch it out, and then we may be able to refactor your code quickly.
Three more things:
- Could the FDr and FDF products be cached with the existing caching mechanism rather than recomputed?
- Which file should host the CompareLogLikelihood class?
- Does sparse Cholesky decomposition (the cholesky_sparse option) help either the old or the new computation?
I really don't mean to be negative, and I look forward to including your code in a way that gets us awesome performance and empowers future users and developers to build on Enterprise.
Merging #286 (679ed33) into master (dfb6d88) will decrease coverage by 0.14%. The diff coverage is 86.19%.
| Coverage Diff | master | #286 | +/- |
|---|---|---|---|
| Coverage | 86.44% | 86.29% | -0.15% |
| Files | 12 | 12 | |
| Lines | 2730 | 3138 | +408 |
| Hits | 2360 | 2708 | +348 |
| Misses | 370 | 430 | +60 |
| Impacted Files | Coverage Δ | |
|---|---|---|
| enterprise/signals/signal_base.py | 87.93% <86.16%> (-1.48%) | :arrow_down: |
| enterprise/signals/gp_signals.py | 86.78% <100.00%> (+0.03%) | :arrow_up: |
Continue to review full report at Codecov.
Powered by Codecov. Last update dfb6d88...679ed33.
Michele, the reason there's so much duplicated code in what I wrote is that I was trying to keep the existing code intact, so people can compare the old and new results in their applications. In the long run, I think we should get rid of the old LogLikelihood class and the SignalCollection methods that only it uses, and add an option to the new class to decide which procedure to use (perhaps called "separate_timing_models"). The old algorithm is all there in the new code: if you don't treat any signals as timing model signals, you get the old computation.
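To make that concrete, usage of the consolidated class could look roughly like this; "separate_timing_models" is only the keyword proposed above, not an existing option:

```python
# Hypothetical sketch only: "separate_timing_models" is the keyword proposed
# above, not an existing option. pta is an already-constructed enterprise
# PTA object, and LogLikelihood is the new class from this pull request.
from enterprise.signals.signal_base import LogLikelihood

# New, faster computation: timing-model signals are marginalized separately.
lnlike_fast = LogLikelihood(pta, separate_timing_models=True)

# Old computation, recovered from the same class: no signals are treated as
# timing-model signals, so the original algorithm runs unchanged.
lnlike_old = LogLikelihood(pta, separate_timing_models=False)

# Both would then be evaluated the same way, e.g. lnlike_fast(params).
```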
But if you have an idea for a cleaner organization, by all means let us talk about it.
The question of what people are going to use is much more important than the question of defaults. Our hope is that people will use the new code for analyses where it is useful. If people have concerns about reliability, we are happy to do further testing to assuage them.
I agree with you about caching mechanisms. If you know how to use the existing caching mechanism for caching FDr and FDF, I would be delighted.
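In the meantime, the kind of thing I have in mind is a simple parameter-keyed cache along these lines; this is a generic sketch for illustration, not enterprise's existing cache_call machinery, and `compute_fd` is a hypothetical callable that builds the FDr and FDF products for a given parameter dictionary:

```python
# Generic parameter-keyed cache, sketched for illustration only.
class FDProductCache:
    def __init__(self, limit=4):
        self._cache = {}
        self._limit = limit

    def get(self, params, compute_fd):
        # Key on the parameter values the products depend on. Array-valued
        # parameters would need converting to tuples before hashing.
        key = tuple(sorted(params.items()))
        if key not in self._cache:
            if len(self._cache) >= self._limit:
                # Drop the oldest entry to bound memory use.
                self._cache.pop(next(iter(self._cache)))
            self._cache[key] = compute_fd(params)  # -> (FDr, FDF)
        return self._cache[key]
```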
I don't have any particular opinion about which file should have CompareLogLikelihood, but I do think it should remain very straightforward to use it in place of either of the other classes, so that people are encouraged to run this comparison on a wide range of test cases.
I have never found a case where the new code benefits from sparse Cholesky decomposition. In the old way of doing things, I think you might get something like a 20% improvement, but I have not studied it. It probably depends on what processor you're using and how much optimization is available there.
But I don't want to change the old code because I don't want any uncertainty about whether comparisons are really being made to the old result. If we follow my suggestion above to use the new code to perform the old computation, this will happen automatically when you set cholesky_sparse.
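For anyone who wants to check the sparse-vs-dense question on their own machine, a standalone sketch of the two factor-and-solve paths (using scipy and scikit-sparse, with a generic matrix standing in for the relevant enterprise object) would be:

```python
# Standalone sketch for the sparse-vs-dense Cholesky question above.
# A is any symmetric positive-definite matrix (dense ndarray) standing in
# for the relevant enterprise matrix; b is a right-hand-side vector.
import numpy as np
import scipy.linalg as sl
import scipy.sparse as sps
from sksparse.cholmod import cholesky  # scikit-sparse

def dense_cho_solve(A, b):
    cf = sl.cho_factor(A)
    logdet = 2.0 * np.sum(np.log(np.diag(cf[0])))
    return sl.cho_solve(cf, b), logdet

def sparse_cho_solve(A, b):
    factor = cholesky(sps.csc_matrix(A))
    return factor(b), factor.logdet()

# Timing both (e.g. with %timeit) on a representative matrix is the quickest
# way to see whether the ~20% figure above holds for a particular setup.
```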
Ken
Subsumed by #288, which repackaged this algebra and code with a different class structure.
Contains a faster likelihood calculation, as presented by Xavi at the Spring 2021 NANOGrav meeting.
This faster calculation handles the timing-model signals separately from the other Gaussian-process signals, speeding up spatially-correlated analyses (such as the GWB runs described above) by roughly 60-80% while agreeing with the existing calculation to within about 10^-3 in log likelihood in the tests performed so far.
This pull request adds the new class methods needed to support the faster calculation, renames the current likelihood calculation class to OldLogLikelihood, and makes the LogLikelihood class containing the new calculation the default. Additionally, a CompareLogLikelihood class is added to allow quick comparison of the speed and accuracy of the two methods of likelihood calculation.
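For reviewers who want to try the comparison themselves, usage might look roughly like this, assuming the three classes live in enterprise.signals.signal_base alongside the current LogLikelihood, that the PTA constructor's lnlikelihood hook works for them as it does today, and that `models` is an existing list of per-pulsar signal collections:

```python
# Sketch only: class names are those introduced by this pull request.
from enterprise.signals import signal_base

# New, faster calculation (the proposed default).
pta_new = signal_base.PTA(models, lnlikelihood=signal_base.LogLikelihood)

# Previous calculation, kept available for reference.
pta_old = signal_base.PTA(models, lnlikelihood=signal_base.OldLogLikelihood)

# Side-by-side check of the speed and accuracy of the two methods.
pta_cmp = signal_base.PTA(models, lnlikelihood=signal_base.CompareLogLikelihood)
```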