Closed — kazewong closed this 1 year ago
@maxisi @tedwards2412 Here is the itemized list of points in our referee report.
@maxisi @tedwards2412 Finally, I have addressed most of the comments from the referee report. The changes are highlighted in red in the manuscript. Please have a look and let me know when it is ready to be shipped back to ApJ!
Reviewer:
Firstly, I would like to say what an awesome result this is, and what a pleasure it was to read the manuscript! The authors demonstrate end-to-end parameter estimation on CBC gravitational-wave signals on the order of a minute. Their approach is novel, incorporating gradient-based MCMC with normalizing flows for both fast and efficient sampling. While other groups have used similar ideas, one of the key ways in which this work differs is that it does not require pre-training of the normalizing flow network.
I have a few relatively minor points which I would like to see addressed.
[x] In the sentence beginning on line 49: "This expense precludes ... obtaining results in low latency to inform astronomers for potential followup in real time." I don't quite agree with this, as Bayestar produces posterior PDFs of source sky location in very low latency. I suggest removing this line, or clarifying if the authors mean producing something like unapproximated posterior samples is difficult in low latency.
[x] Line 540 and elsewhere: Could the authors say concretely what is meant by phrases like normalizing flows can learn the "global landscape" of the posterior. While I can infer that the normalizing flow produces a smooth representation of the target distribution, it would be worth commenting on the quantitative differences between the NF representation and the target distribution.
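One way to make the reviewer's request concrete: the mismatch between the flow's density q and the target p can be quantified with a Monte Carlo KL estimate, KL(q‖p) = E_q[log q(x) − log p(x)]. The sketch below is purely illustrative and not from the paper; it uses stand-in 1-D Gaussians with known log-densities in place of an actual trained flow and posterior.

```python
import numpy as np

rng = np.random.default_rng(42)

def log_gauss(x, sigma):
    """Log-density of a zero-mean Gaussian with standard deviation sigma."""
    return -0.5 * (x / sigma) ** 2 - np.log(sigma * np.sqrt(2.0 * np.pi))

# Stand-ins: q = N(0, 1) plays the role of the NF surrogate,
# p = N(0, 1.5^2) plays the role of the target posterior.
x = rng.normal(0.0, 1.0, size=100_000)                   # samples from q
kl_mc = np.mean(log_gauss(x, 1.0) - log_gauss(x, 1.5))   # MC estimate of KL(q||p)

# Closed form for two zero-mean Gaussians, used only to check the estimator.
kl_exact = np.log(1.5) + 0.5 / 1.5**2 - 0.5
```

With 10^5 samples the Monte Carlo estimate agrees with the closed form to a few parts in a thousand; for a real flow one would replace `log_gauss` with the flow's `log_prob` and the (unnormalized) target log-likelihood.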
[x] Line 548: I was intrigued by "On an Nvidia A100 GPU, we can evaluate the waveform model O(10^9) times in a second for different frequencies or source parameters". I assume this does not mean 10^9 full Fourier series, but rather the waveform model evaluated at 10^9 frequency bins? If that is correct, then my back-of-the-envelope calculation suggests that an A100 could evaluate around 10^6 full waveforms in on the order of 100 s, which is roughly the number of iterations of a nested sampler or "standard" MCMC used by LIGO (assuming 10^5 frequency bins per waveform, 10^9 bins per second gives 10^6 waveforms in around 100 s). Could the authors comment on whether this is a reasonable estimate? If it is, could a brute-force approach to PE be to just evaluate waveforms on really fast GPUs?
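The reviewer's arithmetic can be checked directly. The three inputs below are the numbers assumed in the comment (the 10^9 rate is the paper's quoted figure; the bin and iteration counts are the reviewer's assumptions):

```python
bins_per_second = 1e9      # quoted A100 evaluation rate (frequency bins / s)
bins_per_waveform = 1e5    # assumed frequency bins in one full waveform
n_waveforms = 1e6          # rough iteration count of a nested sampler / MCMC

total_bins = n_waveforms * bins_per_waveform
seconds = total_bins / bins_per_second
print(f"{seconds:.0f} s to brute-force {n_waveforms:.0e} full waveforms")
```

This reproduces the ~100 s figure in the comment, so the estimate is internally consistent; whether the 10^9/s rate really applies to full-waveform batches is the question put to the authors.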
[x] Section 3: Could the authors clarify exactly what hardware was used to run the analysis? And if the CPU time differs from the wall time?
[x] Line 648: The timing comparison to bilby does not take into account the ROQ likelihood acceleration that has been used for PhenomD/P models in LIGO. Could the authors comment on how the run time is reduced when ROQs are used? I think this is a relevant comparison because the baseline will not always be the absolute slowest possible runs.
[x] Line 634: In addition to the wall time of the full analysis, it would be useful to know additional run statistics, such as how efficient the proposal distributions are, and how many iterations/likelihood calls are made.
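The run statistics the reviewer asks for can be recovered from the chains themselves. The helper below is a hypothetical sketch (not the flowMC API): it treats any step where the chain moved as an accepted proposal, and counts one likelihood call per proposed step, which holds for a plain Metropolis-style scheme.

```python
import numpy as np

def run_statistics(chain: np.ndarray) -> dict:
    """Summarize a single MCMC chain of shape (n_steps, n_dim)."""
    moved = np.any(chain[1:] != chain[:-1], axis=-1)  # True where a proposal was accepted
    return {
        "n_steps": chain.shape[0],
        "acceptance_rate": float(moved.mean()),
        # one likelihood evaluation per proposed step in a plain Metropolis scheme
        "likelihood_calls": chain.shape[0],
    }

# Synthetic chain: each step is accepted with probability ~0.3, else repeated.
rng = np.random.default_rng(0)
steps = rng.normal(size=(1000, 2)) * (rng.random((1000, 1)) < 0.3)
chain = np.cumsum(steps, axis=0)
stats = run_statistics(chain)
```

On this synthetic chain the measured acceptance rate comes out near the 0.3 used to generate it; applied to real output it would give exactly the per-proposal efficiency and call counts requested.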
[x] Around Line 806: In addition to handling the Earth's rotation in future detectors, I'm also curious how your method might perform on multiple overlapping signals. If the normalizing flow can learn a single target distribution, then presumably it can learn several disjoint ones at once, so I would naively expect it to scale well for PE on multiple signals.
[x] General comment: in addition to the discussed extensions to waveform models like XPHM and NrSur, could the method also be applied to EOB models? These are usually the most expensive and are less amenable to ROM/ROQ because they require solving time-dependent ODEs.
Data Editor's review: One of our data editors has reviewed your initial manuscript submission and has the following suggestion(s) to help improve the data, software citation and/or overall content. Please treat this as you would a reviewer's comments and respond accordingly in your report to the science editor. Questions can be sent directly to the data editors at data-editors@aas.org.
\software{scipy (Virtanen et al. 2020), flowMC (Wong et al. 2022; Gabrié et al. 2022), ripple package (Edwards et al. in prep.), pBilby (Smith et al. 2020)}
Please pay close attention to the author list for the DOI deposit and that both the GitHub and DOI deposit have valid and consistent software licenses.
[1] https://journals.aas.org/news/policy-statement-on-software/
[2] https://github.com/AASJournals/Tutorials/tree/master/Repositories
[3] https://guides.github.com/activities/citable-code/
[4] https://journals.aas.org/aastexguide/#softwareandthirdparty
https://github.com/showyourwork/showyourwork/pull/256