openjournals / joss-reviews

Reviews for the Journal of Open Source Software

[REVIEW]: PyGRB: A pure python gamma-ray burst analysis package. #2536

Closed: whedon closed this issue 3 years ago

whedon commented 4 years ago

Submitting author: @JamesPaynter (James Paynter) Repository: https://github.com/JamesPaynter/PyGRB Version: v1.0.2 Editor: @xuanxu Reviewer: @zingale, @grburgess Archive: 10.5281/zenodo.4031843

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status


Status badge code:

HTML: <a href="https://joss.theoj.org/papers/8aff0347e6993ec23b060052a80aaaa0"><img src="https://joss.theoj.org/papers/8aff0347e6993ec23b060052a80aaaa0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/8aff0347e6993ec23b060052a80aaaa0/status.svg)](https://joss.theoj.org/papers/8aff0347e6993ec23b060052a80aaaa0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@zingale & @grburgess, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @xuanxu know.

Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest.

Review checklist for @zingale

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @grburgess

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @zingale, @grburgess it looks like you're currently assigned to review this paper :tada:.


:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means, with GitHub's default behaviour, you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:


  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications


For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1086/176902 is OK
- 10.1086/430294 is OK
- 10.1051/0004-6361/201322068 is OK
- 10.3847/1538-4365/ab06fc is OK
- 10.1093/mnras/staa278 is OK
- 10.1111/j.1365-2966.2009.14548.x is OK
- 10.1063/1.1835238 is OK
- 10.1111/j.1365-2966.2007.12353.x is OK
- 10.1086/184740 is OK
- 10.1086/186399 is OK
- 10.1086/186344 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

xuanxu commented 4 years ago

@zingale, @grburgess, @JamesPaynter: this is the review thread for the paper. All of our communications will happen here from now on.

Both reviewers have checklists at the top of this thread with the JOSS requirements. As you go over the submission, please check any items that you feel have been satisfied. There are also links to the JOSS reviewer guidelines.

The JOSS review is different from most other journals. Our goal is to work with the authors to help them meet our criteria instead of merely passing judgment on the submission. As such, the reviewers are encouraged to submit issues and pull requests on the software repository. When doing so, please mention openjournals/joss-reviews#2536 so that a link is created to this thread (and I can keep an eye on what is happening). Please also feel free to comment and ask questions on this thread. In my experience, it is better to post comments/questions/suggestions as you come across them instead of waiting until you've reviewed the entire package.

We aim for reviews to be completed within about 2-4 weeks, but that's not a hard deadline. Please let me know if any of you require some more time. We can also use Whedon (our bot) to set automatic reminders if you know you'll be away for a known period of time.

Please feel free to ping me (@xuanxu) if you have any questions/concerns.

grburgess commented 4 years ago

@xuanxu do we post questions about the code to the author here?

xuanxu commented 4 years ago

@grburgess Sure, you can use this issue to discuss anything with the author.

grburgess commented 4 years ago

@JamesPaynter A quick run through the code gives the impression that everything is implemented in a statistically proper manner. I have a question about how you deal with rates across the BATSE detectors. It appears you are summing them together on a per-channel basis; is my impression correct?

JamesPaynter commented 4 years ago

@grburgess This release (v1.0.1) of the package is aimed at using the pre-binned bfits files that the NASA team creates for each burst after it occurs. The available bfits light-curves are generally of high enough (time) resolution to perform adequate pulse-fitting. I think they have a slightly more sophisticated algorithm involving the detector response matrices for adding data from multiple detectors, but this is something that I am working through and trying to understand at the moment.

In the cases where I look at the individual photon arrival time data (time-tagged event / tte), I did just naively sum over the detectors. If you know of any literature / tutorials that you could point me towards relating to this, that would greatly help my understanding (my supervisor works in another field and I have been teaching myself, so there are some gaps). In particular, creating a spectrum from stte (spectral) data is currently doing my head in 🤯

But for the main purpose of PyGRB, which is pulse-fitting of light-curves, using the pre-binned data is sufficient in ~99% of cases.
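For clarity, the naive summing I mean is just concatenating the arrival times from the triggered detectors and histogramming them onto a common grid. A minimal sketch (with made-up arrival times, not real BATSE tte files or the PyGRB API):

```python
import numpy as np

rng = np.random.default_rng(0)

# Made-up photon arrival times (s) for three triggered detectors --
# purely illustrative placeholders.
tte = [rng.uniform(0.0, 2.0, size=n) for n in (800, 650, 720)]

# Naive summing: concatenate arrival times from all triggered detectors,
# then bin onto a common 64 ms grid (per channel, in the real case).
edges = np.arange(0.0, 2.0 + 0.064, 0.064)
counts, _ = np.histogram(np.concatenate(tte), bins=edges)

# The binning conserves the total number of photons.
assert counts.sum() == sum(len(t) for t in tte)
```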

zingale commented 4 years ago

The JOSS guidelines say:

As a rule of thumb, JOSS’ minimum allowable contribution should represent not less than three months of work for an individual.

The git history shows that the initial commit was ~2 months ago. The codebase is about 3.5k SLOC.

JamesPaynter commented 4 years ago

@zingale I have been working on this for ~2 years. When I was a young and naive programmer I accidentally committed a large number of files, and even after deleting them my repo was ~100 MB. So I fiddled around with my git history and eventually wiped it to get the repo down to a reasonable size.

grburgess commented 4 years ago

@JamesPaynter Thanks for the update! There is actually no way to sum together light curves in count space, as the signal from each detector has passed through the response and has its own background. In order to model the latent (true) energy-(in)dependent light curve, one needs to fold an energy-dependent, spectrally-evolving model through all detectors simultaneously, compare the individual count rates to each of these signals, and sum their log-likelihoods. Otherwise, the result you obtain will have little to do with the incoming signal.

This is an issue with many of the attempts to look for lensed GRBs via detectors that experience a lot of energy dispersion. The observed signal is very dependent on the local background, incident angle of the GRB, etc. Modeling these effects in the signal space and performing the likelihood in separate data spaces is the only proper way to get around this, and it will increase the sensitivity of the search. This is, however, very computationally expensive. I've written code to do this for GBM, and there are physical background models available (https://www.aanda.org/articles/aa/full_html/2020/08/aa37347-19/aa37347-19.html). The issue of dealing with responses is also highlighted in this article and may give you some insight into how to do this in your code. I'm very happy to help in any way possible on this problem.
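Schematically, the joint likelihood looks like the sketch below. The response matrices, backgrounds, and dimensions are all toy placeholders standing in for the real DRMs and background models; the point is only the structure: fold one latent spectrum through every detector, add each detector's own background, and sum the Poisson log-likelihoods.

```python
import math
import numpy as np

rng = np.random.default_rng(1)
n_energy, n_channels, n_det = 8, 4, 3

# Toy per-detector responses (channels x energies) and backgrounds --
# placeholders, not real BATSE/GBM responses.
responses = [rng.uniform(0.0, 0.5, size=(n_channels, n_energy)) for _ in range(n_det)]
backgrounds = [rng.uniform(5.0, 10.0, size=n_channels) for _ in range(n_det)]

def poisson_logpmf(k, mu):
    """Summed log P(k | mu) for Poisson counts; lgamma(k+1) = log(k!)."""
    return sum(ki * math.log(mui) - mui - math.lgamma(ki + 1)
               for ki, mui in zip(k, mu))

def joint_log_like(latent_flux, observed):
    """Fold one latent photon spectrum through every detector's response,
    add that detector's own background, and sum the per-channel Poisson
    log-likelihoods across all detectors."""
    return sum(poisson_logpmf(counts, rsp @ latent_flux + bkg)
               for rsp, bkg, counts in zip(responses, backgrounds, observed))

# Simulate observed counts from a "true" latent flux, then check the
# likelihood prefers the truth over a clearly wrong (zero-flux) model.
true_flux = rng.uniform(1.0, 20.0, size=n_energy)
observed = [rng.poisson(rsp @ true_flux + bkg)
            for rsp, bkg in zip(responses, backgrounds)]

assert joint_log_like(true_flux, observed) > joint_log_like(np.zeros(n_energy), observed)
```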

That being said, setting aside the summing of the detectors, everything else seems fine. I have a few more installation checks to do this afternoon, but I would think that your approach is fine on a single-detector basis.

It would also help if you could detail in the documentation things like how the background is modeled, what likelihood you are using, etc. I was able to understand that by going through the code, but I knew what I was looking for. A more explicit explanation would help users.

JamesPaynter commented 4 years ago

@grburgess That is very interesting! I know you are working mostly with Fermi, but do you know / can you comment on whether that (spectral folding) would have been done by the BATSE team during the creation of the bfits light-curves? For example, when I was first playing with the time-tagged photon data for a burst I am interested in, I binned it (summing 3 triggered detectors) with the time bins of the corresponding bfits light-curve. The difference between the two light-curves was 2 counts in about 8,000, so I thought "not bad! this must be how they do it". No doubt there has been a lot of progress in the last 20-30 years though.

And is that the case with present-day (Fermi) analysis? Light-curves are created from count data with a spectral model from the get-go? I was wondering recently whether it was possible to create a generalised analysis tool that has one model and uses data from various detectors, but it looks like you are already working on something similar in 3ML. I had wanted to simultaneously use data from BATSE's spectral detectors and large-area detectors to create a spectrally evolving light-curve fit, but I never came across a spectrally evolving light-curve model in the literature.

I find that extremely interesting; did you publish any results from your gravitational lensing study? I have not seen anything in the literature regarding detector effects, but something I have wondered is how different a GRB could look depending on when / how you observe it, e.g. through the Earth's atmosphere, angle(s) of incidence to the detectors, number of triggered detectors, etc. To my knowledge all authors so far have assumed the light-curves to be statistically identical within Poisson noise (myself included, but only for lack of a more sophisticated modelling capability). It would be an interesting exercise to take a pulse model and see how different you can make at-detector realisations of it.

I read your article cover to cover. It is a really nice analysis and I found it very instructive, particularly the step-by-step addition of background effects and the discussion of the SAA. I have 15 seconds of 256-channel tte data, which I think is similar to Fermi data. The first 500 ms contains two pulses which I am alleging are a gravitationally lensed pair. Since the two pulses are so close together in time, the angular response and backgrounds should be the same. A reviewer has (rightly) asked us to compare the spectra of the two pulses. Your paper mentions that the quick and dirty way to create a spectrum for transient events is to fit smooth polynomial functions to each channel for off-source times, and subtract the background from on-source times. I think (hope) this would be sufficient in the above case. The part that I am unsure of is:

  1. Combining the data from the three triggered detectors, which from what I now understand will require a spectral model and the detector response matrices(?). E.g. I am not quite sure how to interpret / use Figure 5 in your paper.
  2. Dealing with channels with low counts or negative counts after background subtraction.
  3. How does one cheat to normalise for effective collecting area? Figure 4 of your paper illustrates this quite well, but I think it is a bit more sophisticated than I can achieve here. To first order, is the background uniform? I.e. is the effective area for the background (for normalisation) approximately A_detector, whereas the effective area for the burst is more like A cos(theta_burst)? Or are the DRMs really needed to do this?
  4. Or do you think I am better off following the model fitting approach you use? In which case, could you please point me towards an appropriate signal model and priors for combining the data across multiple detectors and drms? I could not piece this together from your paper or gbm_drm_gen on GitHub.

I'll make a note in the documentation regarding the (constant) background model and likelihood function. I note your criticism of polynomial background fits in your paper, but I think that for most BATSE bursts a polynomial is probably sufficient given the data quality compared to Fermi. (The lower data quality precludes the scale of background modelling that you achieved with Fermi.)
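For reference, the per-channel polynomial background fit I have in mind is just an off-source fit evaluated under the burst. A toy single-channel sketch (made-up data, not real BATSE counts):

```python
import numpy as np

rng = np.random.default_rng(4)

t = np.arange(0.0, 100.0, 1.0)               # time-bin centres (s)
on_source = (t >= 40.0) & (t < 50.0)         # assumed burst interval

# Made-up single-channel light-curve: a slowly varying background plus a
# burst of ~200 counts/bin in the on-source window.
background_truth = 50.0 + 0.2 * t
counts = rng.poisson(background_truth + 200.0 * on_source)

# Fit a low-order polynomial to the off-source bins of this channel and
# evaluate it under the burst to estimate the background there.
coeffs = np.polyfit(t[~on_source], counts[~on_source], deg=2)
background_model = np.polyval(coeffs, t)

# Net (background-subtracted) counts in the on-source window.
net_counts = counts[on_source].sum() - background_model[on_source].sum()
```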

Thanks for your help! Sorry for the essay! James

JamesPaynter commented 4 years ago

Looking at your tutorials, I am at the stage of # We can check the count spectrum of the detectors, e.g. det_sl['n5'].view_count_spectrum(); in this tutorial https://github.com/grburgess/gbm_drm_gen/blob/master/examples/tte_balrog_tutorial.ipynb with three triggered detectors.

grburgess commented 4 years ago

So, I am not sure how to divide this up into parts that are constructive for the repository at hand.

You can think of both BATSE and GBM as the same detector technology: both are non-imaging spectrometers suffering from significant energy dispersion. Thus, one must always model in latent space, fold through the detectors, etc., to combine information across detectors. If you want to work with a single detector, it is fine to ignore this as long as one realises what these data actually are.

I do not have a gravitational lens study (I'm rather dubious as to whether it is possible), but I also do not know of any studies that properly deal with the data. Pulse modeling as you are doing is a good start. Most studies use cross-correlation, and I think this has a multitude of both model and statistical problems for the data at hand.

I want to emphasize that background is never subtracted. It is jointly modeled. Subtraction of Poisson rates leads to all kinds of issues, including when the number of counts is low.
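As a toy illustration of why (a made-up constant background rate here; a real case would have a per-detector background model):

```python
import math
import numpy as np

rng = np.random.default_rng(2)

background_rate = 3.0   # assumed expected background counts per bin
source_rate = 0.5       # a faint source on top of it

observed = rng.poisson(background_rate + source_rate, size=1000)

# Subtracting the background produces non-Poisson pseudo-counts, some of
# them negative -- there is no valid count likelihood for these numbers.
subtracted = observed - background_rate
n_negative = int((subtracted < 0).sum())

# Joint modelling keeps everything in count space: the Poisson
# likelihood is evaluated at the total expected rate, background included.
def log_like(src_rate):
    mu = background_rate + src_rate
    return sum(k * math.log(mu) - mu - math.lgamma(k + 1) for k in observed)

assert n_negative > 0                           # subtraction went negative
assert log_like(source_rate) > log_like(10.0)   # joint model prefers the truth
```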

1) Exactly, I have written a tutorial on this here: https://grburgess.github.io/fits_files/ I think you will understand the figure after reading that.

2) There is a myth around low counts that comes from the use of Gaussian likelihoods with Poisson data. Zero counts in a Poisson likelihood is just as informative as 10^3 counts. However, when you write this with the Gaussian "approximation", the MLE estimator for the sigma term gives one the impression that more counts => more information. Your use of a Poisson likelihood in your code is great and appropriate.
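A toy comparison of the two likelihoods (nothing detector-specific here):

```python
import math

def poisson_loglike(k, mu):
    # log P(k | mu); well defined for any k >= 0, including k = 0
    return k * math.log(mu) - mu - math.lgamma(k + 1)

# A zero-count bin still discriminates between models: it strongly
# disfavours a model that predicts a large rate there.
assert poisson_loglike(0, 1.0) > poisson_loglike(0, 10.0)

def gaussian_loglike(k, mu):
    # The usual "approximation" estimates sigma = sqrt(k) from the data
    # themselves, which creates the false impression that only
    # high-count bins carry information.
    sigma = math.sqrt(k)
    return (-0.5 * ((k - mu) / sigma) ** 2
            - math.log(sigma * math.sqrt(2.0 * math.pi)))

# At k = 0 the Gaussian version simply breaks (division by zero), while
# the Poisson likelihood above remains perfectly well behaved.
try:
    gaussian_loglike(0, 1.0)
    broke = False
except ZeroDivisionError:
    broke = True
assert broke
```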

3) This is standard practice in X-ray spectral modeling to deal with total flux calibration. The idea is that there is likely some uncertainty in the response's total effective area. Thus, we can multiply the response (or the predicted counts) of one detector by a constant. This is different from the normalization of the spectral model, as it only affects the counts in one detector. This allows one to correct for unmodeled differences in the total area of the detectors. However, it does not correct for energy-dependent issues.
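A sketch of what I mean, with made-up predicted counts and detector names:

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy predicted counts per channel for two detectors, as if the same
# latent spectrum had been folded through each response (made-up numbers).
predicted = {"det_a": rng.uniform(50.0, 100.0, size=4),
             "det_b": rng.uniform(50.0, 100.0, size=4)}

# The effective-area correction is one free constant per detector (one
# detector is typically pinned to 1 as the reference). It rescales every
# channel of that detector together, unlike the spectral normalisation,
# which would rescale all detectors at once, so it cannot fix
# energy-dependent calibration problems.
area_constants = {"det_a": 1.0, "det_b": 0.93}

corrected = {name: area_constants[name] * mu for name, mu in predicted.items()}

# Energy independent by construction: the channel-wise ratio is constant.
assert np.allclose(corrected["det_b"] / predicted["det_b"], 0.93)
```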

4) I think this goes beyond the scope of the work you have here. I'm not sure about the process (@xuanxu) and do not want to impede the refereeing of your current code. But I would be happy to have some offline discussions with you and my research group about ways to proceed if you want to go this route.

grburgess commented 4 years ago

ah sorry, i'm going to move this to an issue in the target repo.

JamesPaynter commented 4 years ago

Sorry for going off-topic! I can shoot you an email if that is a better place to talk? I would greatly appreciate some help in this regard (point 4). But I will have a look through your tutorial and work over your other suggestions so we can have a more informed discussion.

As for the current code, I am happy with it at the moment. I am just always thinking of improvements. Thanks again!

grburgess commented 4 years ago

email is fine!

zingale commented 4 years ago

I'm happy with this once the contributor guidelines are updated (issue tagged above ^).

I'll leave the technical details about working with GRB data to the other reviewer, as that is not my expertise.

JamesPaynter commented 4 years ago

@grburgess "It would also help if you could detail in the documentation things like how the background is modeled, what likelihood you are using, etc"

I updated the documentation (priors, background, likelihood, rates), but I wasn't sure how much information is encapsulated by the 'etc'. https://pygrb.readthedocs.io/en/latest/user/sampling.html

grburgess commented 4 years ago

docs look great now

zingale commented 4 years ago

I didn't run the automated tests myself, but I see there is a Travis file, a test-coverage link, and a tests/ directory. So I am happy with that.

zingale commented 4 years ago

I'm done with my review. All good by me.

xuanxu commented 4 years ago

@grburgess / @JamesPaynter: Have you been able to move forward with the pending review issues?

JamesPaynter commented 4 years ago

@xuanxu I think there is just one last issue to close off re: instrumental dead time / using proper count data. I must admit that I have been meaning to chase this up with @grburgess, but I got distracted by referee feedback on another paper that I am addressing.

grburgess commented 4 years ago

Yes, I was just waiting to see what the plans were to deal with a few of the issues I raised. Especially re: summing of detectors.

JamesPaynter commented 4 years ago

Should we continue this discussion on my GitHub issues?

Re: detector summing, is it then appropriate to model a radiation field which is convolved / folded with the triggered detectors' response matrices, and then compare to the observed counts in the Poisson likelihood? This would work for the tte data.

I'm still not sure of a way forward using the main available datatypes (e.g. discsc 64 ms binned data), as my understanding is they come pre-summed over the triggered detectors (pre-binned on the satellite). There is no tte data available to cover the entire time history of ~80-90% of BATSE GRBs, only lower-resolution (e.g. discsc) data.

grburgess commented 3 years ago

Once a caveat about the summing of detectors is either stated in the docs or fixed in the code, I'm ok with this being published. Nice work @JamesPaynter .

xuanxu commented 3 years ago

Thanks @grburgess! (Please check all the items from your checklist you are already satisfied with)

JamesPaynter commented 3 years ago

Thank you @grburgess 😄

I updated the docs in the sampling section to reflect this: https://pygrb.readthedocs.io/en/latest/user/sampling.html

xuanxu commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

xuanxu commented 3 years ago

@whedon check references

whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1086/176902 is OK
- 10.1086/430294 is OK
- 10.1051/0004-6361/201322068 is OK
- 10.3847/1538-4365/ab06fc is OK
- 10.1093/mnras/staa278 is OK
- 10.1111/j.1365-2966.2009.14548.x is OK
- 10.1063/1.1835238 is OK
- 10.1111/j.1365-2966.2007.12353.x is OK
- 10.1086/184740 is OK
- 10.1086/186399 is OK
- 10.1086/186344 is OK

MISSING DOIs

- None

INVALID DOIs

- None
xuanxu commented 3 years ago

OK @JamesPaynter, everything looks good now. Here are the next steps:

Once you do that, please report here the version number and archive DOI.

JamesPaynter commented 3 years ago

Thanks! :)

New version v1.0.2

DOI https://doi.org/10.5281/zenodo.4031843

I linked my ORCID to the Zenodo account, is there more to do to link it to the JOSS paper?

https://orcid.org/0000-0002-9672-4008

xuanxu commented 3 years ago

is there more to do to link it to the JOSS paper?

It's already in the metadata of the paper

xuanxu commented 3 years ago

@whedon set v1.0.2 as version

whedon commented 3 years ago

OK. v1.0.2 is the version.

xuanxu commented 3 years ago

@whedon set 10.5281/zenodo.4031843 as archive

whedon commented 3 years ago

OK. 10.5281/zenodo.4031843 is the archive.

xuanxu commented 3 years ago

@JamesPaynter One last change: the title was missing a capitalization. Please take a look at the PR I just opened in your repo, and also change the title on the Zenodo page.

JamesPaynter commented 3 years ago

Merged and updated in Zenodo :)

xuanxu commented 3 years ago

@whedon accept

whedon commented 3 years ago
Attempting dry run of processing paper acceptance...
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1086/176902 is OK
- 10.1086/430294 is OK
- 10.1051/0004-6361/201322068 is OK
- 10.3847/1538-4365/ab06fc is OK
- 10.1093/mnras/staa278 is OK
- 10.1111/j.1365-2966.2009.14548.x is OK
- 10.1063/1.1835238 is OK
- 10.1111/j.1365-2966.2007.12353.x is OK
- 10.1086/184740 is OK
- 10.1086/186399 is OK
- 10.1086/186344 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago

:wave: @openjournals/joss-eics, this paper is ready to be accepted and published.

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1730

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1730, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
kyleniemeyer commented 3 years ago

Hi @JamesPaynter, just doing some final checks.

Actually, it looks like many/most of the references are missing the journal name. I think the problem is that your .bib file uses \apj and \aap for these, but that won't work with our system.

xuanxu commented 3 years ago

FITS is a standard data format for astrophysics catalogs, widely used in astronomy, so I don't think there's a need to explain it.

+1 to all other changes

JamesPaynter commented 3 years ago

Hi @kyleniemeyer, sorry about that, everything is fixed now :)

Thanks @xuanxu for the PR

kyleniemeyer commented 3 years ago

@whedon generate pdf