openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal

[REVIEW]: popsynth: A generic astrophysical population synthesis framework #3257

Closed whedon closed 3 years ago

whedon commented 3 years ago

Submitting author: @grburgess (J. Michael Burgess) Repository: https://github.com/grburgess/popsynth Version: v1.0.3 Editor: @xuanxu Reviewer: @HeloiseS, @warrickball Archive: 10.5281/zenodo.5109590

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

Status

[status badge image]

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/a52e4c2c355396e7946917996502aac0"><img src="https://joss.theoj.org/papers/a52e4c2c355396e7946917996502aac0/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/a52e4c2c355396e7946917996502aac0/status.svg)](https://joss.theoj.org/papers/a52e4c2c355396e7946917996502aac0)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@HeloiseS & @warrickball, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @xuanxu know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks at the very latest.

Review checklist for @HeloiseS

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @warrickball

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 3 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @HeloiseS, @warrickball it looks like you're currently assigned to review this paper :tada:.

:warning: JOSS reduced service mode :warning:

Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository, which means that with GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

[screenshot: watching settings]

  2. You may also like to change your default settings for watching repositories in your GitHub profile here: https://github.com/settings/notifications

[screenshot: notification settings]

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 3 years ago
Software report (experimental):

github.com/AlDanial/cloc v 1.88  T=0.43 s (280.9 files/s, 35792.3 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Python                          58           2986           2240           5392
CSS                             36             31             77           2590
Markdown                         8            451              0            913
YAML                             8             47             37            240
TeX                              1              5              0             52
reStructuredText                 6             40             83             28
JavaScript                       1              1              8             26
HTML                             2              0             25             20
-------------------------------------------------------------------------------
SUM:                           120           3561           2470           9261
-------------------------------------------------------------------------------

Statistical information for the repository 'a418e67afb0e014c424e664e' was
gathered on 2021/05/06.
The following historical commit information, by author, was found:

Author                     Commits    Insertions      Deletions    % of changes
Francesca Capel                  8           395            212            2.15
J. Michael Burgess             198          8182           3644           41.96
grburgess                      246         10861           4890           55.89

Below are the number of rows from each author that have survived and are still
intact in the current revision:

Author                     Rows      Stability          Age       % in comments
Francesca Capel             284           71.9          1.9                6.69
J. Michael Burgess        10369          126.7         12.4               10.75
whedon commented 3 years ago
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

OK DOIs

- 10.1103/physrevd.100.103523 is OK

MISSING DOIs

- None

INVALID DOIs

- None
whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

warrickball commented 3 years ago

Hi,

I've just made a first pass at reviewing the repository. I'm not an expert on the scientific application but I'm at least an astronomer/astrophysicist and will comment on the repo from that perspective.

First, the code itself seems reasonable overall. I had no problem installing and running it, though the nice-looking 3D plots didn't work in a script or IPython console. (I don't personally use Jupyter notebooks much.) The code certainly appears to represent significant and useful work. As noted in the paper, other population synthesis codes tend to focus on particular kinds of astrophysical object, whereas popsynth is clearly meant to be abstract enough to apply to many different types of object.


My principal concern is the documentation. For a start, it appears incomplete. The API docs linked from the left sidebar are empty and there are two links to the same page (API and popsynth). I also came across a FIXME in the docstring of popsynth.ZPowerSphericalPopulation, which it turns out is in docstrings of many of the spatial populations. The experimental YAML documentation in the Quick start also has a sentence

We can load this yaml file into a population synth like this:

which abruptly leads to the next section. (I wouldn't recommend documenting experimental features in the Quick start.)

The Quick start plunges into an abstract and jargon-laden example. So, though it works and the abstraction makes the code usefully versatile, it isn't at all obvious to me how it applies to a real-world problem, though perhaps this would be obvious to someone in the field. I think the docs would be better served by relating its parameters to a realistic problem. e.g. Could the population in the Quick start be thought of as AGN? Similarly, few of the distributions are familiar and I've only seen the equations in the docs produced by the population generator objects' display() function (which, like the 3D plots, didn't work for me in scripts or IPython). It'd be good to have a reference for each equation but this would in effect be done by populating the distributions' docstrings.

On a related note, there are some technical details about what the code aims to do that I don't understand. Are we trying to create a population of events or of sources? I presumed the code was for sources but the text of the paper mentions gamma-ray bursts (GRBs) as examples. I would've thought GRBs are one-off events: once a GRB has been detected from a source, we wouldn't expect to observe another GRB from that source, would we? If I've understood that correctly, it then isn't obvious to me how popsynth simulates a population of GRBs. Perhaps the "source" of a GRB is really its host galaxy? The animated spinning figure on the frontpage of the documentation also gives the impression of simulating a population of events.

I think this would be cleared up if there were some mathematical definitions for what exactly is meant by terms like population, luminosity function, etc. I don't expect these to be prominent: they can go in a glossary or something (though they should perhaps go in the paper too). But without them, it's difficult to know exactly what the code is intended to do.

It's a simple thing but for a non-expert like me, I'd find it easier to understand the docs if the various classes and functions were internally linked (e.g. with ReST :py:class:... directives).

To close this section, you may take my scientific remarks with a pinch of salt. I'm not an expert and perhaps your target user knows enough that my comments are moot.


The requirements for testing aren't specified anywhere. I cloned the repo and tried pytest in the directory above the tests I found, which revealed that hypothesis is required. With that installed I got an error

popsynth/test/test_selections.py:7: in <module>
    class GalacticPlaceSelection(popsynth.SpatialSelection):
popsynth/test/test_selections.py:11: in GalacticPlaceSelection
    b_limit = popsynth.SelectionParameter(vmin=0, vmax=90)
E   AttributeError: module 'popsynth' has no attribute 'SelectionParameter'

and a bunch of warnings, mostly about "traits". The CI appears to be passing, though, so perhaps I've missed a dependency or run pytest in the wrong place. My platform information (from pytest) is:

platform linux -- Python 3.9.4, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: ..., configfile: setup.cfg, testpaths: popsynth/test
plugins: cov-2.10.1, remotedata-0.3.2, hypothesis-6.12.0

Finally, two small points. First, there are no "clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". All of this can be done through GitHub but the repo or docs should say so. (On a related note, the docs don't seem to link to the GitHub repo anywhere.)

Second, the SkyPy and firesong references in the paper render as

al., A. A. et. (2021). SkyPy: A package for modelling the universe. InGitHub repository. GitHub.https://github.com/skypyproject/skypy al., I. T. et. (2021). FIRst extragalactic simulation of neutrino and gamma-ray. InGitHub repository. GitHub.https://github.com/icecube/FIRESONG

suggesting an error in the author lists, but these references can be replaced. firesong has a JOSS paper, and SkyPy has a citation file that points to recommended DOIs and BibTeX entries on Zenodo.

grburgess commented 3 years ago

Thank you! I will begin to implement these changes, but here are some quick responses:

My principal concern is the documentation. For a start, it appears incomplete. The API docs linked from the left sidebar are empty and there are two links to the same page (API and popsynth). I also came across a FIXME in the docstring of popsynth.ZPowerSphericalPopulation, which it turns out is in docstrings of many of the spatial populations.

Yes, I need to document all the included distributions better. Many of these are things that I use myself; the list is not meant to be exhaustive because users can easily add their own. If needed, I can add more, but others and I have slowly been adding things in other repos as they are needed.

The experimental YAML documentation in the Quick start also has a sentence "We can load this yaml file into a population synth like this:" which abruptly leads to the next section. (I wouldn't recommend documenting experimental features in the Quick start.)

Oops. This seems like a commit that got behind. Will fix.

The Quick start plunges into an abstract and jargon-laden example. So, though it works and the abstraction makes the code usefully versatile, it isn't at all obvious to me how it applies to a real-world problem, though perhaps this would be obvious to someone in the field. I think the docs would be better served by relating its parameters to a realistic problem. e.g. Could the population in the Quick start be thought of as AGN? Similarly, few of the distributions are familiar and I've only seen the equations in the docs produced by the population generator objects' display() function (which, like the 3D plots, didn't work for me in scripts or IPython). It'd be good to have a reference for each equation but this would in effect be done by populating the distributions' docstrings.

This is a good point. There were originally equations, but looking at other JOSS papers I thought this was too much. I'll gladly add these back in. Sorry about the IPython part: I'm using the IPython HTML display capabilities for this, and it is really only meant to be eye candy and for double-checking when examining the functions. I think you are right that a more concrete example, or changing "population" to "AGN" or "GRB", would make the connection to the abstraction a little easier. I will also add some specific examples.

On a related note, there are some technical details about what the code aims to do that I don't understand. Are we trying to create a population of events or of sources? I presumed the code was for sources but the text of the paper mentions gamma-ray bursts (GRBs) as examples. I would've thought GRBs are one-off events: once a GRB has been detected from a source, we wouldn't expect to observe another GRB from that source, would we? If I've understood that correctly, it then isn't obvious to me how popsynth simulates a population of GRBs. Perhaps the "source" of a GRB is really its host galaxy? The animated spinning figure on the frontpage of the documentation also gives the impression of simulating a population of events.

Yes, this is confusing. The original idea was to create a "survey", i.e. an observed population of fluxes sampled from their observation likelihoods. As the project grew and others were using it, the purpose became more abstract because the structure lets you easily generate linked/dependent quantities. I think adding the math behind the idea will clear some of this up. The code was originally intended to test hierarchical Bayesian models that fit a population of observations, but it has grown.

I think this would be cleared up if there were some mathematical definitions for what exactly is meant by terms like population, luminosity function, etc. I don't expect these to be prominent: they can go in a glossary or something (though they should perhaps go in the paper too). But without them, it's difficult to know exactly what the code is intended to do.

Agreed. I will add this back.

It's a simple thing, but for a non-expert like me it would be easier to understand the docs if the various classes and functions were internally linked (e.g. with ReST :py:class:... directives).

I agree, I was trying to use auto doc to generate everything in real time with GitHub actions, but this seems to be failing sporadically. I will try and implement your suggestion.
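For instance, docstrings can carry Sphinx cross-reference roles directly, so autodoc output links classes together. A minimal sketch — the class and target names here are illustrative, not the actual popsynth API:

```python
class SphericalDistribution:
    """Sample positions uniformly in a sphere.

    See :class:`~popsynth.SpatialDistribution` for the base interface and
    :meth:`~popsynth.PopulationSynth.draw_survey` for how samples are used.
    """
```

When Sphinx builds the docs, these roles render as hyperlinks to the corresponding class and method pages, so readers can navigate the API without guessing at names.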

To close this section, you may take my scientific remarks with a pinch of salt. I'm not an expert and perhaps your target user knows enough that my comments are moot.

Ah ok, it seems there might be an issue in the setup.cfg. I may have beaten the CI environment into doing this automatically, so I will try to get it working locally. I'm actually not a fan of installing hypothesis if I do not need it... but I will add it to the reqs.

Finally, two small points. First, there are no "clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". All of this can be done through GitHub but the repo or docs should say so. (On a related note, the docs don't seem to link to the GitHub repo anywhere.)

Will fix

Second, the SkyPy and firesong references in the paper render as

Will fix this, I see the Firesong JOSS paper was just out today.

whedon commented 3 years ago

:wave: @HeloiseS, please update us on how your review is going (this is an automated reminder).

whedon commented 3 years ago

:wave: @warrickball, please update us on how your review is going (this is an automated reminder).

HeloiseS commented 3 years ago

Hi !

Just wanted to check in to say that I never got the notification for this (even on GitHub). I should have sent an email earlier to check everything was working!

I'll be taking a look at this in the next few days but FYI I'm in the middle of starting fellowship applications so I'm doing my best, not just dragging my feet - looking forward to looking at this!

Cheers, Heloise

xuanxu commented 3 years ago

@whedon re-invite HeloiseS as reviewer

whedon commented 3 years ago

OK, the reviewer has been re-invited.

@heloises please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations

HeloiseS commented 3 years ago

Hi,

I think my comments will echo what the other referee said. First of all congratulations on your code-base! As far as I can see it's very clean and maintained to a high standard of testing.

General comments

My first and main comment is that after reading the paper and going through the tutorials, I am still not sure what use cases I would apply this code to. I think that the level of abstraction and the heavy use of jargon (which sometimes has multiple meanings across different fields) is a big part of the problem.

What would help would be some real-life examples. The other referee mentioned GRBs; my first thought was also transients, both galactic and extragalactic. On reading the paper I was imagining that I could give popsynth transient rates (you do mention those at the beginning of the Procedure section) derived from stellar population synthesis codes, add observational biases, take into account my telescope's magnitude limits in specific bands, include an approximation of the Milky Way disc extinction, add a telescope latitude, a program time (say 3 nights) and BAM! get an estimate of how many hits I can expect.

Having looked at popsynth, I have no idea where to start to make realistic observations/simulations. I would encourage you to make sure your tutorials are clear even to Master's students/ early PhD students. If your code is open source and in JOSS it is intended for the whole community and at the moment the learning curve is way too high for your really nice code to be used by a large population.

Are there use cases out there already demonstrating how this general and abstract API can be used to address specific astronomy problems? You should cite those in the paper :)

Finally, I suggest you do a thorough proof-read of the documentation; I came across a few typos.

Validating and referencing the science + Addressing Jargon

Distributions

What are the distributions? What do they do? I understand the concept of the spatial distribution, but I am not sure what the luminosity function actually is or where it comes from. Is it source-specific? Why does it differ from a basic inverse-square law? (I assume a majority of astro people know their inverse-square law, so that can be a good starting point to anchor your explanation in knowledge most users will share.) I think this goes back to adding more explanation so that a wide audience understands the science behind the code, and also adding references as mentioned by reviewer 1.

Also, I am concerned that I don't see any validation: how do I know these give me realistic simulations? I liked your graphs showing the "selected" and "non-selected" objects - I'd like to see how that compares to other theoretical predictions. Say we feed the rate of supernovae from stellar evolution codes into popsynth and see if we roughly retrieve the number of observed supernovae (given a volume).

Auxiliary Samplers

"This allows you to build up arbitrarily complex dependencies between parameters which can lead to diverse populations."

=> But why? I need a solid reason to add complexity.

Also I don't understand the justification for the currently implemented Auxiliary Samplers: I do understand the ViewingAngle, but what's the science behind having a log-normal or a log10 Auxiliary Sampler?

Jargon

Here are a couple of examples: What is an SFR-like redshift? What is a Schechter luminosity function? What populations are we synthesising? Stars? Transients? Observations? All three?

Jupyter notebooks

None of the 3D graphs render for me.

Dependencies

I installed popsynth in a fresh conda env and I was missing a few dependencies to run the jupyter notebooks

That last one straight up failed to install. I am seeing this issue in their GitHub indicating it could be missing some development files.

That is a lot of complex dependencies for a flowchart, and I don't know if it adds much. I would suggest reconsidering which dependencies are absolutely necessary for the code to run properly and removing the rest. If you want to illustrate class inheritance of specific cases, make a nice graph and add it as a picture. The user will not be making these graphs, so they don't need to carry its dependencies.

YAML

Like the other referee I have reservations about putting experimental material in the quick start. The other problem is that it doesn't run for me. I get the following error: FileNotFoundError: [Errno 2] No such file or directory: '/home/fste075/anaconda3/envs/test_popsynth/lib/python3.9/site-packages/popsynth/data/pop.yml'

This is probably a result of the fact that I have it all in conda, but a fair few users will run your code in a specific conda env, so it's worth thinking about.

BernoulliSelection

I got the following error: FileNotFoundError: [Errno 2] No such file or directory: '/home/fste075/anaconda3/envs/test_popsynth/lib/python3.9/site-packages/popsynth/data/pop.yml'

Other

xuanxu commented 3 years ago

@grburgess did you have the opportunity to address the reviewers' comments?

grburgess commented 3 years ago

@xuanxu hi, yes, I've been finishing changes to the code. It's basically ready except that RTD is hanging at the moment.

grburgess commented 3 years ago

@HeloiseS and @warrickball I have worked a bit on the documentation to address your concerns. There were no issues opened, so I will try to make sure I address everything here:

I've added more in-depth explanation of the core of the procedure:

https://popsynth.readthedocs.io/en/latest/notebooks/distributions.html

This discusses the mathematics behind things. I could also add this to the paper, but from what I've seen in other JOSS papers, this might be a little too much... so I leave this up to the reviewers and editors.

I've added a realistic example for GRBs:

https://popsynth.readthedocs.io/en/latest/notebooks/examples.html#Short-GRBS

and there will be a blazar example by the end of the week.

Indeed, there are users developing some very different use cases for popsynth that would be impossible to fully document, as the abstraction of the framework opens up a lot of different possibilities. Some of these are being developed for publications in progress, and I've asked those users to link in more examples when their publications are accepted.

The YAML functionality is no longer experimental. I've tried to ensure the test-case data (pop.yml) is included in the release, but I do not use conda and can't test that.
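One common way to guarantee such data files land in every install is to declare them in the packaging metadata. A minimal sketch, assuming the data lives at popsynth/data/pop.yml (this is not taken from the actual setup.py):

```python
# Keyword arguments that would be passed to setuptools.setup() in setup.py.
setup_kwargs = {
    "name": "popsynth",
    # Bundle the YAML test data into the sdist/wheel so installed copies
    # (including conda envs) can find popsynth/data/pop.yml:
    "package_data": {"popsynth": ["data/*.yml"]},
    "include_package_data": True,
}
```

At runtime, locating the file relative to the installed package (e.g. via the standard library's importlib.resources) is more robust than hard-coding a site-packages path.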

Luminosity functions are probability distributions. In popsynth they can be direct (e.g. sampling from a power law) or the luminosity function can be an emergent property of auxiliary samplers. This is documented here: https://popsynth.readthedocs.io/en/latest/notebooks/distributions.html#Creating-a-simple-population-synth and here: https://popsynth.readthedocs.io/en/latest/notebooks/aux.html#Derived-Luminosity-sampler
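The two cases can be illustrated with a numpy-only sketch (all parameter values are invented; this is not the popsynth API, just the underlying idea):

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100_000

# "Direct" luminosity function: a power law (Pareto), with L >= L_min.
L_min, alpha = 1.0, 1.5
L = L_min * (1.0 + rng.pareto(alpha, size=n))

# Positions uniform in a sphere of radius r_max: P(r) ∝ r².
r_max = 10.0
r = r_max * rng.uniform(size=n) ** (1.0 / 3.0)

# Latent fluxes via the inverse-square law.
flux = L / (4.0 * np.pi * r**2)

# "Emergent" luminosity function: L derived from another sampled quantity
# (here a log-normal mass-like variable with L = m³), analogous to a
# derived-luminosity auxiliary sampler.
m = rng.lognormal(mean=0.0, sigma=0.25, size=n)
L_derived = m**3
```

In the second case the luminosity distribution is never written down explicitly; it emerges from the distribution of the auxiliary quantity and the deterministic relation.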

I have now documented that the 3D plots and graphs are available when their required packages are installed and configured. Unfortunately, these packages are not the cleanest things to install. Nevertheless, they are only eye candy. As the docs are built live on each commit, their functionality is tested via continuous integration.

The API docs were failing because the numpy version with type hinting was not installed by default; I have now fixed this. I have tried to update all the docstrings, but there are still places where they are missing. I will fill these out with time, but the main functionality has both docstrings and type hinting. Also, for some reason the __init__ methods are not emitting their docstrings with Sphinx. I will try to clean this up as well.

But why? I need a solid reason to add complexity. Also I don't understand the justification for the currently implemented Auxiliary Samplers: I do understand the ViewingAngle, but what's the science behind having a log-normal or a log10 Auxiliary Sampler?

These features are used so that one can simulate very complex populations. For example, in one project relying on popsynth, people are simulating blazars from a luminosity and spatial distribution, as well as sampling their continuum and line spectra. All of these are random variables, and some quantities are sampled from log-normal distributions. Rather than rewrite these every time, I have included them in the package.
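A numpy-only sketch of the kind of chained sampling meant here (the distributions and coefficients are invented purely for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Primary quantity: a log-normal "luminosity function" for the continuum.
log_L = rng.normal(loc=2.0, scale=0.5, size=n)

# Dependent (auxiliary) quantity: e.g. a spectral-line luminosity whose
# mean scales with the continuum; its scatter is again log-normal.
log_line = rng.normal(loc=0.8 * log_L - 1.0, scale=0.3)

# A second level of dependence: an equivalent-width-like ratio.
ratio = 10 ** (log_line - log_L)
```

Packaging the log-normal (and log10) samplers means each such link in the chain is one reusable object rather than hand-written sampling code.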

Also, I am concerned that I don't see any validation: how do I know these give me realistic simulations? I liked your graphs showing the "selected" and "non-selected" objects - I'd like to see how that compares to other theoretical predictions. Say we feed the rate of supernovae from stellar evolution codes into popsynth and see if we roughly retrieve the number of observed supernovae (given a volume).

As the framework is generic, one would have to specify a theoretical problem to verify. As for simple verification of the sampling, this is just Monte Carlo integration of the equation now in the documentation. If you can provide an example, I can try to simulate that problem.
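As a concrete instance of the kind of Monte Carlo check meant here, one can compare the detected fraction of a toy flux-limited survey against the analytic answer (a numpy-only sketch with made-up numbers, not a popsynth run):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

# Toy population: standard candles (fixed L) uniform in a Euclidean
# sphere of radius R, with a hard flux limit F.
L, R, F = 1.0, 10.0, 1e-3

r = R * rng.uniform(size=n) ** (1.0 / 3.0)   # uniform in volume
flux = L / (4.0 * np.pi * r**2)
detected_frac = np.mean(flux >= F)

# Analytic expectation: only sources inside r_lim = sqrt(L / (4 pi F))
# pass the flux cut, so the detected fraction is (r_lim / R)³.
r_lim = np.sqrt(L / (4.0 * np.pi * F))
expected_frac = min(1.0, (r_lim / R) ** 3)
```

Agreement between `detected_frac` and `expected_frac` verifies the sampler; the same pattern generalizes to any case with a tractable analytic integral.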

Here are a couple of examples: What is an SFR-like redshift? What is a Schechter luminosity function? What populations are we synthesising? Stars? Transients? Observations? All three?

The SFR-like distribution is now explained in the documentation. Here is some info on the Schechter luminosity function: https://ui.adsabs.harvard.edu/abs/1976ApJ...203..297S/abstract

The included distributions are just things that others and I have commonly used. However, the reason for providing the ability to build your own outside of the repo, i.e. locally and on the fly in a notebook, is so that one can easily extend/add to the framework as needed. The objects that are simulated can be anything.

I am updating the refs in the paper. Let me know if there are remaining issues.

warrickball commented 3 years ago

Hi @grburgess,

First, thanks for the core concept page: it helps! I'm still not clear about the distinction between objects that always shine versus transients. The short GRB example makes the transient case clearer but what if I wanted to survey something like stars of a particular spectral type? My guess is that this is related to the definition of "intensity" as being some sort of time average and the only distinction between "transient" and "non-transient" is that the non-transient intensity is on average much greater. But it isn't clear to me how one might set this up in popsynth.

For one specific thing, how do I specify the duration Δt of the survey?

Perhaps I'm mistakenly interpreting the populations of the transient events as progenitors of those events, so that e.g. some GRB progenitors don't become GRBs during the survey and we don't detect them. But perhaps the point is that popsynth doesn't care about the actual rate of events, or at least that's somehow incorporated in the luminosity distribution and we have to specify something that makes sense according to a (maybe probabilistic) model.

I personally don't find that the "fire fly" concept in the quick start has helped me understand any better. Do they light up continuously? Do they repeat? I find the astrophysical examples somewhat more relatable.

There are still gaps in the API which I think would help a lot in terms of making the distributions make sense. Some are simple fixes (e.g. SFRDistribution could link to the relevant article) but many could simply be fleshed out more with the relevant formulae.


To create a concrete example that I hoped would resolve some of my misunderstandings, I thought about making a flux-limited sample of stars with luminosities L=M³, where M is the mass drawn from some supposed initial mass function (I ended up using a lognormal). The aim is to demonstrate that the surveyed sample will be biased towards higher masses than the underlying sample. I spent about an hour on this but found myself uncertain how to proceed. I think I need to use a custom derived luminosity sampler, in which the underlying distribution would be the mass distribution, so I got as far as this script, largely copied from the docs:

#!/usr/bin/env python3

import numpy as np
import matplotlib.pyplot as pl
import popsynth

class LuminositySampler(popsynth.DerivedLumAuxSampler):
    _auxiliary_sampler_name = "LuminositySampler"
    mu = popsynth.auxiliary_sampler.AuxiliaryParameter(default=0)
    tau = popsynth.auxiliary_sampler.AuxiliaryParameter(default=1, vmin=0)

    def __init__(self, mu=0.0, tau=1.0):
        super(LuminositySampler, self).__init__("lumsample", uses_distance=False)
        # store the parameter values so they actually take effect
        self.mu = mu
        self.tau = tau

    def true_sampler(self, size):
        # draw masses from a lognormal "initial mass function"
        self._true_values = np.random.lognormal(self.mu, self.tau, size=size)

    def compute_luminosity(self):
        # toy mass-luminosity relation L = M^3
        return self._true_values ** 3

lum = LuminositySampler()

pop_gen = popsynth.populations.SphericalPopulation(1, r_max=10)
pop_gen.add_observed_quantity(lum)
flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
pop_gen.set_flux_selection(flux_selector)
pop = pop_gen.draw_survey(flux_sigma=0.1)

# pop.display_fluxes(obs_color='m', true_color='y', s=15)
# pl.show()

pl.hist(pop.luminosities_latent**(1/3), bins=np.arange(0.2,20.0,0.2), label='all');
pl.hist(pop.luminosities_latent[pop.selection]**(1/3), bins=np.arange(0.2,20.0,0.2), label='selected');
pl.xlabel('M')
pl.legend()
pl.show()

The fact that I've managed to get what looks like the right result (a flux-limited survey is biased towards more massive stars) means the docs are in a better state than my first impression suggested! But I don't pretend to understand the values (e.g. the flux boundary value) nor the relationship between the samplers, since I've simply copied the docs. I'm also not sure if there's a way to get to the underlying mass distribution instead of working it out from the transformation L=M³.

I still think the API docs need the most work. I was mostly exploring what attributes and functions are available by hitting tab for autocomplete options in IPython. I was also querying docstrings using the ? magic, which was often unhelpful. It would also be helpful to implement more __repr__ and/or __str__ functions, which would make it easier to see what attributes might not have been set, or quickly find out what default values they've taken. I've only used very simple distributions: the more complicated ones are more difficult because it's hard to see exactly what they do.
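On the __repr__ point, even just echoing the current parameter values would help a lot. A toy sketch (this is not the popsynth class hierarchy, only the pattern):

```python
class AuxSamplerBase:
    """Toy base class showing a parameter-aware repr."""

    def __init__(self, **params):
        self._params = dict(params)

    def __repr__(self):
        # Show every parameter and its current value, sorted by name,
        # so a user can see at a glance what has (or hasn't) been set.
        body = ", ".join(f"{k}={v!r}" for k, v in sorted(self._params.items()))
        return f"{type(self).__name__}({body})"

print(AuxSamplerBase(mu=0.0, tau=1.0))  # prints AuxSamplerBase(mu=0.0, tau=1.0)
```

With a real parameter descriptor system, the base class could collect the descriptor names automatically instead of taking **params.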

--

To finish with some concrete actions:

  1. It still isn't clear how we relate the luminosities of transient versus non-transient things, and how the transient rate ("intensity"? "luminosity"?) incorporates the survey duration. The docs still need to explain this better upfront. It seems that to you there is no difference but to me there still is. (Consider: what about things that are intermittent and not strictly regular, like dwarf novae?)
  2. I think the examples would benefit from a second case, aside from the GRB example, of something that isn't transient.
  3. There are some heading issues where different headings from the same page are being listed in the contents sidebar. Not a big issue but a bit confusing for navigation.

I'll re-read the paper soon.

grburgess commented 3 years ago

Hi @grburgess,

First, thanks for the core concept page: it helps! I'm still not clear about the distinction between objects that always shine versus transients. The short GRB example makes the transient case clearer but what if I wanted to survey something like stars of a particular spectral type? My guess is that this is related to the definition of "intensity" as being some sort of time average and the only distinction between "transient" and "non-transient" is that the non-transient intensity is on average much greater. But it isn't clear to me how one might set this up in popsynth.

For the included cosmological distributions (though this could be modified externally), there is a constructor argument `as_rate`; I haven't documented this well on the RTD page. But the difference here is the dV/dz Jacobian: you need to account for the change in time over the cosmological interval when dealing with transients. If you aren't using cosmological distributions, then it depends on how you define your volume elements.

https://github.com/grburgess/popsynth/blob/master/popsynth/distributions/cosmological_distribution.py#L11

https://github.com/grburgess/popsynth/blob/master/popsynth/distributions/cosmological_distribution.py#L50

This is a little bit involved and does depend on what you are modeling.
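For reference, the distinction being described can be written out explicitly (a sketch assuming the standard FLRW comoving volume element; the symbols are generic, not popsynth parameter names). For steady sources the expected counts track the comoving density, while for transients treated as a rate an extra cosmological time-dilation factor of 1/(1+z) enters, and the survey duration ΔT sets the overall normalization:

```latex
N_{\mathrm{steady}} = \int n(z)\,\frac{dV}{dz}\,dz,
\qquad
N_{\mathrm{transient}} = \Delta T \int \frac{\dot{n}(z)}{1+z}\,\frac{dV}{dz}\,dz .
```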

Also, the definition of intensity is just the one from the more general Poisson point process, where we are counting in an N-dimensional space: https://en.wikipedia.org/wiki/Poisson_point_process

You can imagine that each object's process is a Poisson point process in its space. I just don't want to go too much into that here as it can be found in the references.
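As a minimal numerical sketch of that picture (plain NumPy, independent of popsynth): the number of objects is a single Poisson draw whose mean is the integrated intensity over the surveyed volume, and positions are then drawn from the normalized spatial density; here a uniform density of 1 per unit volume in a sphere of radius 10, loosely mirroring the `SphericalPopulation(1, r_max=10)` example in this thread.

```python
import numpy as np

rng = np.random.default_rng(42)

r_max = 10.0      # sphere radius, as in SphericalPopulation(1, r_max=10)
density = 1.0     # objects per unit volume
intensity = density * 4.0 / 3.0 * np.pi * r_max**3  # integrated Poisson intensity

n = rng.poisson(intensity)                 # total object count is a Poisson draw
r = r_max * rng.random(n) ** (1.0 / 3.0)   # radii uniform in volume, p(r) ∝ r²
```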

For one specific thing, how do I specify the duration Δt of the survey?

It factors into the normalization of your spatial distribution. One would have to work this out for whatever case you are handling.

Perhaps I'm mistakenly interpreting the populations of the transient events as progenitors of those events, so that e.g. some GRB progenitors don't become GRB during the survey and we don't detect them. But perhaps the point is that popsynth doesn't care about the actual rate of events, or at least that's somehow incorporated in the luminosity distribution and we have to specify something that makes sense according to a (maybe probabilistic) model.

Yeah, there is no distinction... it really depends on how you define your volume elements.

I personally don't find the "fire fly" concept in the quick start has helped me understand any better. Do they light up continuously? Do they repeat? I find the astrophysical examples somewhat more relatable.

I can remove this.

There are still gaps in the API which I think would help a lot in terms of making the distributions make sense. Some are simple fixes (e.g. SFRDistribution could link to the relevant article) but many could simply be fleshed out more with the relevant formulae.

It is in the docstring, but Sphinx autodoc is not working.

https://github.com/grburgess/popsynth/blob/master/popsynth/distributions/cosmological_distribution.py#L69

I think the better idea is to rename this to ColeSFR, but this would break a lot of people's code at the moment. I think I will do that on a version bump.

To create a concrete example that I hoped would cement some of my misunderstandings, I thought about making a flux-limited sample of stars with luminosities L=M³, where M is the mass drawn from some supposed initial mass function (I ended up using a lognormal). The aim is to demonstrate that the surveyed sample will be biased towards higher masses than the underlying sample. I spent about an hour on this but found myself uncertain how to proceed. I think I need to use a custom derived luminosity sampler, in which the underlying distribution would be the mass distribution, so I got as far as this script, largely copied from the docs:

```python
#!/usr/bin/env python3

import numpy as np
import matplotlib.pyplot as pl
import popsynth

class LuminositySampler(popsynth.DerivedLumAuxSampler):
    _auxiliary_sampler_name = "LuminositySampler"
    mu = popsynth.auxiliary_sampler.AuxiliaryParameter(default=0)
    tau = popsynth.auxiliary_sampler.AuxiliaryParameter(default=1, vmin=0)

    def __init__(self, mu=0.0, tau=1.0, sigma=1):
        # initialize the base class; this sampler does not use distance
        super(LuminositySampler, self).__init__("lumsample", uses_distance=False)

    def true_sampler(self, size):
        # draw a random number
        tmp = np.random.lognormal(self.mu, self.tau, size=size)
        self._true_values = tmp

    def compute_luminosity(self):
        # compute the luminosity
        return 10**(3*np.log10(self._true_values))

lum = LuminositySampler()

pop_gen = popsynth.populations.SphericalPopulation(1, r_max=10)
pop_gen.add_observed_quantity(lum)
flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
pop_gen.set_flux_selection(flux_selector)
pop = pop_gen.draw_survey(flux_sigma=0.1)

# pop.display_fluxes(obs_color='m', true_color='y', s=15)
# pl.show()

pl.hist(pop.luminosities_latent**(1/3), bins=np.arange(0.2, 20.0, 0.2), label='all')
pl.hist(pop.luminosities_latent[pop.selection]**(1/3), bins=np.arange(0.2, 20.0, 0.2), label='selected')
pl.xlabel('M')
pl.legend()
pl.show()
```

The fact that I've managed to get what looks like the right result (a flux-limited survey is biased towards more massive stars) means the docs are in a better state than my first impression suggested! But I don't pretend to understand the values (e.g. the flux boundary value) nor the relationship between the samplers, since I've simply copied the docs. I'm also not sure if there's a way to get to the underlying mass distribution instead of working it out from the transformation L=M³. [attached figure: histogram of M, all vs. selected]

This is very cool! Perhaps you would like to add this to the examples? More examples of different things could help people think of how to use this. The boundary is for the value of flux in erg/s/cm2. I need to doc this. You could also do this:


```python
import numpy as np
import matplotlib.pyplot as pl
import popsynth

# create a sampler for mass
mass = popsynth.LogNormalAuxSampler(name="mass", observed=False)

class LuminositySampler(popsynth.DerivedLumAuxSampler):
    _auxiliary_sampler_name = "LuminositySampler"

    def __init__(self):
        # initialize the base class; this sampler does not use distance
        super(LuminositySampler, self).__init__("lumsample", uses_distance=False)

    def true_sampler(self, size):
        # pull the mass from the secondary sampler
        mass = self._secondary_samplers["mass"].true_values
        self._true_values = mass

    def compute_luminosity(self):
        # compute the luminosity, L = M^3
        return 10**(3*np.log10(self._true_values))

lum = LuminositySampler()

lum.set_secondary_sampler(mass)

pop_gen = popsynth.populations.SphericalPopulation(1, r_max=10)
pop_gen.add_observed_quantity(lum)
flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
pop_gen.set_flux_selection(flux_selector)
pop = pop_gen.draw_survey(flux_sigma=0.1)

# pop.display_fluxes(obs_color='m', true_color='y', s=15)
# pl.show()

fig, ax = pl.subplots()

ax.hist(pop.mass, bins=np.arange(0.2, 20.0, 0.2), label='all')
ax.hist(pop.mass[pop.selection], bins=np.arange(0.2, 20.0, 0.2), label='selected')
ax.set_xlabel('M')
ax.legend()

pop.display_fluxes()

fig, ax = pl.subplots()

ax.hist(pop.luminosities_latent, bins=np.arange(0.2, 20.0, 0.2), label='all')
ax.hist(pop.luminosities_latent[pop.selection], bins=np.arange(0.2, 20.0, 0.2), label='selected')
ax.set_xlabel('L')
ax.legend()
```

[attached figures: mass histogram (all vs. selected), flux display, and luminosity histogram (all vs. selected)]

I still think the API docs need the most work. I was mostly exploring what attributes and functions are available by hitting tab for autocomplete options in IPython. I was also querying docstrings using the ? magic, which was often unhelpful. It would also be helpful to implement more __repr__ and/or __str__ functions, which would make it easier to see what attributes might not have been set, or quickly find out what default values they've taken. I've only used very simple distributions: the more complicated ones are more difficult because it's hard to see exactly what they do.

The ? should work on most of the constructors. But as I've said... autodoc is being a pain. I will work to fill out the docstrings more deeply in the code. All the distributions should have a display method which explains the math, but I've gotten behind on keeping up with this.

--

To finish with some concrete actions:

  1. It still isn't clear how we relate the luminosities of transient versus non-transient things, and how the transient rate ("intensity"? "luminosity"?) incorporates the survey duration. The docs still need to explain this better upfront. It seems that to you there is no difference but to me there still is. (Consider: what about things that are intermittent and not strictly regular, like dwarf novae?)

I will document the `as_rate` option better. But in general, it would be up to the user to write a correct spatial distribution. E.g., there are people using this to model flaring blazars, but these flare for much longer than GRBs. We have also used this to simulate static sources.

  2. I think the examples would benefit from a second case, aside from the GRB example, of something that isn't transient.

I will add the blazar example soon. But again, your example is very nice.

  3. There are some heading issues where different headings from the same page are being listed in the contents sidebar. Not a big issue but a bit confusing for navigation.

Ah... I need to separate this out.

I'll re-read the paper soon.

warrickball commented 3 years ago

Okay, I think I've (very slowly!) come around to how the transient versus non-transient thing works. So I think the missing piece of the documentation is a description (or link to a description of) how the relevant normalization constant should be calculated and which parameter it corresponds to in popsynth. At the moment I don't think this is mentioned at all in the docs.

You're more than welcome to use my example however you like. It'd benefit from being fleshed out with some vaguely reasonable numbers for the fluxes and distances and whatever other parameters. An interesting version—but not one I expect for this review!—might be to demonstrate how the (main sequence) stars we see by eye in the night sky are a flux-limited sample and therefore biased to higher masses. That could even be compared to a list of actual main sequence stars with V<4 or something (to keep the number small). I don't know how to construct that query but it's probably not too hard with astroquery.


Finally, regarding the formulae for the various functions, it's all well and good using the display method (though I don't see it implemented everywhere), but don't presume everyone uses the same workflow and tools as you. I don't routinely use notebooks, and in my IPython console the maths just displays as `<IPython.core.display.Math object>`. I'd recommend just writing the formulae in the docstrings, much like e.g. SciPy. It's not pretty in a console either, but at least I can use my mental LaTeX parser, which isn't too bad after all these years!
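To make the docstring suggestion concrete, here is a sketch of a SciPy-style docstring carrying the formula as plain reST math; the function itself is illustrative, not taken from popsynth:

```python
import numpy as np

def pareto_pdf(x, x_min, alpha):
    r"""Pareto probability density.

    Notes
    -----
    The density is

    .. math:: f(x) = \frac{\alpha\, x_{\min}^{\alpha}}{x^{\alpha + 1}},
              \qquad x \ge x_{\min},

    which is readable in a plain console as well as in rendered docs.
    """
    x = np.asarray(x, dtype=float)
    out = alpha * x_min**alpha / x ** (alpha + 1)
    return np.where(x >= x_min, out, 0.0)
```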

grburgess commented 3 years ago

Okay, I think I've (very slowly!) come around to how the transient versus non-transient thing works. So I think the missing piece of the documentation is a description (or link to a description of) how the relevant normalization constant should be calculated and which parameter it corresponds to in popsynth. At the moment I don't think this is mentioned at all in the docs.

Good point! Units in many places would likely make this clearer; it's just that they can change depending on the application. I will think about it and try to write this up. But, yes, I will include this in the cosmological docs (I'm not sure how this works in flat space...)

You're more than welcome to use my example however you like. It'd benefit from being fleshed out with some vaguely reasonable numbers for the fluxes and distances and whatever other parameters. An interesting version—but not one I expect for this review!—might be to demonstrate how the (main sequence) stars we see by eye in the night sky are a flux-limited sample and therefore biased to higher masses. That could even be compared to a list of actual main sequence stars with V<4 or something (to keep the number small). I don't know how to construct that query but it's probably not too hard with astroquery.

Thanks! Would you like to do a PR so you get the credit? I can always modify it in the end.

Finally, regarding the formulae for the various functions, it's all well using the display method (though I don't see it implemented everywhere) but don't presume everyone uses the same workflow and tools as you. I don't routinely use notebooks and in my IPython console the maths just displays as `<IPython.core.display.Math object>`. I'd recommend just writing the formulae in the docstrings, much like e.g. SciPy. It's not pretty in a console either but at least I can use my mental LaTeX parser, which isn't too bad after all these years!

Good point... I'm so used to using the notebook I forget. I will go through and start making beefier repr methods. There is a lot of code to document.

grburgess commented 3 years ago

@warrickball OK, thanks to @cescalara, there is now another example in the docs as well as more complete docstrings. She has also now been added as a coauthor.

HeloiseS commented 3 years ago

Hello!

Sorry for the delay. So much progress; we're nearly there.

Sidenote: I am having issues with the website slowing my browser down significantly and freezing it. I think it might be the fancy 3D plots. I would recommend making the interactive plot an optional thing in a Jupyter notebook somewhere else and instead having a static 3D plot, maybe viewed from different angles, to illustrate the point. Or just warn people to use Firefox instead of Chrome if things get slow.

Comments

grburgess commented 3 years ago

Hi,

Sorry about the plots slowing things down. I have reduced the number of objects to try and make the plotting manageable.

Regarding the first two comments: Pareto is just a normalized power law. There are references to an economic principle, but the mathematical form should be sufficient, as I would not want someone to think that it is in any way related to economics. As for parameter ranges: this being a generic framework, there are no defined parameter ranges other than those that would make the functions mathematically invalid (negative amplitudes, for example). This is also why references don't really make much sense in this context, as it is meant to be a framework for building populations. The distributions are just normalized mathematical functions; one could set parameter values that correspond to theoretical or observed values. Moreover, the documentation on constructing custom distributions is more important, as other users have generally made their own distribution functions. As there is a lot of freedom, the only thing that can be documented is the construction.

The docstrings are now filled out.

I have added the massive stars example to the documentation.

I have added the community guidelines to the README.

@warrickball I have also now added text `__repr__`s so that quick information can be displayed in IPython.

HeloiseS commented 3 years ago

Okay cool, just one question: is there a physical motivation for using this Pareto function? Or is it a case of "we don't know what the luminosity function is doing for our populations, so a power law is a good guess"?

grburgess commented 3 years ago

In general, luminosity functions are empirical descriptions that model several underlying processes manifesting as a single observed distribution. For example, the GRB luminosity function is likely driven by progenitor mass / explosion energy, radiative process, and jet viewing angle. Observationally, this manifests as a broken power law. Thus, unless one is modeling all the physical processes, the most straightforward approach is to use some empirical function like a broken power law, Schechter function, etc. On the simulation side, one can go any route one chooses: if you want to model the luminosity distribution as an empirical function, you do that. In popsynth, the role of the DerivedLumAuxSampler is to make each object's luminosity a manifest property of underlying variables.

Nevertheless, there are several cases of power laws being used to model luminosity functions, e.g., the radio luminosity function of radio-quiet AGN: https://iopscience.iop.org/article/10.1088/0004-637X/740/1/20

The inclusion of the Pareto distribution is simply because it is used a lot by those using this code, but as the software is provided as a framework to be extended, there is no preference for this distribution.
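For anyone wanting to experiment with this choice outside popsynth, a Pareto (pure power-law) luminosity sample can be drawn by inverse-CDF sampling in a few lines; the minimum luminosity and index below are arbitrary illustrations:

```python
import numpy as np

rng = np.random.default_rng(0)
L_min, alpha = 1.0, 1.5   # arbitrary minimum luminosity and power-law index

u = rng.random(100_000)
L = L_min * (1.0 - u) ** (-1.0 / alpha)  # inverse CDF of the Pareto law
```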

HeloiseS commented 3 years ago

okay cool! Thanks for addressing all my comments - all my boxes are ticked and on my side of things we are done!

grburgess commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

PDF failed to compile for issue #3257 with the following error:

```
 /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse': (9c248ef6a695b70d00dd7037/paper.md): found a tab character that violate intendation while scanning a plain scalar at line 12 column 11 (Psych::SyntaxError)
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse_stream'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:390:in `parse'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:277:in `load'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:578:in `block in load_file'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:127:in `load_yaml'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:87:in `initialize'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `new'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `set_paper'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:58:in `prepare'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:131:in `<top (required)>'
    from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `load'
    from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `<main>'
```
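For context, Psych is rejecting a literal tab character in the indentation of the YAML front matter of `paper.md`. The fix is to indent with spaces only; a hypothetical sketch (field names and values are illustrative, not the paper's actual content):

```yaml
# Invalid: the list item below is indented with a tab character.
# authors:
# <TAB>- name: J. Michael Burgess

# Valid: indent with spaces only.
authors:
  - name: J. Michael Burgess
```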
grburgess commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

PDF failed to compile for issue #3257 with the following error:

```
 /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse': (45cfb8b3f4d8e6771d4bdf81/paper.md): found a tab character that violate intendation while scanning a plain scalar at line 12 column 11 (Psych::SyntaxError)
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse_stream'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:390:in `parse'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:277:in `load'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:578:in `block in load_file'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
    from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:127:in `load_yaml'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:87:in `initialize'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `new'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `set_paper'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:58:in `prepare'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:131:in `<top (required)>'
    from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `load'
    from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `<main>'
```
grburgess commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

PDF failed to compile for issue #3257 with the following error:

```
 /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:72:in `block in build_affiliation_string': Problem with affiliations for Francesca Capel, perhaps the affiliations index need quoting? (RuntimeError)
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:71:in `each'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:71:in `build_affiliation_string'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:17:in `initialize'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:205:in `new'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:205:in `block in parse_authors'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:202:in `each'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:202:in `parse_authors'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:93:in `initialize'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `new'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `set_paper'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:58:in `prepare'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
    from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
    from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:131:in `<top (required)>'
    from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `load'
    from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `<main>'
```
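For reference, this Whedon error typically means a multi-index affiliation in the YAML front matter was parsed as something other than a string; quoting it resolves the issue. A sketch, with the index values being illustrative:

```yaml
authors:
  - name: Francesca Capel
    affiliation: "1, 2"   # multiple indices must be a quoted string, not bare 1, 2
```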
grburgess commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

warrickball commented 3 years ago

I just had a look at the latest PDF draft and have two very minor comments.

First, the opening sentence,

Simulating a population survey of fluxes and redshifts (distances) from an astrophysical population is a routine task.

reads slightly strangely to me, I think because I don't quite know what a "population survey ... from an astrophysical population" is. It makes more sense to me if the first occurrence of "population" is deleted.

Second, something has gone wrong with this reference:

al., A. A. et. (2021). SkyPy: A package for modelling the universe. In GitHub repository. GitHub.https://github.com/skypyproject/skypy

The only other thing I'd like to check again is the state of the automated tests. I can see that the CI appears to be working but I just want to check that everything makes sense for potential contributors. Since @HeloiseS is happy, I'll try to do this today but I might have to work on some more urgent things that'll push this back to late next week.

grburgess commented 3 years ago

@warrickball yes, there are too many populations in that sentence. I removed the first one.

I see that SkyPy has a JOSS paper under review, so I grabbed the first author from there.

grburgess commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

warrickball commented 3 years ago

I updated a previous checkout and was able to run the tests successfully by running `pytest` in the popsynth folder. It looks like this isn't documented but should be, probably in a section near Contributing.

I also noticed that `test_schecter` (sic) takes a long time. I didn't record any timing information, but it was many minutes. The CI report for the Ubuntu build of the most recent commit shows the tests taking 16 minutes. You might be happy with that, but it seems surprisingly long and is probably dominated by that one test. In my experience, if tests take a long time, people won't run them, or won't run them often enough to catch bugs.
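One common mitigation, sketched here with the standard-library `unittest` rather than the project's actual pytest/Hypothesis setup, is to gate the slow property-based checks behind an opt-in flag so the rest of the suite stays fast:

```python
import os
import unittest

# Opt in to the slow tests with RUN_SLOW_TESTS=1 (hypothetical convention).
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"

class TestSchechterSampling(unittest.TestCase):
    @unittest.skipUnless(RUN_SLOW, "slow property-based test; set RUN_SLOW_TESTS=1")
    def test_exhaustive_parameter_sweep(self):
        # Placeholder for the long-running Hypothesis-driven check.
        pass

    def test_quick_sanity(self):
        # Fast checks always run.
        self.assertTrue(True)
```

The same idea maps onto pytest markers (e.g. a `slow` marker deselected by default), which keeps the full sweep available in CI while local runs stay quick.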

The tests do run, though, so my checklist is complete and I'm happy for this to be accepted in JOSS.

grburgess commented 3 years ago

@warrickball yes, this is the `hypothesis` package, which tries injecting parameters until it finds a problem. This is the first and only package I have used it on, because of the speed issues. It does help to find issues with parameter inputs... but it can lead to very long-running tests.

grburgess commented 3 years ago

@xuanxu what are the next steps?

xuanxu commented 3 years ago

Thanks @HeloiseS & @warrickball for your reviews!

@grburgess I think I spotted a typo in the paper's summary

grburgess commented 3 years ago

@xuanxu merged. Thanks!

xuanxu commented 3 years ago

@whedon generate pdf

whedon commented 3 years ago

:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:

xuanxu commented 3 years ago

OK @grburgess, everything looks good, here are the next steps:

Once you do that please report here the version number and archive DOI

grburgess commented 3 years ago

@xuanxu I usually do a tag, which then releases to PyPI. Should I just tar the PyPI release and put that on Zenodo?

xuanxu commented 3 years ago

@grburgess yes, you can upload the .zip or the .tar file generated by GH when the tag (or release) is created.


grburgess commented 3 years ago

@xuanxu I have uploaded a new version to zenodo:

https://zenodo.org/record/5109570#.YPFhrC0Rpf0

grburgess commented 3 years ago

it is also on pypi: https://pypi.org/project/popsynth

xuanxu commented 3 years ago

@grburgess The zenodo link points to a snapshot of the 0.2.2 version. I think you need to create a new upload with the latest zip file. Also, please change the title to match the paper.