Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @HeloiseS, @warrickball it looks like you're currently assigned to review this paper :tada:.
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
:star: Important :star:
If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿
To fix this do the following two things:
For a list of things I can do to help you, just type:
@whedon commands
For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:
@whedon generate pdf
Software report (experimental):
github.com/AlDanial/cloc v 1.88 T=0.43 s (280.9 files/s, 35792.3 lines/s)
| Language | files | blank | comment | code |
|---|---:|---:|---:|---:|
| Python | 58 | 2986 | 2240 | 5392 |
| CSS | 36 | 31 | 77 | 2590 |
| Markdown | 8 | 451 | 0 | 913 |
| YAML | 8 | 47 | 37 | 240 |
| TeX | 1 | 5 | 0 | 52 |
| reStructuredText | 6 | 40 | 83 | 28 |
| JavaScript | 1 | 1 | 8 | 26 |
| HTML | 2 | 0 | 25 | 20 |
| SUM | 120 | 3561 | 2470 | 9261 |
Statistical information for the repository 'a418e67afb0e014c424e664e' was gathered on 2021/05/06.
The following historical commit information, by author, was found:
| Author | Commits | Insertions | Deletions | % of changes |
|---|---:|---:|---:|---:|
| Francesca Capel | 8 | 395 | 212 | 2.15 |
| J. Michael Burgess | 198 | 8182 | 3644 | 41.96 |
| grburgess | 246 | 10861 | 4890 | 55.89 |
Below are the number of rows from each author that have survived and are still intact in the current revision:
| Author | Rows | Stability | Age | % in comments |
|---|---:|---:|---:|---:|
| Francesca Capel | 284 | 71.9 | 1.9 | 6.69 |
| J. Michael Burgess | 10369 | 126.7 | 12.4 | 10.75 |
Reference check summary (note 'MISSING' DOIs are suggestions that need verification):
OK DOIs
- 10.1103/physrevd.100.103523 is OK
MISSING DOIs
- None
INVALID DOIs
- None
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
Hi,
I've just made a first pass at reviewing the repository. I'm not an expert on the scientific application but I'm at least an astronomer/astrophysicist and will comment on the repo from that perspective.
First, the code itself seems reasonable overall. I had no problem installing and running it, though the nice-looking 3D plots didn't work in a script or IPython console. (I don't personally use Jupyter notebooks much.) The code certainly appears to represent significant and useful work. As noted in the paper, other population synthesis codes tend to focus on particular kinds of astrophysical object, whereas popsynth
is clearly meant to be abstract enough to apply to many different types of object.
My principal concern is the documentation. For a start, it appears incomplete. The API docs linked from the left sidebar are empty and there are two links to the same page (API and popsynth). I also came across a FIXME in the docstring of `popsynth.ZPowerSphericalPopulation`, which it turns out is in the docstrings of many of the spatial populations. The experimental YAML documentation in the Quick start also has a sentence, "We can load this yaml file into a population synth like this:", which abruptly leads to the next section. (I wouldn't recommend documenting experimental features in the Quick start.)
The Quick start plunges into an abstract and jargon-laden example. So, though it works and the abstraction makes the code usefully versatile, it isn't at all obvious to me how it applies to a real-world problem, though perhaps this would be obvious to someone in the field. I think the docs would be better served by relating its parameters to a realistic problem, e.g. could the population in the Quick start be thought of as AGN? Similarly, few of the distributions are familiar and I've only seen the equations in the docs produced by the population generator objects' `display()` function (which, like the 3D plots, didn't work for me in scripts or IPython). It'd be good to have a reference for each equation but this would in effect be done by populating the distributions' docstrings.
On a related note, there are some technical details about what the code aims to do that I don't understand. Are we trying to create a population of events or of sources? I presumed the code was for sources but the text of the paper mentions gamma-ray bursts (GRBs) as examples. I would've thought GRBs are one-off events: once a GRB has been detected from a source, we wouldn't expect to observe another GRB from that source, would we? If I've understood that correctly, it then isn't obvious to me how `popsynth` simulates a population of GRBs. Perhaps the "source" of a GRB is really its host galaxy? The animated spinning figure on the frontpage of the documentation also gives the impression of simulating a population of events.
I think this would be cleared up if there were some mathematical definitions for what exactly is meant by terms like population, luminosity function, etc. I don't expect these to be prominent: they can go in a glossary or something (though they should perhaps go in the paper too). But without them, it's difficult to know exactly what the code is intended to do.
It's a simple thing but for a non-expert like me, I'd find it easier to understand the docs if the various classes and functions were internally linked (e.g. with ReST `:py:class:` directives — something like the sketch below).
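For instance, something roughly like this in a docstring (purely illustrative on my part — `set_flux_selection` and `popsynth.HardFluxSelection` are real names from the code, but the parameter name and docstring wording here are my guesses):

```python
# Hypothetical docstring sketch showing a Sphinx cross-reference role, so that
# the selection class name links back to its API entry.
def set_flux_selection(self, flux_selector):
    """Attach a flux selection to this population synth.

    :param flux_selector: a flux selection object, e.g. an instance of
        :py:class:`popsynth.HardFluxSelection`.
    """
    self._flux_selector = flux_selector
```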
To close this section, you may take my scientific remarks with a pinch of salt. I'm not an expert and perhaps your target user knows enough that my comments are moot.
The requirements for testing aren't specified anywhere. I cloned the repo and tried `pytest` in the directory above the `tests` directory I found, which revealed that `hypothesis` is required. With that installed I got an error:
```
popsynth/test/test_selections.py:7: in <module>
    class GalacticPlaceSelection(popsynth.SpatialSelection):
popsynth/test/test_selections.py:11: in GalacticPlaceSelection
    b_limit = popsynth.SelectionParameter(vmin=0, vmax=90)
E   AttributeError: module 'popsynth' has no attribute 'SelectionParameter'
```
and a bunch of warnings, mostly about "traits". The CI appears to be passing, though, so perhaps I've missed a dependency or run `pytest` in the wrong place. My platform information (from `pytest`) is:
```
platform linux -- Python 3.9.4, pytest-6.2.1, py-1.10.0, pluggy-0.13.1
rootdir: ..., configfile: setup.cfg, testpaths: popsynth/test
plugins: cov-2.10.1, remotedata-0.3.2, hypothesis-6.12.0
```
Finally, two small points. First, there are no "clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". All of this can be done through GitHub but the repo or docs should say so. (On a related note, the docs don't seem to link to the GitHub repo anywhere.)
Second, the `SkyPy` and `firesong` references in the paper render as
al., A. A. et. (2021). SkyPy: A package for modelling the universe. InGitHub repository. GitHub.https://github.com/skypyproject/skypy al., I. T. et. (2021). FIRst extragalactic simulation of neutrino and gamma-ray. InGitHub repository. GitHub.https://github.com/icecube/FIRESONG
suggesting an error in the author lists, but these references can be replaced: `firesong` has a JOSS paper and `SkyPy` has a citation file that points to recommended DOIs and BibTeX entries on Zenodo.
Thank you! I will begin to implement these changes, but here are some quick responses:
My principal concern is the documentation. For a start, it appears incomplete. The API docs linked from the left sidebar are empty and there are two links to the same page (API and popsynth). I also came across a FIXME in the docstring of popsynth.ZPowerSphericalPopulation, which it turns out is in docstrings of many of the spatial populations.
Yes, I need to better document all the included distributions. Many of these are things that I use myself, and the set is not meant to be an exhaustive list because users can easily add their own. If needed, I can add others, but other users as well as myself have slowly been adding things in other repos as they are needed.
The experimental YAML documentation in the Quick start also has a sentence We can load this yaml file into a population synth like this: which abruptly leads to the next section. (I wouldn't recommend documenting experimental features in the Quick start.)
Oops. This seems like a commit that got behind. Will fix.
The Quick start plunges into an abstract and jargon-laden example. So, though it works and the abstraction makes the code usefully versatile, it isn't at all obvious to me how it applies to a real-world problem, though perhaps this would be obvious to someone in the field. I think the docs would be better served by relating its parameters to a realistic problem. e.g. Could the population in the Quick start be thought of as AGN? Similarly, few of the distributions are familiar and I've only seen the equations in the docs produced by the population generator objects' display() function (which, like the 3D plots, didn't work for me in scripts or IPython). It'd be good to have a reference for each equation but this would in effect be done by populating the distributions' docstrings.
This is a good point. There were originally equations but, looking at other JOSS papers, I thought this was too much. I'll gladly add these back in. Sorry about the IPython part: I'm using the IPython HTML display capabilities for this, and it is really only meant to be eye candy and for double-checking when examining the functions. I think you are right that a more concrete example, or changing "population" to "AGN" or "GRB", would make the connection to the abstraction a little easier. I will also add some specific examples.
On a related note, there are some technical details about what the code aims to do that I don't understand. Are we trying to create a population of events or of sources? I presumed the code was for sources but the text of the paper mentions gamma-ray bursts (GRBs) as examples. I would've thought GRBs are one-off events: once a GRB has been detected from a source, we wouldn't expect to observe another GRB from that source, would we? If I've understood that correctly, it then isn't obvious to me how popsynth simulates a population of GRBs. Perhaps the "source" of a GRB is really its host galaxy? The animated spinning figure on the frontpage of the documentation also gives the impression of simulating a population of events.
Yes, this is confusing. The original idea was to create a "survey", i.e. an observed population of fluxes sampled from their observation likelihoods. As the project grew and others were using it, the purpose became more abstract because the structure lets you easily generate linked/dependent quantities. I think adding the math behind the idea will clear some of this up. The code was originally intended to test hierarchical Bayesian models that fit a population of observations, but it has grown...
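To sketch the picture I have in mind (my shorthand here, not notation copied from the docs — the inverse-square flux and the flux selection are the same ones that show up in the code examples later in this thread):

```latex
% Schematic generative model for one object $i$ in a survey (illustrative only):
\begin{align*}
  z_i &\sim p(z)                       && \text{distance/redshift from the spatial distribution} \\
  L_i &\sim \phi(L)                    && \text{luminosity from the luminosity function} \\
  F_i &= \frac{L_i}{4 \pi d_L(z_i)^2}  && \text{latent flux via the inverse-square law} \\
  \hat{F}_i &\sim p(\hat{F} \mid F_i)  && \text{observed flux from the observation likelihood}
\end{align*}
% followed by a selection such as $\hat{F}_i > F_\mathrm{lim}$ (cf. HardFluxSelection).
```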
I think this would be cleared up if there were some mathematical definitions for what exactly is meant by terms like population, luminosity function, etc. I don't expect these to be prominent: they can go in a glossary or something (though they should perhaps go in the paper too). But without them, it's difficult to know exactly what the code is intended to do.
Agreed. I will add this back.
It's a simple thing but for a non-expert like me, I'd find it easier to understand the docs if the various classes and functions were internally linked (e.g. with ReST :py:class:... directives).
I agree, I was trying to use auto doc to generate everything in real time with GitHub actions, but this seems to be failing sporadically. I will try and implement your suggestion.
To close this section, you may take my scientific remarks with a pinch of salt. I'm an not an expert and perhaps your target user knows enough that my comments are moot.
Ah ok, it seems there might be an issue in the setup.cfg. I may have beaten the CI environment into doing this automatically, so I will try and get that working locally. I'm actually not a fan of installing hypothesis if I do not need it... but I will add that to the requirements.
Finally, two small points. First, there are no "clear guidelines for third parties wishing to 1) Contribute to the software 2) Report issues or problems with the software 3) Seek support". All of this can be done through GitHub but the repo or docs should say so. (On a related note, the docs don't seem to link to the GitHub repo anywhere.)
Will fix
Second, the SkyPy and firesong references in the paper render as
Will fix this, I see the Firesong JOSS paper was just out today.
:wave: @HeloiseS, please update us on how your review is going (this is an automated reminder).
:wave: @warrickball, please update us on how your review is going (this is an automated reminder).
Hi !
just wanted to check in to say that I never got the notification for this (even on GitHub). I should have sent an email earlier to check everything was working!
I'll be taking a look at this in the next few days, but FYI I'm in the middle of starting fellowship applications, so I'm doing my best, not just dragging my feet - looking forward to looking at this!
Cheers, Heloise
@whedon re-invite HeloiseS as reviewer
OK, the reviewer has been re-invited.
@heloises please accept the invite by clicking this link: https://github.com/openjournals/joss-reviews/invitations
Hi,
I think my comments will echo what the other referee said. First of all, congratulations on your code base! As far as I can see it's very clean and maintained to a high standard of testing.
My first and main comment is that, after reading the paper and going through the tutorials, I am still not sure what use cases I would apply this code to - I think that the level of abstraction and the heavy use of jargon (that sometimes has multiple meanings across different fields) is a big part of the problem.
What would help would be some real life examples: the other referee mentioned GRBs; my first thought was also transients, both galactic and extragalactic ones. On reading the paper I was imagining that I could give `popsynth` transient rates (you do mention those at the beginning of the Procedure section) derived from stellar population synthesis codes, add observational biases, take into account my telescope's magnitude limits in specific bands, include an approximation of the Milky Way disc extinction, add a telescope latitude, a program time (say 3 nights) and BAM! get an estimate of how many hits I can expect.
Having looked at `popsynth`, I have no idea where to start to make realistic observations/simulations. I would encourage you to make sure your tutorials are clear even to Master's students/early PhD students. If your code is open source and in JOSS it is intended for the whole community, and at the moment the learning curve is way too high for your really nice code to be used by a large population.
Are there use cases out there already demonstrating how this general and abstract API can be used to address specific astronomy problems? You should cite those in the paper :)
Finally, I suggest you do a thorough proofread of the documentation; I came across a few typos.
What are the distributions? What do they do? I understand the concept of the spatial distribution, but I am not sure what the luminosity function actually is and where it comes from. Is it source specific? Why does it differ from a basic inverse square law (I assume a majority of astro people know their inverse square law, so that can be a good start to anchor your explanation in knowledge that most users will share)? I think this goes back to adding more explanation so that a wide audience understands the science behind the code, and also adding references as mentioned by reviewer 1.
Also, I am concerned that I don't see any validation: how do I know these give me realistic simulations? I liked your graphs showing the "selected" and "non-selected" objects - I'd like to see how that compares to other theoretical predictions. Say we feed the rate of supernovae from stellar evolution codes into `popsynth` and see if we roughly retrieve the number of observed supernovae (given a volume).
"This allows you to build up arbitrailiy complex dependencies between parameters which can lead to diverse populations."
=> But why? I need a solid reason to add complexity.
Also, I don't understand the justification for the currently implemented Auxiliary Samplers: I do understand the `ViewingAngle`, but what's the science behind having a log-normal or a log10 Auxiliary Sampler?
Jargon
Here are a couple of examples: What is an SFR-like redshift? What is a Schechter luminosity function? What populations are we synthesising? Stars? Transients? Observations? All three?
None of the 3D graphs render for me.
I installed `popsynth` in a fresh conda env and I was missing a few dependencies to run the Jupyter notebooks:
- jupyterthemes
- pygraphviz
That last one straight up failed to install. I am seeing this issue in their GitHub indicating it could be missing some development files.
That is a lot of complex dependencies for a flowchart, and I don't know if it adds much. I would suggest reconsidering which dependencies are absolutely necessary for the good running of the code and removing the rest. If you want to illustrate class inheritance of specific cases, make a nice graph and add it as a picture. The user will not be making these graphs so they don't need to carry its dependencies.
Like the other referee, I have reservations about putting experimental material in the quick start. The other problem is that it doesn't run for me. I get the following error:
FileNotFoundError: [Errno 2] No such file or directory: '/home/fste075/anaconda3/envs/test_popsynth/lib/python3.9/site-packages/popsynth/data/pop.yml'
This is probably a result of the fact that I have it all in conda, but a fair few users will have your code running in a specific conda env so it's worth thinking about.
The API documentation for `popsynth` is just empty headings. This should reflect the docstrings in your code.
@grburgess did you have the opportunity to address the reviewers' comments?
@xuanxu hi, yes, I've been finishing changes to the code. It's basically ready except that RTD is hanging at the moment.
@HeloiseS and @warrickball I have worked a bit on the documentation to address your concerns. There were no issues opened, so I will try to make sure I address everything here:
I've added more in-depth explanation of the core of the procedure:
https://popsynth.readthedocs.io/en/latest/notebooks/distributions.html
This discusses the mathematics behind things. I could also add this to the paper, but from what I've seen in other JOSS papers, this might be a little too much... so I leave this up to the reviewers and editors.
I've added a realistic example for GRBs:
https://popsynth.readthedocs.io/en/latest/notebooks/examples.html#Short-GRBS
and there will be a blazar example by the end of the week.
Indeed, there are users developing some very different use cases for popsynth. It would be impossible to fully document them all, as the abstraction of the framework opens up a lot of different possibilities. Some of these are currently being developed for publications in progress, but I've asked those users to link in more examples when their publications are accepted.
The YAML functionality is no longer experimental. I've tried to add the test case data (pop.yml) so that it is definitely included in the release, but I do not use conda and can't test that.
Luminosity functions are probability distributions. In popsynth they can be direct (e.g. sampling from a power law) or the luminosity function can be an emergent property of auxiliary samplers. This is documented here: https://popsynth.readthedocs.io/en/latest/notebooks/distributions.html#Creating-a-simple-population-synth and here: https://popsynth.readthedocs.io/en/latest/notebooks/aux.html#Derived-Luminosity-sampler
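Schematically (my shorthand here, not notation from the docs), the two routes are:

```latex
\begin{align*}
  \text{direct:}   &\quad L_i \sim \phi(L \mid \theta) \\
  \text{emergent:} &\quad x_i \sim p(x), \qquad L_i = f(x_i)
\end{align*}
% where the $x_i$ are auxiliary quantities drawn by (possibly chained) auxiliary
% samplers and $f$ is the user-defined compute_luminosity step.
```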
I have now documented that the 3D plots and graphs are available when their required packages are installed and configured. Unfortunately, these packages are not the cleanest things to install. Nevertheless, they are only eye candy. As the docs are built live on each commit, their functionality is tested via continuous integration.
The API was failing because the numpy version with type hinting was not installed by default, and I have now fixed this. I have tried to update all the docstrings, but there are still places where they are missing. I will fill these out with time, but the main functionality has both docstrings and type hinting. Also, for some reason the `__init__` methods are not emitting their docstrings with Sphinx. I will try to clean this up as well.
But why? I need a solid reason to add complexity. Also I don't understand the justification for the currently implemented Auxiliary Samplers: I do understand the ViewingAngle, >but what's the science behind having a log-normal or a log10 Auxiliary Sampler?
These features are there so that one can simulate very complex populations. For example, in one project relying on popsynth, people are simulating blazars from a luminosity and spatial distribution, as well as sampling their continuum and line spectra. All of these are random variables, and some quantities are sampled from log-normal distributions. Rather than rewrite these samplers every time, I have included them in the package.
Also I am concerned that I don't see any validation: How do I know these give me realistic simulations? I liked your graphs showing the "selected" and "non-selected" objects - I'd like to see how that compares to other theoretical predictions. Say the feed the rate of supernovae from stellar evolution codes into popsynth and see if you roughly retrieve the number of observed supernovae (given a volume).
As the framework is generic, one would have to specify a theoretical problem to verify against. As far as simple verification of the sampling goes, this is simply using Monte Carlo to integrate the equation now in the documentation; a rough check along those lines is sketched below. If you can provide an example, I can try and simulate that problem.
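For instance, a back-of-the-envelope check might look like this (my sketch only, not part of the test suite; it reuses the DerivedLumAuxSampler pattern from the docs and assumes the first argument of `SphericalPopulation` is a constant density per unit volume, which is my reading rather than something stated in this thread):

```python
import numpy as np
import popsynth


# Toy derived-luminosity sampler; the luminosity values are irrelevant for the
# number-count check, we just need one so that a survey can be drawn.
class ToyLuminositySampler(popsynth.DerivedLumAuxSampler):
    _auxiliary_sampler_name = "ToyLuminositySampler"

    def __init__(self):
        super(ToyLuminositySampler, self).__init__("toy_lum", uses_distance=False)

    def true_sampler(self, size):
        self._true_values = np.random.lognormal(0.0, 1.0, size=size)

    def compute_luminosity(self):
        return self._true_values


pop_gen = popsynth.populations.SphericalPopulation(1, r_max=10)
pop_gen.add_observed_quantity(ToyLuminositySampler())

flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
pop_gen.set_flux_selection(flux_selector)

pop = pop_gen.draw_survey(flux_sigma=0.1)

# For a homogeneous sphere with density 1 per unit volume out to r_max = 10,
# the latent population size should scatter around (4/3) * pi * r_max**3 ≈ 4189.
expected = 4.0 / 3.0 * np.pi * 10.0 ** 3
print(f"expected ≈ {expected:.0f}, drawn = {len(pop.luminosities_latent)}")
```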
Here are a couple of examples: What is a SFR-like redshift? What is a Schecter luminosity function? What populations are we synthesysing? Stars? Transients? Observations? All three?
The SFR-like distribution is now explained in the documentation. Here is some info on the Schechter luminosity function: https://ui.adsabs.harvard.edu/abs/1976ApJ...203..297S/abstract
The included distributions are just things that others and I have commonly used. However, the reason for providing the ability to build your own outside of the repo, i.e., locally and on the fly in a notebook, is so that one can easily extend or add to the framework as needed. The objects that are simulated can be anything.
I am updating the refs in the paper. Let me know if there are remaining issues.
Hi @grburgess,
First, thanks for the core concept page: it helps! I'm still not clear about the distinction between objects that always shine versus transients. The short GRB example makes the transient case clearer but what if I wanted to survey something like stars of a particular spectral type? My guess is that this is related to the definition of "intensity" as being some sort of time average and the only distinction between "transient" and "non-transient" is that the non-transient intensity is on average much greater. But it isn't clear to me how one might set this up in `popsynth`.
For one specific thing, how do I specify the duration Δt of the survey?
Perhaps I'm mistakenly interpreting the populations of the transient events as progenitors of those events, so that e.g. some GRB progenitors don't become GRBs during the survey and we don't detect them. But perhaps the point is that `popsynth` doesn't care about the actual rate of events, or at least that's somehow incorporated in the luminosity distribution and we have to specify something that makes sense according to a (maybe probabilistic) model.
I personally don't find that the "fire fly" concept in the quick start has helped me understand any better. Do they light up continuously? Do they repeat? I find the astrophysical examples somewhat more relatable.
There are still gaps in the API docs; filling them would, I think, help a lot in making the distributions make sense. Some are simple fixes (e.g. SFRDistribution could link to the relevant article) but many could simply be fleshed out more with the relevant formulae.
To create a concrete example that I hoped would cement some of my misunderstandings, I thought about making a flux-limited sample of stars with luminosities L=M³, where M is the mass drawn from some supposed initial mass function (I ended up using a lognormal). The aim is to demonstrate that the surveyed sample will be biased towards higher masses than the underlying sample. I spent about an hour on this but found myself uncertain how to proceed. I think I need to use a custom derived luminosity sampler, in which the underlying distribution would be the mass distribution, so I got as far as this script, largely copied from the docs:
```python
#!/usr/bin/env python3

import numpy as np
import matplotlib.pyplot as pl

import popsynth


class LuminositySampler(popsynth.DerivedLumAuxSampler):
    _auxiliary_sampler_name = "LuminositySampler"

    mu = popsynth.auxiliary_sampler.AuxiliaryParameter(default=0)
    tau = popsynth.auxiliary_sampler.AuxiliaryParameter(default=1, vmin=0)

    def __init__(self, mu=0.0, tau=1.0, sigma=1):
        # this time set observed=True
        super(LuminositySampler, self).__init__("lumsample", uses_distance=False)

    def true_sampler(self, size):
        # draw a random number
        tmp = np.random.lognormal(self.mu, self.tau, size=size)
        self._true_values = tmp

    def compute_luminosity(self):
        # compute the luminosity
        return 10**(3*np.log10(self._true_values))


lum = LuminositySampler()

pop_gen = popsynth.populations.SphericalPopulation(1, r_max=10)
pop_gen.add_observed_quantity(lum)

flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
pop_gen.set_flux_selection(flux_selector)

pop = pop_gen.draw_survey(flux_sigma=0.1)

# pop.display_fluxes(obs_color='m', true_color='y', s=15)
# pl.show()

pl.hist(pop.luminosities_latent**(1/3), bins=np.arange(0.2, 20.0, 0.2), label='all')
pl.hist(pop.luminosities_latent[pop.selection]**(1/3), bins=np.arange(0.2, 20.0, 0.2), label='selected')
pl.xlabel('M')
pl.legend()
pl.show()
```
The fact that I've managed to get what looks like the right result (a flux-limited survey is biased towards more massive stars) means the docs are in a better state than my first impression suggested! But I don't pretend to understand the values (e.g. the flux boundary value) nor the relationship between the samplers, since I've simply copied the docs. I'm also not sure if there's a way to get to the underlying mass distribution instead of working it out from the transformation L=M³.
I still think the API docs need the most work. I was mostly exploring what attributes and functions are available by hitting tab for autocomplete options in IPython. I was also querying docstrings using the `?` magic, which was often unhelpful. It would also be helpful to implement more `__repr__` and/or `__str__` functions, which would make it easier to see what attributes might not have been set, or quickly find out what default values they've taken. I've only used very simple distributions: the more complicated ones are more difficult because it's hard to see exactly what they do.
--
To finish with some concrete actions:
I'll re-read the paper soon.
Hi @grburgess,
First, thanks for the core concept page: it helps! I'm still not clear about the distinction between objects that always shine versus transients. The short GRB example makes the transient case clearer but what if I wanted to survey something like stars of a particular spectral type? My guess is that this is related to the definition of "intensity" as being some sort of time average and the only distinction between "transient" and "non-transient" is that the non-transient intensity is on average much greater. But it isn't clear to me how one might set this up in `popsynth`.
For the included cosmological distributions (though this could be modified externally), there is a constructor argument `as_rate`. I haven't documented this well on the RTD page. But the difference here is the dV/dz Jacobian: you need to account for the change in time over the cosmological interval when dealing with transients. If you aren't using cosmological distributions, then it depends on how you define your volume elements.
This is a little bit involved and does depend on what you are modeling.
Also, the definition of intensity is just being used in the more general sense of a Poisson point process, where we are counting in an N-dimensional space: https://en.wikipedia.org/wiki/Poisson_point_process
You can imagine that each object's process is a Poisson point process in its space. I just don't want to go too much into that here as it can be found in the references.
For one specific thing, how do I specify the duration Δt of the survey?
It factors into the normalization of your spatial distribution. One would have to work this out for whatever case you are handling.
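To sketch what I mean (this is the standard convention as I understand it, not a formula lifted verbatim from the docs): for transients the survey duration enters through the normalization of the expected number of drawn objects,

```latex
% Expected number of transients drawn over a survey of duration \Delta T
% (schematic; \dot{n}(z) is the comoving rate density and the (1+z) factor is
% the usual cosmological time dilation of observed rates):
\begin{equation*}
  N_\mathrm{expected}
    \;=\; \Delta T \int \mathrm{d}z \; \frac{\dot{n}(z)}{1+z} \, \frac{\mathrm{d}V}{\mathrm{d}z} ,
\end{equation*}
% whereas for steady sources there is no \Delta T and \dot{n}(z) is replaced by
% a comoving number density n(z).
```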
Perhaps I'm mistakenly interpreting the populations of the transient events as progenitors of those events, so that e.g. some GRB progenitors don't become GRBs during the survey and we don't detect them. But perhaps the point is that `popsynth` doesn't care about the actual rate of events, or at least that's somehow incorporated in the luminosity distribution and we have to specify something that makes sense according to a (maybe probabilistic) model.
Yeah, there is no distinction... it really depends on how you define your volume elements.
I personally don't find the "fire fly" concept in the quick start has helped me understand any better. Do they light up continuously? Do they repeat? I find the astrophysical examples somewhat more relatable.
I can remove this.
There are still gaps in the API docs; filling them would, I think, help a lot in making the distributions make sense. Some are simple fixes (e.g. SFRDistribution could link to the relevant article) but many could simply be fleshed out more with the relevant formulae.
It is in the docstring, but sphinx-autodoc is not working.
I think the better idea is to rename this to ColeSFR... but this would break a lot of people's code at the moment. I think I will do that on a version bump.
To create a concrete example that I hoped would cement some of my misunderstandings, I thought about making a flux-limited sample of stars with luminosities L=M³, where M is the mass drawn from some supposed initial mass function (I ended up using a lognormal). The aim is to demonstrate that the surveyed sample will be biased towards higher masses than the underlying sample. I spent about an hour on this but found myself uncertain how to proceed. I think I need to use a custom derived luminosity sampler, in which the underlying distribution would be the mass distribution, so I got as far as this script, largely copied from the docs:
The fact that I've managed to get what looks like the right result (a flux-limited survey is biased towards more massive stars) means the docs are in a better state than my first impression suggested! But I don't pretend to understand the values (e.g. the flux boundary value) nor the relationship between the samplers, since I've simply copied the docs. I'm also not sure if there's a way to get to the underlying mass distribution instead of working it out from the transformation L=M³.
This is very cool! Perhaps you would like to add this to the examples? More examples of different things could help people think of how to use this. The boundary is the value of the flux limit in erg/s/cm2; I need to document this. You could also do this:
```python
import numpy as np
import matplotlib.pyplot as plt

import popsynth

# create a sampler for mass
mass = popsynth.LogNormalAuxSampler(name="mass", observed=False)


class LuminositySampler(popsynth.DerivedLumAuxSampler):
    _auxiliary_sampler_name = "LuminositySampler"

    def __init__(self, mu=0.0, tau=1.0, sigma=1):
        # this time set observed=True
        super(LuminositySampler, self).__init__("lumsample", uses_distance=False)

    def true_sampler(self, size):
        # read the masses drawn by the secondary sampler
        mass = self._secondary_samplers["mass"].true_values
        self._true_values = mass

    def compute_luminosity(self):
        # compute the luminosity
        return 10**(3*np.log10(self._true_values))


lum = LuminositySampler()
lum.set_secondary_sampler(mass)

pop_gen = popsynth.populations.SphericalPopulation(1, r_max=10)
pop_gen.add_observed_quantity(lum)

flux_selector = popsynth.HardFluxSelection()
flux_selector.boundary = 1e-2
pop_gen.set_flux_selection(flux_selector)

pop = pop_gen.draw_survey(flux_sigma=0.1)

# pop.display_fluxes(obs_color='m', true_color='y', s=15)
# plt.show()

fig, ax = plt.subplots()
ax.hist(pop.mass, bins=np.arange(0.2, 20.0, 0.2), label='all')
ax.hist(pop.mass[pop.selection], bins=np.arange(0.2, 20.0, 0.2), label='selected')
ax.set_xlabel('M')
ax.legend()

pop.display_fluxes()

fig, ax = plt.subplots()
ax.hist(pop.luminosities_latent, bins=np.arange(0.2, 20.0, 0.2), label='all')
ax.hist(pop.luminosities_latent[pop.selection], bins=np.arange(0.2, 20.0, 0.2), label='selected')
ax.set_xlabel('L')
ax.legend()
```
I still think the API docs need the most work. I was mostly exploring what attributes and functions are available by hitting tab for autocomplete options in IPython. I was also querying docstrings using the `?` magic, which was often unhelpful. It would also be helpful to implement more `__repr__` and/or `__str__` functions, which would make it easier to see what attributes might not have been set, or quickly find out what default values they've taken. I've only used very simple distributions: the more complicated ones are more difficult because it's hard to see exactly what they do.
The `?` should work on most of the constructors. But as I've said... autodoc is being a pain. I will work to fill out the docstrings more deeply in the code. All the distributions should have a display method which explains the math, but I've gotten behind on keeping up with this.
--
To finish with some concrete actions:
- It still isn't clear how we relate the luminosities of transient versus non-transient things, and how the transient rate ("intensity"? "luminosity"?) incorporates the survey duration. The docs still need to explain this better upfront. It seems that to you there is no difference but to me there still is. (Consider: what about things that are intermittent and not strictly regular, like dwarf novae?)
I will document the as_rate option better. But in general, it would be up to the user to write a correct spatial distribution. E.g., there are people using this to model flaring blazars, but these flare for much longer than GRBs. We have also used this to simulate static sources.
- I think the examples would benefit from a second case, aside from the GRB example, of something that isn't transient.
I will add the blazar example soon. But again, your example is very nice.
- There are some heading issues where different headings from the same page are being listed in the contents sidebar. Not a big issue but a bit confusing for navigation.
Ah... I need to separate this out.
I'll re-read the paper soon.
Okay, I think I've (very slowly!) come around to how the transient versus non-transient thing works. So I think the missing piece of the documentation is a description of (or link to a description of) how the relevant normalization constant should be calculated and which parameter it corresponds to in `popsynth`. At the moment I don't think this is mentioned at all in the docs.
You're more than welcome to use my example however you like. It'd benefit from being fleshed out with some vaguely reasonable numbers for the fluxes and distances and whatever other parameters. An interesting version—but not one I expect for this review!—might be to demonstrate how the (main sequence) stars we see by eye in the night sky are a flux-limited sample and therefore biased to higher masses. That could even be compared to a list of actual main sequence stars with V<4 or something (to keep the number small). I don't know how to construct that query but it's probably not too hard with astroquery.
Finally, regarding the formulae for the various functions, it's all well using the `display` method (though I don't see it implemented everywhere), but don't presume everyone uses the same workflow and tools as you. I don't routinely use notebooks and in my IPython console the maths just displays as `<IPython.core.display.Math object>`. I'd recommend just writing the formulae in the docstrings, much like e.g. SciPy. It's not pretty in a console either but at least I can use my mental LaTeX parser, which isn't too bad after all these years!
Okay, I think I've (very slowly!) come around to how the transient versus non-transient thing works. So I think the missing piece of the documentation is a description of (or link to a description of) how the relevant normalization constant should be calculated and which parameter it corresponds to in `popsynth`. At the moment I don't think this is mentioned at all in the docs.
Good point! Units in many places would likely make this clearer. It's just that they can change depending on the application. I will think about it and try to write this up. But, yes, I will include this in the cosmological docs (I'm not sure how this works in flat space...)
You're more than welcome to use my example however you like. It'd benefit from being fleshed out with some vaguely reasonable numbers for the fluxes and distances and whatever other parameters. An interesting version—but not one I expect for this review!—might be to demonstrate how the (main sequence) stars we see by eye in the night sky are a flux-limited sample and therefore biased to higher masses. That could even be compared to a list of actual main sequence stars with V<4 or something (to keep the number small). I don't know how to construct that query but it's probably not too hard with astroquery.
Thanks! Would you like to do a PR for the credit? I can always modify in the end.
Finally, regarding the formulae for the various functions, it's all well using the display method (though I don't see it implemented everywhere), but don't presume everyone uses the same workflow and tools as you. I don't routinely use notebooks and in my IPython console the maths just displays as `<IPython.core.display.Math object>`. I'd recommend just writing the formulae in the docstrings, much like e.g. SciPy. It's not pretty in a console either but at least I can use my mental LaTeX parser, which isn't too bad after all these years!
Good point... I'm so used to using the notebook that I forget. I will go through and start making beefier repr methods. There is a lot of code to document.
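Something along these lines, just so we mean the same thing (purely illustrative: the class, its attributes and the chosen formula are placeholders, not actual popsynth code):

```python
# Illustrative sketch: formula written directly in the docstring (SciPy-style)
# plus a plain-text __repr__ so a console user can see the current parameters.
class SchechterLuminosityFunction:
    r"""
    Schechter luminosity function,

    .. math::

        \phi(L) \, \mathrm{d}L
            = \phi^* \left(\frac{L}{L^*}\right)^{\alpha}
              e^{-L/L^*} \, \mathrm{d}\!\left(\frac{L}{L^*}\right),

    with normalization ``phi_star``, characteristic luminosity ``Lstar`` and
    faint-end slope ``alpha``.
    """

    def __init__(self, phi_star=1.0, Lstar=1.0, alpha=-1.2):
        self.phi_star = phi_star
        self.Lstar = Lstar
        self.alpha = alpha

    def __repr__(self):
        # readable in a plain console, no notebook rendering required
        return (f"SchechterLuminosityFunction(phi_star={self.phi_star}, "
                f"Lstar={self.Lstar}, alpha={self.alpha})")
```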
@warrickball OK, thanks to @cescalara, there is now another example in the docs as well as more complete docstrings. She has also now been added as a coauthor.
Hello!
Sorry for the delay. So much progress, we're nearly there!
Sidenote: I am having issues with the website slowing my browser down significantly and freezing it - I think it might be the fancy 3D plots. I would recommend making the interactive plot an optional thing in a Jupyter notebook somewhere else and instead having a static 3D plot, maybe viewed from different angles to illustrate the point -> or just warn people to use Firefox instead of Chrome if things get slow.
The fireflies can stay, but the idea of making things accessible to PhD students isn't so much about making them cute, it is about not assuming knowledge and giving them the tools to learn more if some explanations are beyond the scope of your tutorial. For example, why is the luminosity function a power law - can we have a reference for the Pareto luminosity function? Put yourself in the shoes of someone who is new to the field - don't hesitate to cite classical references or even a textbook. Someone will be thankful for it.
I really like the level of detail you've got in most of your docstrings to describe the luminosity functions - with the maths, and sometimes even the appropriate reference. I do have a question: how are people supposed to know which parameter values for the distributions are physical and which aren't? Do they need to check the references, or would it be easy for you to show or write somewhere the range of values that make sense?
I love that you have an example of a transient population and an example of a non-transient population - thanks to the other referee for suggesting that and to you for implementing it; it really makes all the difference. It's really cool to be able to see the underlying population and the observed one on top of each other.
The docstrings need to be completed - remember this is the first point of contact for most coders: `myclass.__doc__` or `myclass()?` are essential. I don't want to have to go back to the tutorials every time I forget what a parameter does. A bit more information would be nice on some of them; if you can, ensure that they have 3 things:
I like the example of the selection bias towards massive stars! Could you add that to the docs? I don't see it under "Examples".
Add community guidelines to the README on the GitHub repo, not just to the documentation.
Hi,
Sorry about the plots slowing things down. I have reduced the number of objects to try and make the plotting manageable.
Regarding the first two comments: Pareto is just a normalized power law. There are references to an economic principle, but the mathematical form should be sufficient, as I would not want someone to think that it is in any way related to economics. As far as parameter ranges go, this being a generic framework, there are no defined parameter ranges other than those that would make the functions mathematically incorrect (negative amplitudes, for example). This is also why references really don't make much sense in this context, as it is meant to be a framework to build populations. The distributions are just normalized mathematical functions. One could plug in parameter values that correspond to theoretical or observed values. Moreover, the description in the documentation for constructing custom distributions is more important, as other users have generally made their own distribution functions. As there is a lot of freedom, the only thing that can be documented is the construction.
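For reference, the mathematical form I mean is just the standard Pareto density (written out here for completeness rather than quoted from the docs):

```latex
\begin{equation*}
  \phi(L) \;=\; \frac{\alpha \, L_{\min}^{\alpha}}{L^{\alpha + 1}} ,
  \qquad L \ge L_{\min}, \quad \alpha > 0 ,
\end{equation*}
% i.e. a power law in L, normalized above a minimum luminosity L_min.
```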
The docstrings are now filled out.
I have added the massive stars example to the documentation.
I have added the community guidelines to the README.
@warrickball I have also now added text reprs so that quick information can be displayed in IPython.
Okay cool, just one question: is there a physical motivation for using this Pareto function? Or is it a case of "we don't know what the luminosity function is doing for our populations so a power law is a good guess"?
In general, luminosity functions are empirical descriptions that model several underlying processes which manifest as a single observed distribution. For example, the GRB luminosity function is likely driven by progenitor mass / explosion energy, radiative processes, and jet viewing angle. Observationally, this manifests as a broken power law. Thus, unless one is modeling all the physical processes, the most straightforward approach is to use some empirical function like a broken power law, Schechter function, etc. On the simulation side, one can go any route they choose. If you want to model the luminosity distribution as an empirical function, then one does that. In `popsynth`, the role of the `DerivedLumAuxSampler` is that the value of luminosity per object is a manifest property of underlying variables.
Nevertheless, there are several cases of power laws being used to model luminosity functions, e.g. the radio luminosity function of radio-quiet AGN: https://iopscience.iop.org/article/10.1088/0004-637X/740/1/20 .
The inclusion of the Pareto distribution is simply because it is used a lot by those using this code, but as the software is provided as a framework to be extended, there is no preference for this distribution.
okay cool! Thanks for addressing all my comments - all my boxes are ticked and on my side of things we are done!
@whedon generate pdf
PDF failed to compile for issue #3257 with the following error:
```
/app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse': (9c248ef6a695b70d00dd7037/paper.md): found a tab character that violate intendation while scanning a plain scalar at line 12 column 11 (Psych::SyntaxError)
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse_stream'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:390:in `parse'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:277:in `load'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:578:in `block in load_file'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:127:in `load_yaml'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:87:in `initialize'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `new'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `set_paper'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:58:in `prepare'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:131:in `<top (required)>'
  from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `load'
  from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `<main>'
```
@whedon generate pdf
PDF failed to compile for issue #3257 with the following error:
```
/app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse': (45cfb8b3f4d8e6771d4bdf81/paper.md): found a tab character that violate intendation while scanning a plain scalar at line 12 column 11 (Psych::SyntaxError)
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:456:in `parse_stream'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:390:in `parse'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:277:in `load'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:578:in `block in load_file'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `open'
  from /app/vendor/ruby-2.6.6/lib/ruby/2.6.0/psych.rb:577:in `load_file'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:127:in `load_yaml'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:87:in `initialize'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `new'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `set_paper'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:58:in `prepare'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:131:in `<top (required)>'
  from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `load'
  from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `<main>'
```
@whedon generate pdf
PDF failed to compile for issue #3257 with the following error:
```
/app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:72:in `block in build_affiliation_string': Problem with affiliations for Francesca Capel, perhaps the affiliations index need quoting? (RuntimeError)
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:71:in `each'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:71:in `build_affiliation_string'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/author.rb:17:in `initialize'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:205:in `new'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:205:in `block in parse_authors'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:202:in `each'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:202:in `parse_authors'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon.rb:93:in `initialize'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `new'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/lib/whedon/processor.rb:38:in `set_paper'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:58:in `prepare'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/command.rb:27:in `run'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/invocation.rb:126:in `invoke_command'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor.rb:387:in `dispatch'
  from /app/vendor/bundle/ruby/2.6.0/gems/thor-0.20.3/lib/thor/base.rb:466:in `start'
  from /app/vendor/bundle/ruby/2.6.0/bundler/gems/whedon-b63fc70cc085/bin/whedon:131:in `<top (required)>'
  from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `load'
  from /app/vendor/bundle/ruby/2.6.0/bin/whedon:23:in `<main>'
```
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I just had a look at the latest PDF draft and have two very minor comments.
First, the opening sentence,
Simulating a population survey of fluxes and redshifts (distances) from an astrophysical population is a routine task.
reads slightly strangely to me, I think because I don't quite know what a "population survey ... from an astrophysical population" is. It makes more sense to me if the first occurrence of "population" is deleted.
Second, something has gone wrong with this reference:
al., A. A. et. (2021). SkyPy: A package for modelling the universe. In GitHub repository. GitHub.https://github.com/skypyproject/skypy
The only other thing I'd like to check again is the state of the automated tests. I can see that the CI appears to be working but I just want to check that everything makes sense for potential contributors. Since @HeloiseS is happy, I'll try to do this today but I might have to work on some more urgent things that'll push this back to late next week.
@warrickball yes, there are too many populations in that sentence. I removed the first one.
I see that SkyPy has a joss-paper under review, so I grabbed the first author from there.
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
I updated a previous checkout and was able to run the tests successfully by running `pytest` in the `popsynth` folder. It looks like this isn't currently documented but should be, probably in a section near Contributing.
I also noticed that `test_schecter` (sic) takes a long time. I didn't record any timing information but it was many minutes. The CI report for the Ubuntu build of the most recent commit shows the tests taking 16 minutes. You might be happy with that but it seems surprisingly long and probably dominated by that one test. In my experience, if tests take a long time, people won't run them or won't run them often enough to catch bugs.
The tests do run, though, so my checklist is complete and I'm happy for this to be accepted in JOSS.
@warrickball yes, this is the hypothesis package, which tries to inject parameters until it finds a problem. This is the first and only package I have used it on, because of the speed issues. It does help to find issues with parameter inputs... but can lead to very long-running tests.
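If the runtime ever becomes a real annoyance, one option (a sketch of hypothesis's standard knobs, not something currently in the test suite) is to cap the number of generated examples for the slow test:

```python
# Sketch: limit how many parameter sets hypothesis generates for one test.
# The strategy and test body below are placeholders, not the real test_schecter.
from hypothesis import given, settings, strategies as st


@settings(max_examples=50, deadline=None)  # default is 100 examples
@given(alpha=st.floats(min_value=0.1, max_value=5.0))
def test_schechter_quick(alpha):
    # ... build the population with this alpha and draw a small survey ...
    assert alpha > 0
```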
@xuanxu what are the next steps?
Thanks @HeloiseS & @warrickball for your reviews!
@grburgess I think I spotted a typo in the paper's summary
@xuanxu merged. Thanks!
@whedon generate pdf
:point_right::page_facing_up: Download article proof :page_facing_up: View article proof on GitHub :page_facing_up: :point_left:
OK @grburgess, everything looks good, here are the next steps:
Once you do that please report here the version number and archive DOI
@xuanxu I usually do a tag which then releases to PyPI. Should I just tar the PyPI release and put that in Zenodo?
@grburgess yes, you can upload the .zip or the .tar file generated by GH when the tag (or release) is created.
@xuanxu I have uploaded a new version to zenodo:
it is also on pypi: https://pypi.org/project/popsynth
@grburgess The zenodo link points to a snapshot of the 0.2.2 version. I think you need to create a new upload with the latest zip file. Also, please change the title to match the paper.
Submitting author: @grburgess (J. Michael Burgess)
Repository: https://github.com/grburgess/popsynth
Version: v1.0.3
Editor: @xuanxu
Reviewers: @HeloiseS, @warrickball
Archive: 10.5281/zenodo.5109590
:warning: JOSS reduced service mode :warning:
Due to the challenges of the COVID-19 pandemic, JOSS is currently operating in a "reduced service mode". You can read more about what that means in our blog post.
Status
Status badge code:
Reviewers and authors:
Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)
Reviewer instructions & questions
@HeloiseS & @warrickball, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:
The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @xuanxu know.
✨ Please start on your review when you are able, and be sure to complete your review in the next six weeks, at the very latest ✨
Review checklist for @HeloiseS
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper
Review checklist for @warrickball
Conflict of interest
Code of Conduct
General checks
Functionality
Documentation
Software paper