openjournals / joss-reviews

Reviews for the Journal of Open Source Software
Creative Commons Zero v1.0 Universal
712 stars 38 forks source link

[REVIEW]: Hoki: Making BPASS accessible through Python #1987

Closed whedon closed 4 years ago

whedon commented 4 years ago

Submitting author: @heloises (H. F. Stevance) Repository: https://github.com/HeloiseS/hoki Version: v1.3.2b0 Editor: @arfon Reviewer: @ygrange, @KshitijAggarwal Archive: 10.5281/zenodo.3629869

Status

status

Status badge code:

HTML: <a href="https://joss.theoj.org/papers/192e0f86c488fed9ea50ace255575183"><img src="https://joss.theoj.org/papers/192e0f86c488fed9ea50ace255575183/status.svg"></a>
Markdown: [![status](https://joss.theoj.org/papers/192e0f86c488fed9ea50ace255575183/status.svg)](https://joss.theoj.org/papers/192e0f86c488fed9ea50ace255575183)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@ygrange & @KshitijAggarwal, please carry out your review in this issue by updating the checklist below. If you cannot edit the checklist please:

  1. Make sure you're logged in to your GitHub account
  2. Be sure to accept the invite at this URL: https://github.com/openjournals/joss-reviews/invitations

The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html. Any questions/concerns please let @arfon know.

Please try and complete your review in the next two weeks

Review checklist for @ygrange

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

Review checklist for @KshitijAggarwal

Conflict of interest

Code of Conduct

General checks

Functionality

Documentation

Software paper

whedon commented 4 years ago

Hello human, I'm @whedon, a robot that can help you with some common editorial tasks. @ygrange, @KshitijAggarwal it looks like you're currently assigned to review this paper :tada:.

:star: Important :star:

If you haven't already, you should seriously consider unsubscribing from GitHub notifications for this (https://github.com/openjournals/joss-reviews) repository. As a reviewer, you're probably currently watching this repository which means for GitHub's default behaviour you will receive notifications (emails) for all reviews 😿

To fix this do the following two things:

  1. Set yourself as 'Not watching' https://github.com/openjournals/joss-reviews:

watching

  1. You may also like to change your default settings for this watching repositories in your GitHub profile here: https://github.com/settings/notifications

notifications

For a list of things I can do to help you, just type:

@whedon commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@whedon generate pdf
whedon commented 4 years ago
Attempting to check references...
whedon commented 4 years ago
Attempting PDF compilation. Reticulating splines etc...
whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

whedon commented 4 years ago

OK DOIs

- 10.1017/pasa.2017.51 is OK
- 10.1126/science.1223344 is OK
- 10.1088/0067-0049/190/1/1 is OK
- 10.1111/j.1365-2966.2007.12738.x is OK
- 10.1093/mnras/stt1612 is OK
- 10.1093/mnras/stw1772 is OK
- 10.1103/PhysRevLett.119.161101 is OK
- 10.1093/mnras/sty1353 is OK
- 10.1111/j.1365-2966.2009.15514.x is OK

MISSING DOIs

- None

INVALID DOIs

- None
arfon commented 4 years ago

@ygrange, @KshitijAggarwal - please carry out your review in this issue by updating the checklist above and giving feedback in this issue. The reviewer guidelines are available here: https://joss.readthedocs.io/en/latest/reviewer_guidelines.html

Any questions/concerns please let me know.

ygrange commented 4 years ago

Not only will I not work during the break, I will also not do much in the first working week of the nwe year :). Will be looking at this after that though. Looking forward to it!

ygrange commented 4 years ago

Just wanted to check off the first bunch of boxes this week and that leads to one comment:

arfon commented 4 years ago
  • The visual on the repo README page calls itself HOKI beta. Not really sure what the JOSS policy is but I would expect it to be that the published release is not in beta state. Probably more a question to @arfon.

I guess this kind of depends upon what kind of a beta this is. Wasn't Gmail famously in beta for > 10 years 😁?

@HeloiseS - can you advise?

HeloiseS commented 4 years ago

Hello !

The main reason it says beta is because user testing has been limited to a few people at a workshop and me so far and it hadn't been reviewed by other programmers. I also expect that significant features I may not have thought about will be requested by BPASS and hoki users in the upcoming months.

I'm happy to take it out of beta after this review :) It makes more sense to me to have the first not-pre-release come with the JOSS paper (rather than before), even if the code base remains in a state of continuous development.

What do you think? @arfon @ygrange

arfon commented 4 years ago

I'm happy to take it out of beta after this review :) It makes more sense to me to have the first not-pre-release come with the JOSS paper (rather than before), even if the code base remains in a state of continuous development.

Yes, this sounds very reasonable. This happens semi-regularly here.

ygrange commented 4 years ago

You are making good points! My personal opinion (please ignore anything I say if you disagree) would be here that little usage testing certainly makes a good case for beta status but it would be nice for you to have an idea when you think it is ok. Adding functionality (even if it is considered significant features) could be part of the standard release process so I would not necessarily make it a reason to keep it beta. BTW (mostly for completeness and probably stating the blatently obvious): the reason I think it matters is that having it in beta phase may stop new potential users from using it.

HeloiseS commented 4 years ago

BTW (mostly for completeness and probably stating the blatently obvious): the reason I think it matters is that having it in beta phase may stop new potential users from using it.

That makes a lot of sense. I think I'll take the beta off when this review is done. Also you make a good point as to where do I draw the line with feature additions being a part of a pre release or a full release - I don't think there will ever be one. This code base will HAVE to evolve with the field so you're right, that's more a matter of big releases that a beta status.

:) Anyway, the beta is coming off when hoki's been signed off on you folks, thanks for the convo.

KshitijAggarwal commented 4 years ago

Thanks for suggesting me to review this. Great work on hoki. It is really interesting and would be very useful for the community. I haven't yet tested the functionality of the code, will do that soon. Here are a few comments from the rest:

This is my first JOSS review. Feel free to correct me or ask if any of the above doesn't make sense.

HeloiseS commented 4 years ago

The JOSS guideline say that contribution doesn't have to be in code to justify authorship. JJ and elizabeth are the BPASS developers. I have needed their input and will continue to do so as I further develop the code - it seems only fair to me they also get a citation.

arfon commented 4 years ago

Thanks for flagging this @KshitijAggarwal. The authorship checkbox is there to prompt a conversation about authorship and is usually used by reviewers to identify potential missing authors.

The JOSS guideline say that contribution doesn't have to be in code to justify authorship.

That's correct.

I have needed their input and will continue to do so as I further develop the code - it seems only fair to me they also get a citation.

Right, it sounds like they are making a major intellectual contribution to this effort even if they're not directly contributing code to this package.

HeloiseS commented 4 years ago

Thanks so much for your comments @KshitijAggarwal !

  • Under Motivation, an IDL code has been mentioned which was used to create synthetic CMDs before hoki. Is there any relevant link to the code and/or citation that can be added here?

=> I Added a link to the user manual that talks about the IDL code. There isn't much in the way of published work on this since it's just a script that's been released with the data.

Similarly, I would recommend citing any other code/publication which have implemented any pre-processing on BPASS data products or any subset of the functionality of hoki.

BPASS data have always required pre-processing and they have been used for years. This description would encompass dozens and dozens of papers (if not more). I'm not sure it would be useful to the reader - that's why we usually cite the core BPASS papers like Eldridge 2008, 2017 and Stanway 2018. I hope that's okay!

  • The tutorial link given here (and all other notebooks/pages under Recipes) should be updated with the GitHub tutorial repo. => Fixed the links! Thanks!

    Some typos: Under Research in the paper:

  • ‘asbolutely’ -> ‘absolutely'
  • ‘invovled’ -> ‘involved’

=> Fixed! Thanks for spotting them :smiley:

ygrange commented 4 years ago

Short note to say I haven't forgotten completely! I wanted to finish off the last points today but my family basically all fell ill (still waiting for me to get ill thugh). So I had a few other things on my mind. I expect to be done end of this week (either thursday or at the latest next week tuesday). Apologies for the bit of delay on my side!

KshitijAggarwal commented 4 years ago

Thanks for replying to my comments. I think it is okay to just cite the BPASS papers in this case.

I looked at the code and really like the heavy use of Pandas Dataframes. The documentation is also pretty good and useful (so was the rickrolling!!). One small thing is that I prefer using logging instead of prints as it provides a pretty flexible logging system. It is not at all necessary for this but it is something you can have a look at if you haven't already.

I followed the tutorial notebooks which are well made and easy to follow. I encountered two errors during those, and have created two issues (here, here) corresponding to them. Also, it wasn't very convenient to get data from Google Drive (esp. for the remote machine), but I don't think that is on hoki. I guess BPASS must've always done it this way. Still, if possible, it would be preferred to host this huge database on a server, from where it can be downloaded using command-line tools.

KshitijAggarwal commented 4 years ago

I think it would be useful to add instructions on installing hoki from source (i.e Github). I did a simple install using setup.py, and it gave a few warnings (here). Though all the tests were passed, and I was able to run the tutorial notebooks as well, it may be useful to look into the warnings and check if they might impact any untested function in hoki.

I don't have any more comments on this. It was really great reviewing this package, it is really well done and I am sure that it would be very useful for the people using BPASS data products. Once the issue (https://github.com/HeloiseS/hoki/issues/4) is addressed, I will check the last item on my checklist.

ygrange commented 4 years ago

Also: I see a bunch of things that @KshitijAggarwal mentions that also appear in my draft review so I think I will not add a lot of significant comments.

KshitijAggarwal commented 4 years ago

Thanks for responding to my comments and issues so quickly. I reinstalled the package, re-ran the notebooks, and everything works perfectly. I have one small suggestion which I forgot to add yesterday (this is not essential for review btw). As you have mentioned in the tutorial notebook, make() in CMD class takes quite a bit of time to run. It might be useful to add a progress bar (like tqdm) on the for loop which loops over each line of the input file in this function. This will not only show the progress of the process but will also provide an estimated time to completion of the function.

The review is complete from my end now.

HeloiseS commented 4 years ago

Thanks!!! I added an "enhancement" issue to keep that in mind for the next major release :smiley:

ygrange commented 4 years ago

Ok, there we go:

The paper is clearly written, readable and concise. The software is clearly documented and the jupyter notebook tutorials are a pretty good way to get a feeling of how to use the tool. In general I really like this submission. So I fully agree with @KshitijAggarwal that this was an enjoyable package to review and that even without any experience in using BPASS data I can imagine this increasing the user base of those models greatly!

Here are my comments.

It is not really mentioned in the README what BPASS is. I think adding a sentence of two to the README will also make Hoki accessible to a larger audience of observational astronomers.

Installation of the master version of the code goes pretty smooth, even on an empty OS. Neat!

When installing Hoki on one of my test systems, pip reported that the bdist_wheel command was needed. I suggest to add the pip package wheel to the setup dependency list (though it probably exists in pretty much any realistic environment).

Small point of advice: add to the README how the current development version can be installed (pip3 install . in the checked-out repo is probably the most fool-proof option here). (NB: I think a comparable comment has also been given by the other reviewer). Also my personal preference is to be told that a package has tests I could execute if I install from source. I leave it up to you to decide if you agree and put it in the README.

Would be useful to mention Hoki does not work on python2 (yes, I know... Still many systems have it by default)

The Tutorials: I really like the way you set up separate jupyter notebooks so that people get to play with the interface. Way to go!

I would assue people following the tutorial may be as new to the field as I am so I will basically flag all things that are not fully clear to me. If some things need too much explanation and the target audience should know, feel free to ignore (or comment).

The three issues that you fixed were al in my review too :). Quite well-fixed :)

I was wondering: would it be possible to generate a small BPASS data set for testing purposes? If this is possible, I would advice that you use something like binder to give people a very low-threshold way to play with the notebooks (through a badge in the README). Probably 45GB downloads are not going to be too much appreciated by the binder people... Surely such data would be not very scientifically interesting and maybe the plots may not be illustrative.

Paper: My take on the paper is that it is targeted at observational astronomers who could use some simulated data to compare to their observation. So I'm trying to read it as such.

You refer to "a code written in IDL". I would at least state its name (if it is at all known). Maybe a citation if you want to be nice.

One of the features is "Standard BPASS Constants". I am not really sure what this is supposed to mean.

Is BPASS a tool that I could download and run, or is it a sumulation that is provided by the BPASS team? I think it may be useful to clarify in a sentence or so in the paper.

Another thing I noticed is that since the BPASS data is relatively big, users may want to run Hoki on a remote machine. On the other hand the data is in a Google drive, making it not very trivial to download using wget or so. So I would advice you to consider implementing a method to query/download the BPASS data.

ygrange commented 4 years ago

Small issue I noticed when having another glance at the readthedocs: you may want to link the notebook tutorials somewhere in the intro.

ygrange commented 4 years ago

Ok, last one: all boxes checked!

HeloiseS commented 4 years ago

Thanks @ygrange ! Here's a quick response/summary:

Fixes

So people can get to it and make it clear it's the data products that the community uses. Is that okay :question:

Clarifications


ygrange commented 4 years ago

I think this clarifies the status of the data :).

My way of reproducting the wheel issue was to build python 3.8 from source and removing all packages (the cluster I use for testing didn't have a reliable python3 module so we had to create one for this cause; don't even ask :) ).

On the clarifications:

I think there are still a few of my minor issues not addressed, but thanks for all things you already did!

HeloiseS commented 4 years ago

@ygrange

I think there are still a few of my minor issues not addressed, but thanks for all things you already did!

Is there anything else you'd like me to do? I'm happy to make further modifications to the paper or code!

I notice you've not marked the "statement of need" box as finished, is there anything I can do to clarify that need?

ygrange commented 4 years ago

Whoops, sorry that was my mistake! Statement of need is (very) clear!

I think the only issues I think you didn't mention in your comments are the two on terminology:

  1. in one of the notebooks a "BPASS input file" is mentioned and I didn't really know what that was supposed to mean. Maybe would be good to clarify
  2. "Standard BPASS constants" in the feature list (in the paper) is also a bit I didn't really get the meaning of.

That's really pretty much it.

HeloiseS commented 4 years ago

My bad, I forgot to say I changed the phrasing of the CMD jupyter notebook, now the description of the input file location and and naming convention is given at the same time as that of the stellar models:

The stellar models and input files can be downloaded from the google drive (bpass-v2.2-newmodels for the models and e.g. bpass_v2.2.1_imf135_300 to get the required model_input files).

The notebook also tells people where they are going to be using these input files in this sentence:

In hoki you will be creating a CMD() object instantiated with a model of your choosing (a particular IMF and metallicity)- for this you need to provide the location of a BPASS input file.

As for BPASS constants, I'm not sure what to say, they are constants used in BPASS - I don't mean to be facetious that's literally what they are! Like, the arrays containing the time bins and the number of years per time bin, a dictionary for the big BPASS array in the stellar models... They're just constants that come from the way the BPASS simulations are run and they are used by hoki and BPASS users. It's not the main feature for sure, but it's there so I thought I'd declare it?

What description do you think would help convey this idea?

ygrange commented 4 years ago
  1. Ah ok, makes sense. I think what confused me, to begin with, was that the BPASS input files are actually outputs of the model. So this went hand in hand with my misunderstanding of the way BPASS data is provided. Since this is clarified on both ends, even I now would get it :).

  2. I wasn't really sure whether to interpret that as physical constants (things like Hubble constant but maybe also the more standard physical constants like G) or something else. Again, I take the role of a somewhat ignorant person (which isn't too hard to play for me) who has some observations and wants to use the simulations to compare with.

So they are constants output by the model? Or are those actually the model inputs? Ore are they even higher-level constants that basically do not depend on the specific model that was executed? They are internally used in Hoki, but also by users you say. What would a user use them for? The docstring seems to suggest I shouldn't really bother too much as a standard user.

So I think adding something like ", for internal Hoki usage, or to support advanced BPASS users" may clarify to 'people like me' that this is not something they should necessarily want to dive into.

Does that make sense?

HeloiseS commented 4 years ago

@ygrange yes that's a great idea!

They are not physical constants, they're things defined in BPASS - that's why some BPASS users might use them, but you're right that applying the term advance BPASS user makes sense here because for most people it will be hidden by hoki.

I added this to to paper now! :D

ygrange commented 4 years ago

Ah perfect!

I think that's all from me. Sorry for holding it up just a bit longer. I hope my comments added a delta to the quality of this already pretty good submission!

Finally, let me test if I understand how Whedon works so I can look at the final paper:

@whedon generate pdf (ok, that didn't work. Let me try a separate message then)

ygrange commented 4 years ago

@whedon generate pdf

whedon commented 4 years ago

:point_right: Check article proof :page_facing_up: :point_left:

arfon commented 4 years ago

@whedon accept

whedon commented 4 years ago

No archive DOI set. Exiting...

arfon commented 4 years ago

@HeloiseS - it looks like both @ygrange and @KshitijAggarwal have completed their reviews now ✨

At this point could you make a new release of this software that includes the changes that have resulted from this review. Then, please make an archive of the software in Zenodo/figshare/other service and update this thread with the DOI of the archive? For the Zenodo/figshare archive, please make sure that:

I can then move forward with accepting the submission.

HeloiseS commented 4 years ago

@arfon, So the DOI should be a different DOI than the one I already have for Hoki? I'm a bit confused.

Also many thanks to @ygrange and @KshitijAggarwal for their review!! <3

arfon commented 4 years ago

@arfon, So the DOI should be a different DOI than the one I already have for Hoki? I'm a bit confused.

Does the DOI you already have include all of the changes that have resulted from this review?

HeloiseS commented 4 years ago

Hey @arfon thanks!

Not yet! I will push a final version and update pypi and remove beta status (was waiting for final approval).

The name of the Zenodo folder tho will evolve after each release - so I can can call this release the same thing as the paper but it's going to change. Is that okay?

Here is a link to the current Zenodo folder (Iatest release was as I was submitting to JOSS): https://zenodo.org/record/3588063#.XjCuJHUzbmE

UPDATE: I've now produced the first official release (everything updated on github and pypi), here is the corresponding zenodo page: https://zenodo.org/record/3629869#.XjC2tnUzbmE

arfon commented 4 years ago

Great, thanks @HeloiseS. Please could you make the following adjustments to the Zenodo release, making sure that:

HeloiseS commented 4 years ago

@arfon Yeah I fixed it! https://zenodo.org/record/3629869#.XjHauXUzZNw :D

arfon commented 4 years ago

@whedon set 10.5281/zenodo.3629869 as archive

whedon commented 4 years ago

OK. 10.5281/zenodo.3629869 is the archive.

arfon commented 4 years ago

@whedon accept

whedon commented 4 years ago
Attempting dry run of processing paper acceptance...
whedon commented 4 years ago

Check final proof :point_right: https://github.com/openjournals/joss-papers/pull/1257

If the paper PDF and Crossref deposit XML look good in https://github.com/openjournals/joss-papers/pull/1257, then you can now move forward with accepting the submission by compiling again with the flag deposit=true e.g.

@whedon accept deposit=true
whedon commented 4 years ago
Reference check summary:

OK DOIs

- 10.1017/pasa.2017.51 is OK
- 10.1126/science.1223344 is OK
- 10.1088/0067-0049/190/1/1 is OK
- 10.1111/j.1365-2966.2007.12738.x is OK
- 10.1093/mnras/stt1612 is OK
- 10.1093/mnras/stw1772 is OK
- 10.1103/PhysRevLett.119.161101 is OK
- 10.1093/mnras/sty1353 is OK
- 10.1111/j.1365-2966.2009.15514.x is OK

MISSING DOIs

- None

INVALID DOIs

- None