Closed sampottinger closed 1 year ago
This is a continuation of #91
Sorry updated issue body to fix a small formatting issue and to clarify that we have two maintainers.
No worries, edit as much as you need -- we will end up updating the metadata in the YAML header as we go anyway.
I will do initial checks and start finding an editor for this review early next week.
Fantastic thank you! Have a great weekend :)
You too! 🙂
Hi again @sampottinger I'm adding the initial editor checks below.
Good news! afscgap passes with flying colors. I made a couple of comments in the checklist but I don't see anything that needs to be addressed--looks like we're ready for review.
I will begin looking for an editor this week and update you here as soon as we have found one.
Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.
Please check our Python packaging guide for more information on the elements below.
import package-name
.README.md
file with clear explanation of what the package does, instructions on how to install it, and a link to development instructions.CONTRIBUTING.md
file that details how to install and contribute to the package.Code of Conduct
file.YAML
header of the issue (located at the top of the issue template).Incredible @NickleDave thank you so much!
Hi @sampottinger and @gizarp, very happy to say that @ocefpaf has generously volunteered to act as editor for this review. He will introduce himself and take the lead from here.
Thanks @NickleDave and great to meet you @ocefpaf!
Hi folks! Sorry for the delay in getting here. (Holiday+longweekend in South America this past few days.)
@NickleDave I'm catching up with the reading today. Please let me know if there are any urgent button/forms I need press/fill.
@sampottinger I have a few really minor comments after a first pass. These are not blockers and you don't need to address them for this application! They are really just comments:
converters.py
(horrible name, I know) and avoid the "utils.py" trap.query
method argument list is daunting! I'm not sure if there is a way around it. I know you are limited by the data structure upstream. Maybe it is a matter of creating more examples and docs? Or other methods with a reduced search scope that calls the main one? Not sure if that even makes sense.PS: I showed your package to a few colleagues at IOOS and they believe that a notebook with an afscgap example would make a nice addition to our Code Lab https://ioos.github.io/ioos_code_lab/content/intro.html. Let me know if you are interested in that.
Hey @ocefpaf! Thanks so much for all of these. Let me pull together a PR real fast and some comments!
Thanks again @ocefpaf for your thoughtful comments! Please see below.
I see you have a module named utils
This is a good suggestion. I made a PR at https://github.com/SchmidtDSE/afscgap/pull/49. Let me know if this refactor looks good to you!
The package name suffers from the usual NOAA acronym problem ... if there is a user base, even a small one, please disregard this comment
Yeah the dataset's whole name is NOAA AFSC GAP RACEBACSE FOSS :sweat_smile: . Thanks for being understanding on this one. I think we will politely decline this suggestion if that's OK. We have some community continuity concerns like you mentioned and some community sensitivity concerns we want to respect... we have to be a little careful around not appearing to have activity in other datasets (NOAA or otherwise) that may be at different moments in their open science journey than AFSC GAP or not (yet? :crossed_fingers:) open in the same way that the AFSC GAP API service is. Given our current context, this specificity is probably desirable for at least the medium-term future of the project.
a nice addition to our Code Lab
Amazing! We'd be honored :smile:. We'd be more than happy to co-develop something with you or another community member in that space. Just for the conversation, I'll highlight that we do have a mybinder example available (source in repo). Totally understand if it might not be quite the right vibe though.
The query method argument list is daunting!
I think that design choice is a strong opinion loosely held. I'm not sure if it is helpful but I'll document some thoughts we had internally but 100% open to additional ideas!
We went this route to fulfill a few goals: we wanted to reflect the structure of the API service (any combination of filters in the keyword set is valid in the NOAA API service so there isn't a clear functional decomposition) while also providing documented discoverability. There were some other ideas explored:
query(filters=[CommonNameFilter(eq='Pacific cod')]
but we didn't like forcing a lot of our type system into client code as it would require quite a large number of types to maintain our goals around docstring coverage of the available filter parameters and discoverability.query().commonName(eq='Pacific cod').year(min=2021).get()
. While I personally like this syntax, there's been conversation internally and with community partners for using this library in an educational context as well. As this syntax may be less common in earlier Python programming journeys, there was a feeling that the keyword arguments approach was more inclusive / approachable for someone in, say, an introductory Python course or a student not in a computer science curriculum (we got invited for one coming up soon!).Just wanted to list this out but absolutely open to other thoughts! This just felt like the right balance around the broader goals with regards to documentation / discoverability, keeping the type system encapsulated (see below), and maintaining accessibility.
This is a good suggestion. I made a PR at https://github.com/SchmidtDSE/afscgap/pull/49. Let me know if this refactor looks good to you!
Better than I thought. Hopefully that will make your life easier in the future.
I think we will politely decline this suggestion if that's OK.
No arguments here! It is better to keep the name now that the package is out.
Totally understand if it might not be quite the right vibe though.
It is a good fit. I'll ping you outside of pyOpenSci later to talk more about it.
Regarding the query functionality. I do like the idea of the builder chaining and it is not too uncommon. Pandas, pyjanitor and many other mainstream libraries use it. It is quite common in the R world too. Also, many similar packages, like argopy, have something like that.
With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.
PS: I disagree with the learning curve there. Maybe it is my bais but I teach a lot of basic Python and I make sure to expose the students to a flat vs nested model for operations.
:wave: Hi Tylar Murray and Ayush Anand! Thank you for volunteering to review
for pyOpenSci!
The following resources will help you complete your review:
Please get in touch with any questions or concerns! Your review is due: May 4th, 2023
Reviewers: @7yl4r and @ayushanand18 Due date: 2023-05-04
PS: You can comment, open issue, and/or PR directly in the afscgap for this review. Just make sure to cross-reference them here.
Thanks @ocefpaf!
Better than I thought.
Awesome! Merged in.
I disagree with the learning curve there.
That's helpful to have your perspective on that. Let me float this around a little bit and see how we feel about migration. I'll try to make a call on that in the next day or so to keep things moving.
With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.
Thanks!
wave Hi Tylar Murray and Ayush Anand!
Lovely to meet you both! Thank you for your time. :D
Thank you so much @ocefpaf and @sampottinger! Great work building this package. The current implementation looks robust and clean.
But I do have some observations to make.
There is this function afscgap.query().get_bottom_temperature_c()
, and I do think that it might be hard to guess the function name for someone who hasn't seen this dataset before. Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')
? This observation is similar to other functions like cpue_*()
, duration_hr
, etc. I think could have been derived from the direct field names in the dataset, and those who know about the dataset can easily figure out what the name would be, without looking at the docs, but it would be difficult for people from outside (like me) to figure out without going back-and-forth to the docs.
Also about the filters and getters, we can utilise the getitem() method in the class with something like this,
def __getitem__(self, key):
return self.__getattribute__(key)
This might be more intuitive. Or to make it much easier can we can just go by the name get(key)
, and specify the key we want. Ref
I would love to hear your opinion, and these are surely non-blockers for this review!
Hello @ayushanand18 and @ocefpaf!
I do like the idea of the builder chaining and it is not too uncommon
Thanks again for the conversation @ocefpaf! I've started implementing the builder pattern at https://github.com/SchmidtDSE/afscgap/issues/50.
Mind renaming to something like
get_bottom_temperature(unit="celsius"|'c')
Thanks for the suggestion @ayushanand18! I've started that at https://github.com/SchmidtDSE/afscgap/issues/51
we can utilise the getitem() method in the class
Thanks for this suggestion but I may pass on this if that's ok @ayushanand18.
In light of the stuff we did earlier in process, our reasoning is that the library carries an explicit goal to provide docstrings and type annotations for these fields. I think, given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem, I want to leave those methods there for discoverability and encourage use of those typed interfaces unless the user "opts out" first.
To that end, there is a to_dict
method on afscgap.model.Record
for those that prefer that key-based approach or want to read into something like pandas.DataFrame
but we hope requiring to_dict
first helps inform the user that the object returned from a query does offer those other interfaces, information which may be missed if they simply try treating the result as a dictionary. In other words, I'd rather have the user first explicitly decline use of the default typed / strong interface given the benefits it offers for static checking and affordance.
Hello all! @ocefpaf, the builder interface was implemented in https://github.com/SchmidtDSE/afscgap/pull/52.
Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')
Thanks @ayushanand18! I have implemented in https://github.com/SchmidtDSE/afscgap/pull/55
I am in the process of confirming https://github.com/SchmidtDSE/afscgap/pull/54 which includes the following requested changes before they are resolved on main:
https://github.com/SchmidtDSE/afscgap/pull/54 will also soon include updates to documentation.
I've started that at SchmidtDSE/afscgap#51
That looks great!
Thanks for this suggestion but I may pass on this if that's ok .
It's okay. At the end of the day, we need to follow what's best for the user of the package.
To that end, there is a
to_dict
method onafscgap.model.Record
for those that prefer that key-based approach or want to read into something likepandas.DataFrame
[...]
Sounds good. But that leaves me a question of whether the afsc.model.Record
object returns a nested dictionary in any case with the to_dict()
method. If the answer is yes, then probably we might include a function to_pandas()
that unnests the nested dictionary and returns a DataFrame
object so that user doesn't take the pain. But again this is just a suggestion.
given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem
I hear you. My day job is to try to make that better and, most of the times, we fail :-/
I hear you. My day job is to try to make that better and, most of the times, we fail :-/
@ocefpaf - I'm sorry I didn't mean it like that! Perhaps just that the docstrings are extra convenient in this context :sweat_smile: . The upstream documentation for the datasets have to address both internal and external users with different kinds of access permissions / modalities. It just means it's a big schema with lots of (great) data but also takes a little while to get oriented towards. I think / hope that the docstrings help a bit but this library also has a simpler task since it knows which dataset the user is working with. Also, since I know they are in Python, I can also do a few things to help the user that the upstream service can't do.
Unrelated but thanks for your comment on the issue in the repo about the builder pattern!
That looks great!
Thank you! Appreciate your suggestions @ayushanand18.
object returns a nested dictionary in any case with the to_dict() method
It is not nested! It uses the same field names and linear record structure from the API service.
Hello all! Your requests are implemented on main and the issues in the library are closed. Thanks for your help!
@ocefpaf - I'm sorry I didn't mean it like that!
No harm done! I was mostly venting my own day-job frustration. Like in Software, agreeing on standards is hard!
Most of the time, even when we do agree on them, we only see its flaws when we start to implement Software to use them. By that time it is too late and now we need to circumvent the issues with Software and/or metadata patches.
@ayushanand18 thanks for all the feedback!
@7yl4r do you believe you can take a look at this project in the next few days? If not, please let us know!
PS: Don't forget to close your review with the reviewer template. See https://github.com/pyOpenSci/software-submission/issues/93#issuecomment-1507160030.
@ayushanand18 thanks for all the feedback!
Thanks @ocefpaf, I'm done with my comments on the review.
Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide
The package includes all the following forms of documentation:
pyproject.toml
file or elsewhere.Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 0.5
Thanks for the submission @sampottinger! Everything looks good. ~However, my only concern is of including Installation Instructions for the dev
version of the package through GitHub.~
Hello @ayushanand18! Thanks very much for your review.
However, my only concern is of including Installation Instructions for the dev version of the package through GitHub.
I’m sorry are you recommending that we include additional installation details for installing direct from GitHub (eg: pip install git+https://github.com/SchmidtDSE/afscgap.git@main
)? Just wanted to double check.
Also @ocefpaf / @ayushanand18 - we were hoping to try at JOSS (we know they don’t like API wrappers but we are hoping that zero catch inference, ORDS compilation / emulation, visualization, etc take us over the line). The paper.md
is in the inst
folder. LMK if there is something else we should do to facilitate. Just noticed that the section on JOSS was left unchecked.
Sorry @ayushanand18 - Added this PR to address your final comment: https://github.com/SchmidtDSE/afscgap/pull/72. LMK if this addresses your concern. Thanks again very much for your time! We very much appreciate all of your help.
LMK if there is something else we should do to facilitate. Just noticed that the section on JOSS was left unchecked.
@lwasser should the PyOS reviewers fill that part or do you get someone from JOSS doing that?
@ocefpaf this is a great question. JOSS does not normally check those boxes. I think typically the submittor / author would want to check those to indicate that they have all of the items needed by JOSS. but JOSS will review the paper - we do not need to review the paper. It is my understanding that JOSS will also double check scope. But it is good for the author to check first. I hope this helps. feel free to ask more questions if it's unclear as we can always adjust our processes if we need to.
@sampottinger thank you for submitting to us and @ayushanand18 @7yl4r so happy to have you here as reviewers! 👋
Thank you @lwasser !
I’m sorry
No need to be sorry on that, I should have been clearer in my comments :)
are you recommending that we include additional installation details for installing direct from GitHub
Yes, I meant that. The PR looks good, and I don't have anything to add.
Thanks @ayushanand18 !
Hello all! Just checking in. Is there anything else I can do to help in the process? Thanks!
Sorry for the delay on my part. I have filled the review template in this gist but am still looking through things. All looks great. There are some specific details from pyOpenSci to look at. My personal preference would be to simplify the documentation text, but I do not yet have a recommendation on how to do that.
Thanks! Looking forward to your feedback @7yl4r!
Hello @7yl4r! Thank you for https://github.com/SchmidtDSE/afscgap/issues/75. I have responded there but I was unable to reproduce your behavior in a fresh Ubuntu VM. Just checking in... is there anything else I can help with at this time? Thank you!
There are some specific details from pyOpenSci to look at. My personal preference would be to simplify the documentation text, but I do not yet have a recommendation on how to do that.
Please feel free to ping me directly or open issues in the GH repo for the docs. Documentation improvements are always very welcomed :-)
Sorry for the delay on my part. I have filled the review template in this gist but am still looking through things.
When you are done please paste them as a comment here. @lwasser will aggregate the reviews later and having them in the original issue helps with the process.
I have addressed and closed https://github.com/SchmidtDSE/afscgap/issues/75. Thank you for your help!
thank you all for being here and for using the templates!! @ocefpaf @7yl4r it is SOOOO very helpful to us to have consistent templates for many reasons. so i truly appreciate your doing this and pasting them here. you can still add any adhoc review text that you wish to add to the template. ✨
Thanks everyone! I don't believe there are any tasks / unresolved feedback pending on our end right now. Please let me know if there is anything else we can do. 🥳
The package includes all the following forms of documentation:
pyproject.toml
file or elsewhere.Readme file requirements The package meets the readme requirements below:
The README should include, from top to bottom:
[x] The package name
[x] Badges for:
[x] Short description of package goals.
[x] Package installation instructions
[ ] Any additional setup required to use the package (authentication tokens, etc.)
[x] Descriptive links to all vignettes. If the package is small, there may only be a need for one vignette which could be placed in the README.md file.
[x] Link to your documentation website.
[ ] If applicable, how the package compares to other similar packages and/or how it relates to other packages in the scientific ecosystem.
[ ] Citation information
Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:
Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.
The package contains a paper.md
matching JOSS's requirements with:
Estimated hours spent reviewing: 2
Just a few general comments on my experince digging into this package:
@sampottinger I move to the next stage, "awaiting reviewers changes" but I do believe you implement all of the suggestions already, right? @7yl4r and @ayushanand18 is there anything pending on your side?
Everything looks good on my side. I can tell a lot of great work has gone into this project and it is in good shape.
Submitting Author: Sam Pottinger (@sampottinger)
All current maintainers: @sampottinger, @gizarp
Package Name: afscgap
One-Line Description of Package: Community contributed Python-based tools for working with public bottom trawl surveys data from the NOAA Alaska Fisheries Science Center Groundfish Assessment Program (NOAA AFSC GAP).
Repository Link: https://github.com/SchmidtDSE/afscgap
Version submitted: 0.0.7
Editor: @ocefpaf Reviewer 1: @7yl4r Reviewer 2: @ayushanand18 Archive: JOSS DOI: Version accepted: 0.0.7 Date accepted (month/day/year): 05-30-2023
Code of Conduct & Commitment to Maintain Package
Description
Python-based tool set for interacting with bottom trawl surveys from the Ground Fish Assessment Program (GAP). This provides information about where certain species were seen and when under what conditions, information useful for research in ocean health.
It offers:
Note that GAP is an excellent dataset produced by the Resource Assessment and Conservation Engineering (RACE) Division of the Alaska Fisheries Science Center (AFSC) as part of the National Oceanic and Atmospheric Administration's Fisheries organization (NOAA Fisheries). See also the RACEBASE NOAA InPort entry.
Additional information at https://pyafscgap.org/.
Scope
Please indicate which category or categories. Check out our package scope page to learn more about our scope. (If you are unsure of which category you fit, we suggest you make a pre-submission inquiry):
related to data viz category: see presubmission Domain Specific & Community Partnerships
Community Partnerships
If your package is associated with an existing community please check below:
This library supports retrieval of data from the official NOAA AFSC GAP REST API service but, as that service on its own is often not sufficient due to certain data limitations, it also offers zero catch inference as required for a number of common types of investigations (hence data processing). Finally, given the needs of the community and the vast breadth of the dataset, it offers a community application to explore the data which, in turn, can generate Python code to help users get started with continued analysis within their own scripts.
This project largely benefits scientific researchers in the ocean health space as this dataset is useful for fisheries management, biodiversity research, and marine science more generally. An example of what this analysis may look like is provided in our example notebook hosted on mybinder.org.
We are not aware of other Python packages working with the AFSC GAP dataset.
@tag
the editor you contacted:Please see our presubmission.
Technical checks
For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:
See NOAA InPort entry. We are primarily using Distribution 6 in the API form. As described, these data are available with the following access constraints: "There are no legal restrictions on access to the data. They reside in public domain and can be freely distributed."
See BSD License within repo.
See project README.md.
See project website and usage section of project website's documentation microsite.
See tutorial notebook on mybinder.
See main tests for package and supplemental tests for website (backend tests and frontend tests).
See CI and CD (library, documentation).
Publication Options
JOSS Checks
- [X] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. See [example notebook](https://mybinder.org/v2/gh/SchmidtDSE/afscgap/main?urlpath=/tree/index.ipynb) and [app intro](https://app.pyafscgap.org/). - [X] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. In addition to query compilation / emulation, we are hopeful that the negative record inference and complementing visualization tool are enough to escape the thin API client status. Coming in at over 10k lines of code, we would seem to fit some of the ["hard" criteria that the journal puts forward](https://joss.readthedocs.io/en/latest/submitting.html). Though this library happily provides Pythonic access to an API service, we believe that this contribution as a whole goes beyond a thin client by providing one of the only public mechanisms for conducting investigation requiring negative catch data and provides unique tools for comparative analysis within with the dataset. - [X] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. See https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.md (or PDF preview at https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.pdf). - [X] The package is deposited in a long-term repository with the DOI: We have submitted to Code Ocean. Please confirm if this will suffice. See 10.24433/CO.4905407.v1 / https://codeocean.com/capsule/4905407/tree/v1 *Note: Do not submit your package separately to JOSS*
Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?
This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.
We would be delighted to have your pull requests! 🎉
Confirm each of the following by checking the box.
Please fill out our survey
P.S. Have feedback/comments about our review process? Leave a comment here
Editor and Review Templates
The [editor template can be found here][Editor Template].
The [review template can be found here][Review Template].