pyOpenSci / software-submission

Submit your package for review by pyOpenSci here! If you have questions please post them here: https://pyopensci.discourse.group/
93 stars 36 forks source link

afscgap: Ocean health tools for NOAA AFSC GAP species presence data #93

Closed sampottinger closed 1 year ago

sampottinger commented 1 year ago

Submitting Author: Sam Pottinger (@sampottinger)
All current maintainers: @sampottinger, @gizarp
Package Name: afscgap
One-Line Description of Package: Community contributed Python-based tools for working with public bottom trawl surveys data from the NOAA Alaska Fisheries Science Center Groundfish Assessment Program (NOAA AFSC GAP).
Repository Link: https://github.com/SchmidtDSE/afscgap
Version submitted: 0.0.7
Editor: @ocefpaf Reviewer 1: @7yl4r Reviewer 2: @ayushanand18 Archive: DOI JOSS DOI: DOI Version accepted: 0.0.7 Date accepted (month/day/year): 05-30-2023


Code of Conduct & Commitment to Maintain Package

Description

Python-based tool set for interacting with bottom trawl surveys from the Ground Fish Assessment Program (GAP). This provides information about where certain species were seen and when under what conditions, information useful for research in ocean health.

It offers:

Note that GAP is an excellent dataset produced by the Resource Assessment and Conservation Engineering (RACE) Division of the Alaska Fisheries Science Center (AFSC) as part of the National Oceanic and Atmospheric Administration's Fisheries organization (NOAA Fisheries). See also the RACEBASE NOAA InPort entry.

Additional information at https://pyafscgap.org/.

Scope

related to data viz category: see presubmission Domain Specific & Community Partnerships

Community Partnerships

If your package is associated with an existing community please check below:

[^1]: Please fill out a pre-submission inquiry before submitting a data visualization package.

This library supports retrieval of data from the official NOAA AFSC GAP REST API service but, as that service on its own is often not sufficient due to certain data limitations, it also offers zero catch inference as required for a number of common types of investigations (hence data processing). Finally, given the needs of the community and the vast breadth of the dataset, it offers a community application to explore the data which, in turn, can generate Python code to help users get started with continued analysis within their own scripts.

This project largely benefits scientific researchers in the ocean health space as this dataset is useful for fisheries management, biodiversity research, and marine science more generally. An example of what this analysis may look like is provided in our example notebook hosted on mybinder.org.

We are not aware of other Python packages working with the AFSC GAP dataset.

Please see our presubmission.

Technical checks

For details about the pyOpenSci packaging requirements, see our packaging guide. Confirm each of the following by checking the box. This package:

See NOAA InPort entry. We are primarily using Distribution 6 in the API form. As described, these data are available with the following access constraints: "There are no legal restrictions on access to the data. They reside in public domain and can be freely distributed."

See BSD License within repo.

See project README.md.

See project website and usage section of project website's documentation microsite.

See tutorial notebook on mybinder.

See main tests for package and supplemental tests for website (backend tests and frontend tests).

See CI and CD (library, documentation).

Publication Options

JOSS Checks

- [X] The package has an **obvious research application** according to JOSS's definition in their [submission requirements][JossSubmissionRequirements]. Be aware that completing the pyOpenSci review process **does not** guarantee acceptance to JOSS. Be sure to read their submission requirements (linked above) if you are interested in submitting to JOSS. See [example notebook](https://mybinder.org/v2/gh/SchmidtDSE/afscgap/main?urlpath=/tree/index.ipynb) and [app intro](https://app.pyafscgap.org/). - [X] The package is not a "minor utility" as defined by JOSS's [submission requirements][JossSubmissionRequirements]: "Minor ‘utility’ packages, including ‘thin’ API clients, are not acceptable." pyOpenSci welcomes these packages under "Data Retrieval", but JOSS has slightly different criteria. In addition to query compilation / emulation, we are hopeful that the negative record inference and complementing visualization tool are enough to escape the thin API client status. Coming in at over 10k lines of code, we would seem to fit some of the ["hard" criteria that the journal puts forward](https://joss.readthedocs.io/en/latest/submitting.html). Though this library happily provides Pythonic access to an API service, we believe that this contribution as a whole goes beyond a thin client by providing one of the only public mechanisms for conducting investigation requiring negative catch data and provides unique tools for comparative analysis within with the dataset. - [X] The package contains a `paper.md` matching [JOSS's requirements][JossPaperRequirements] with a high-level description in the package root or in `inst/`. See https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.md (or PDF preview at https://github.com/SchmidtDSE/afscgap/blob/main/inst/paper.pdf). - [X] The package is deposited in a long-term repository with the DOI: We have submitted to Code Ocean. Please confirm if this will suffice. See 10.24433/CO.4905407.v1 / https://codeocean.com/capsule/4905407/tree/v1 *Note: Do not submit your package separately to JOSS*

Are you OK with Reviewers Submitting Issues and/or pull requests to your Repo Directly?

This option will allow reviewers to open smaller issues that can then be linked to PR's rather than submitting a more dense text based review. It will also allow you to demonstrate addressing the issue via PR links.

We would be delighted to have your pull requests! 🎉

Confirm each of the following by checking the box.

Please fill out our survey

P.S. Have feedback/comments about our review process? Leave a comment here

Editor and Review Templates

The [editor template can be found here][Editor Template].

The [review template can be found here][Review Template].

sampottinger commented 1 year ago

This is a continuation of #91

sampottinger commented 1 year ago

Sorry updated issue body to fix a small formatting issue and to clarify that we have two maintainers.

NickleDave commented 1 year ago

No worries, edit as much as you need -- we will end up updating the metadata in the YAML header as we go anyway.

I will do initial checks and start finding an editor for this review early next week.

sampottinger commented 1 year ago

Fantastic thank you! Have a great weekend :)

NickleDave commented 1 year ago

You too! 🙂

NickleDave commented 1 year ago

Hi again @sampottinger I'm adding the initial editor checks below.

Good news! afscgap passes with flying colors. I made a couple of comments in the checklist but I don't see anything that needs to be addressed--looks like we're ready for review.

I will begin looking for an editor this week and update you here as soon as we have found one.

Editor in Chief checks

Hi there! Thank you for submitting your package for pyOpenSci review. Below are the basic checks that your package needs to pass to begin our review. If some of these are missing, we will ask you to work on them before the review process begins.

Please check our Python packaging guide for more information on the elements below.



Editor comments

sampottinger commented 1 year ago

Incredible @NickleDave thank you so much!

NickleDave commented 1 year ago

Hi @sampottinger and @gizarp, very happy to say that @ocefpaf has generously volunteered to act as editor for this review. He will introduce himself and take the lead from here.

sampottinger commented 1 year ago

Thanks @NickleDave and great to meet you @ocefpaf!

ocefpaf commented 1 year ago

Hi folks! Sorry for the delay in getting here. (Holiday+longweekend in South America this past few days.)

@NickleDave I'm catching up with the reading today. Please let me know if there are any urgent button/forms I need press/fill.

ocefpaf commented 1 year ago

@sampottinger I have a few really minor comments after a first pass. These are not blockers and you don't need to address them for this application! They are really just comments:

  1. I see you have a module named utils. That can be problematic as we tend to put everything in there. It also make future refactors or even maintaining the code a bit hard. Most of the functions there seems to be "converters," right? Maybe renamed it to converters.py (horrible name, I know) and avoid the "utils.py" trap.
  2. The query method argument list is daunting! I'm not sure if there is a way around it. I know you are limited by the data structure upstream. Maybe it is a matter of creating more examples and docs? Or other methods with a reduced search scope that calls the main one? Not sure if that even makes sense.
  3. The package name suffers from the usual NOAA acronym problem. I understood it but I am "in the know" b/c I'm a NOS/IOOS/SECOORA contractor (see ;-p) . Maybe a friendlier name can help folks outside of NOAA-Fisheries find this package. However, if there is a user base, even a small one, please disregard this comment. Renaming a package after folks adopted it is worse than a bad name.

PS: I showed your package to a few colleagues at IOOS and they believe that a notebook with an afscgap example would make a nice addition to our Code Lab https://ioos.github.io/ioos_code_lab/content/intro.html. Let me know if you are interested in that.

sampottinger commented 1 year ago

Hey @ocefpaf! Thanks so much for all of these. Let me pull together a PR real fast and some comments!

sampottinger commented 1 year ago

Thanks again @ocefpaf for your thoughtful comments! Please see below.

I see you have a module named utils

This is a good suggestion. I made a PR at https://github.com/SchmidtDSE/afscgap/pull/49. Let me know if this refactor looks good to you!

The package name suffers from the usual NOAA acronym problem ... if there is a user base, even a small one, please disregard this comment

Yeah the dataset's whole name is NOAA AFSC GAP RACEBACSE FOSS :sweat_smile: . Thanks for being understanding on this one. I think we will politely decline this suggestion if that's OK. We have some community continuity concerns like you mentioned and some community sensitivity concerns we want to respect... we have to be a little careful around not appearing to have activity in other datasets (NOAA or otherwise) that may be at different moments in their open science journey than AFSC GAP or not (yet? :crossed_fingers:) open in the same way that the AFSC GAP API service is. Given our current context, this specificity is probably desirable for at least the medium-term future of the project.

a nice addition to our Code Lab

Amazing! We'd be honored :smile:. We'd be more than happy to co-develop something with you or another community member in that space. Just for the conversation, I'll highlight that we do have a mybinder example available (source in repo). Totally understand if it might not be quite the right vibe though.

The query method argument list is daunting!

I think that design choice is a strong opinion loosely held. I'm not sure if it is helpful but I'll document some thoughts we had internally but 100% open to additional ideas!

We went this route to fulfill a few goals: we wanted to reflect the structure of the API service (any combination of filters in the keyword set is valid in the NOAA API service so there isn't a clear functional decomposition) while also providing documented discoverability. There were some other ideas explored:

Just wanted to list this out but absolutely open to other thoughts! This just felt like the right balance around the broader goals with regards to documentation / discoverability, keeping the type system encapsulated (see below), and maintaining accessibility.

library

ocefpaf commented 1 year ago

This is a good suggestion. I made a PR at https://github.com/SchmidtDSE/afscgap/pull/49. Let me know if this refactor looks good to you!

Better than I thought. Hopefully that will make your life easier in the future.

I think we will politely decline this suggestion if that's OK.

No arguments here! It is better to keep the name now that the package is out.

Totally understand if it might not be quite the right vibe though.

It is a good fit. I'll ping you outside of pyOpenSci later to talk more about it.

Regarding the query functionality. I do like the idea of the builder chaining and it is not too uncommon. Pandas, pyjanitor and many other mainstream libraries use it. It is quite common in the R world too. Also, many similar packages, like argopy, have something like that.

With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.

PS: I disagree with the learning curve there. Maybe it is my bais but I teach a lot of basic Python and I make sure to expose the students to a flat vs nested model for operations.

ocefpaf commented 1 year ago

:wave: Hi Tylar Murray and Ayush Anand! Thank you for volunteering to review for pyOpenSci!

The following resources will help you complete your review:

  1. Here is the reviewers guide. This guide contains all of the steps and information needed to complete your review.
  2. Here is the review template that you will need to fill out and submit here as a comment, once your review is complete.

Please get in touch with any questions or concerns! Your review is due: May 4th, 2023


Reviewers: @7yl4r and @ayushanand18 Due date: 2023-05-04

PS: You can comment, open issue, and/or PR directly in the afscgap for this review. Just make sure to cross-reference them here.

sampottinger commented 1 year ago

Thanks @ocefpaf!

Better than I thought.

Awesome! Merged in.

I disagree with the learning curve there.

That's helpful to have your perspective on that. Let me float this around a little bit and see how we feel about migration. I'll try to make a call on that in the next day or so to keep things moving.

With that said, again, anything I write here is not a blocker for acceptance. The current implementation is good and solid. Just suggestions for the future.

Thanks!

wave Hi Tylar Murray and Ayush Anand!

Lovely to meet you both! Thank you for your time. :D

ayushanand18 commented 1 year ago

Thank you so much @ocefpaf and @sampottinger! Great work building this package. The current implementation looks robust and clean.

But I do have some observations to make.

  1. There is this function afscgap.query().get_bottom_temperature_c(), and I do think that it might be hard to guess the function name for someone who hasn't seen this dataset before. Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')? This observation is similar to other functions like cpue_*(), duration_hr, etc. I think could have been derived from the direct field names in the dataset, and those who know about the dataset can easily figure out what the name would be, without looking at the docs, but it would be difficult for people from outside (like me) to figure out without going back-and-forth to the docs.

  2. Also about the filters and getters, we can utilise the getitem() method in the class with something like this,

    def __getitem__(self, key):
    return self.__getattribute__(key) 

    This might be more intuitive. Or to make it much easier can we can just go by the name get(key), and specify the key we want. Ref

I would love to hear your opinion, and these are surely non-blockers for this review!

sampottinger commented 1 year ago

Hello @ayushanand18 and @ocefpaf!

I do like the idea of the builder chaining and it is not too uncommon

Thanks again for the conversation @ocefpaf! I've started implementing the builder pattern at https://github.com/SchmidtDSE/afscgap/issues/50.

Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')

Thanks for the suggestion @ayushanand18! I've started that at https://github.com/SchmidtDSE/afscgap/issues/51

we can utilise the getitem() method in the class

Thanks for this suggestion but I may pass on this if that's ok @ayushanand18.

In light of the stuff we did earlier in process, our reasoning is that the library carries an explicit goal to provide docstrings and type annotations for these fields. I think, given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem, I want to leave those methods there for discoverability and encourage use of those typed interfaces unless the user "opts out" first.

To that end, there is a to_dict method on afscgap.model.Record for those that prefer that key-based approach or want to read into something like pandas.DataFrame but we hope requiring to_dict first helps inform the user that the object returned from a query does offer those other interfaces, information which may be missed if they simply try treating the result as a dictionary. In other words, I'd rather have the user first explicitly decline use of the default typed / strong interface given the benefits it offers for static checking and affordance.

sampottinger commented 1 year ago

Hello all! @ocefpaf, the builder interface was implemented in https://github.com/SchmidtDSE/afscgap/pull/52.

sampottinger commented 1 year ago

Mind renaming to something like get_bottom_temperature(unit="celsius"|'c')

Thanks @ayushanand18! I have implemented in https://github.com/SchmidtDSE/afscgap/pull/55

sampottinger commented 1 year ago

I am in the process of confirming https://github.com/SchmidtDSE/afscgap/pull/54 which includes the following requested changes before they are resolved on main:

https://github.com/SchmidtDSE/afscgap/pull/54 will also soon include updates to documentation.

ayushanand18 commented 1 year ago

I've started that at SchmidtDSE/afscgap#51

That looks great!

Thanks for this suggestion but I may pass on this if that's ok .

It's okay. At the end of the day, we need to follow what's best for the user of the package.

To that end, there is a to_dict method on afscgap.model.Record for those that prefer that key-based approach or want to read into something like pandas.DataFrame [...]

Sounds good. But that leaves me a question of whether the afsc.model.Record object returns a nested dictionary in any case with the to_dict() method. If the answer is yes, then probably we might include a function to_pandas() that unnests the nested dictionary and returns a DataFrame object so that user doesn't take the pain. But again this is just a suggestion.

ocefpaf commented 1 year ago

given the complexity of the data model itself as well as the difficulty in using some of the upstream documentation from NOAA directly in the Python ecosystem

I hear you. My day job is to try to make that better and, most of the times, we fail :-/

sampottinger commented 1 year ago

I hear you. My day job is to try to make that better and, most of the times, we fail :-/

@ocefpaf - I'm sorry I didn't mean it like that! Perhaps just that the docstrings are extra convenient in this context :sweat_smile: . The upstream documentation for the datasets have to address both internal and external users with different kinds of access permissions / modalities. It just means it's a big schema with lots of (great) data but also takes a little while to get oriented towards. I think / hope that the docstrings help a bit but this library also has a simpler task since it knows which dataset the user is working with. Also, since I know they are in Python, I can also do a few things to help the user that the upstream service can't do.

Unrelated but thanks for your comment on the issue in the repo about the builder pattern!

That looks great!

Thank you! Appreciate your suggestions @ayushanand18.

object returns a nested dictionary in any case with the to_dict() method

It is not nested! It uses the same field names and linear record structure from the API service.

sampottinger commented 1 year ago

Hello all! Your requests are implemented on main and the issues in the library are closed. Thanks for your help!

ocefpaf commented 1 year ago

@ocefpaf - I'm sorry I didn't mean it like that!

No harm done! I was mostly venting my own day-job frustration. Like in Software, agreeing on standards is hard!

Most of the time, even when we do agree on them, we only see its flaws when we start to implement Software to use them. By that time it is too late and now we need to circumvent the issues with Software and/or metadata patches.

ocefpaf commented 1 year ago

@ayushanand18 thanks for all the feedback!

@7yl4r do you believe you can take a look at this project in the next few days? If not, please let us know!

ocefpaf commented 1 year ago

PS: Don't forget to close your review with the reviewer template. See https://github.com/pyOpenSci/software-submission/issues/93#issuecomment-1507160030.

ayushanand18 commented 1 year ago

@ayushanand18 thanks for all the feedback!

Thanks @ocefpaf, I'm done with my comments on the review.

ayushanand18 commented 1 year ago

Package Review

Please check off boxes as applicable, and elaborate in comments below. Your review is not limited to these topics, as described in the reviewer guide

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

NOTE: If the README has many more badges, you might want to consider using a table for badges: see this example. Such a table should be more wide than high. (Note that the a badge for pyOpenSci peer-review will be provided upon acceptance.)

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 0.5


Review Comments

Thanks for the submission @sampottinger! Everything looks good. ~However, my only concern is of including Installation Instructions for the dev version of the package through GitHub.~

sampottinger commented 1 year ago

Hello @ayushanand18! Thanks very much for your review.

However, my only concern is of including Installation Instructions for the dev version of the package through GitHub.

I’m sorry are you recommending that we include additional installation details for installing direct from GitHub (eg: pip install git+https://github.com/SchmidtDSE/afscgap.git@main)? Just wanted to double check.

sampottinger commented 1 year ago

Also @ocefpaf / @ayushanand18 - we were hoping to try at JOSS (we know they don’t like API wrappers but we are hoping that zero catch inference, ORDS compilation / emulation, visualization, etc take us over the line). The paper.md is in the inst folder. LMK if there is something else we should do to facilitate. Just noticed that the section on JOSS was left unchecked.

sampottinger commented 1 year ago

Sorry @ayushanand18 - Added this PR to address your final comment: https://github.com/SchmidtDSE/afscgap/pull/72. LMK if this addresses your concern. Thanks again very much for your time! We very much appreciate all of your help.

ocefpaf commented 1 year ago

LMK if there is something else we should do to facilitate. Just noticed that the section on JOSS was left unchecked.

@lwasser should the PyOS reviewers fill that part or do you get someone from JOSS doing that?

lwasser commented 1 year ago

@ocefpaf this is a great question. JOSS does not normally check those boxes. I think typically the submittor / author would want to check those to indicate that they have all of the items needed by JOSS. but JOSS will review the paper - we do not need to review the paper. It is my understanding that JOSS will also double check scope. But it is good for the author to check first. I hope this helps. feel free to ask more questions if it's unclear as we can always adjust our processes if we need to.

@sampottinger thank you for submitting to us and @ayushanand18 @7yl4r so happy to have you here as reviewers! 👋

sampottinger commented 1 year ago

Thank you @lwasser !

ayushanand18 commented 1 year ago

I’m sorry

No need to be sorry on that, I should have been clearer in my comments :)

are you recommending that we include additional installation details for installing direct from GitHub

Yes, I meant that. The PR looks good, and I don't have anything to add.

sampottinger commented 1 year ago

Thanks @ayushanand18 !

sampottinger commented 1 year ago

Hello all! Just checking in. Is there anything else I can do to help in the process? Thanks!

7yl4r commented 1 year ago

Sorry for the delay on my part. I have filled the review template in this gist but am still looking through things. All looks great. There are some specific details from pyOpenSci to look at. My personal preference would be to simplify the documentation text, but I do not yet have a recommendation on how to do that.

sampottinger commented 1 year ago

Thanks! Looking forward to your feedback @7yl4r!

sampottinger commented 1 year ago

Hello @7yl4r! Thank you for https://github.com/SchmidtDSE/afscgap/issues/75. I have responded there but I was unable to reproduce your behavior in a fresh Ubuntu VM. Just checking in... is there anything else I can help with at this time? Thank you!

ocefpaf commented 1 year ago

There are some specific details from pyOpenSci to look at. My personal preference would be to simplify the documentation text, but I do not yet have a recommendation on how to do that.

Please feel free to ping me directly or open issues in the GH repo for the docs. Documentation improvements are always very welcomed :-)

ocefpaf commented 1 year ago

Sorry for the delay on my part. I have filled the review template in this gist but am still looking through things.

When you are done please paste them as a comment here. @lwasser will aggregate the reviews later and having them in the original issue helps with the process.

sampottinger commented 1 year ago

I have addressed and closed https://github.com/SchmidtDSE/afscgap/issues/75. Thank you for your help!

lwasser commented 1 year ago

thank you all for being here and for using the templates!! @ocefpaf @7yl4r it is SOOOO very helpful to us to have consistent templates for many reasons. so i truly appreciate your doing this and pasting them here. you can still add any adhoc review text that you wish to add to the template. ✨

sampottinger commented 1 year ago

Thanks everyone! I don't believe there are any tasks / unresolved feedback pending on our end right now. Please let me know if there is anything else we can do. 🥳

7yl4r commented 1 year ago

Package Review

Documentation

The package includes all the following forms of documentation:

Readme file requirements The package meets the readme requirements below:

The README should include, from top to bottom:

Usability

Reviewers are encouraged to submit suggestions (or pull requests) that will improve the usability of the package as a whole. Package structure should follow general community best-practices. In general please consider whether:

Functionality

For packages also submitting to JOSS

Note: Be sure to check this carefully, as JOSS's submission requirements and scope differ from pyOpenSci's in terms of what types of packages are accepted.

The package contains a paper.md matching JOSS's requirements with:

Final approval (post-review)

Estimated hours spent reviewing: 2


Review Comments

Just a few general comments on my experince digging into this package:

ocefpaf commented 1 year ago

@sampottinger I move to the next stage, "awaiting reviewers changes" but I do believe you implement all of the suggestions already, right? @7yl4r and @ayushanand18 is there anything pending on your side?

7yl4r commented 1 year ago

Everything looks good on my side. I can tell a lot of great work has gone into this project and it is in good shape.