heliophysicsPy / standards

PHEP 4: PyHC Package Tiering #31

Open jibarnum opened 4 months ago

jibarnum commented 4 months ago

This PR proposes a new process PHEP for PyHC. PHEP 4 establishes a tiering structure for PyHC projects, which will automatically apply to PyHC packages once it goes into effect. It includes the requirements for each of the four new tiers of PyHC projects (Gold, Silver, Bronze, and Copper), as well as the benefits accrued at each tier.

jibarnum commented 4 months ago

@jameswilburlewis @aburrell @rweigel @sandyfreelance @darrendezeeuw I can't add you all as reviewers (I think I need to invite you to the PyHC org on GitHub first), but I'm tagging you for your awareness and comments.

sapols commented 4 months ago

Initial thoughts/issues:

aburrell commented 4 months ago

@sapols "copper" was my suggestion, to keep with the medal terminology. It's the next medal after "bronze".

rebeccaringuette commented 3 months ago

I would like to echo Shawn's comments on this PHEP being a great step forward for PyHC. Some comments:

jibarnum commented 3 months ago

@sapols thanks for your thoughts.

Is it a typo that the first table ends with "Copper" instead of "Honorable Mention"?

No, as @aburrell pointed out, that was changed to keep with the "medal" terminology we used for the other categories.

It'd be helpful to add a hyperlink to the PyHC env to clarify which env we mean. Probably even a specific Docker image for extra clarity? Although (and maybe this is a bigger question) do we need a new "PyHC env" to facilitate this? The purpose of the current one is to hold all PyHC packages, whereas this PHEP specifies that only Gold-tier packages get inclusion in the env. (This also raises the question of how packages will know whether they're compatible with the env if they're not included in it.)

I think we want to establish some specific environments for this. @rebeccaringuette had the interesting suggestion in her comment (below yours) of creating two environments. I'm thinking of some kind of split: Gold + Silver for a PyHC-top-tier environment, and Gold + Silver + Bronze for a PyHC-all environment (happy for some help in workshopping that terminology).
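As a rough illustration only, the proposed split can be read as a tier-to-environment mapping. This is a sketch under stated assumptions: the two environment names come from the comment above, Copper packages are assumed to join neither environment, and the package names are hypothetical.

```python
from __future__ import annotations

# Proposed split from the comment above; Copper is assumed to be in neither environment.
ENVIRONMENT_TIERS: dict[str, frozenset[str]] = {
    "PyHC-top-tier": frozenset({"gold", "silver"}),
    "PyHC-all": frozenset({"gold", "silver", "bronze"}),
}


def environments_for(tier: str) -> list[str]:
    """Return the environments a package of the given tier would be included in."""
    tier = tier.lower()
    return sorted(env for env, tiers in ENVIRONMENT_TIERS.items() if tier in tiers)


def build_environments(packages: dict[str, str]) -> dict[str, list[str]]:
    """Group package names by environment from a {package_name: tier} mapping."""
    envs: dict[str, list[str]] = {env: [] for env in ENVIRONMENT_TIERS}
    for name, tier in sorted(packages.items()):
        for env in environments_for(tier):
            envs[env].append(name)
    return envs


# Usage with hypothetical package names.
print(build_environments({"pkg_a": "Gold", "pkg_b": "Bronze", "pkg_c": "Copper"}))
```

Because every Gold and Silver package also lands in PyHC-all, the top-tier environment stays a strict subset of the larger one, which matches the "respectively" framing of the proposal.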

Question: how will this affect "core" package status? Will "core" packages still exist, or does Gold-tier become the new "core"?

I think this would make core go away, yes, leaving "Gold" as the highest level. In my mind, it'd get confusing to delineate the differences between Gold and core. Further, we've always struggled to say what exactly it meant to be a core package, or how to become a core package (apart from a nod of approval from current leadership and core package maintainers).

jibarnum commented 3 months ago

@rebeccaringuette thanks for your thoughts above!

Agree that benefit items like Python env inclusion and chat bot inclusion should only be available to...

Indeed, I'm trying to make that a bit more clear in the soon-to-come commit.

We need two versions of the PyHC environment to avoid creating an environment so large that no one wants to wait for it to install/load...

I like this thought. I'll include it. However, I do wonder how we intend to include the bronze categories, which allow some major conflicts to exist with installation into the software environment... thoughts?

agreed that standards compliance assistance should be available to all upon request

I mostly agree. I think if you're already at Gold, you probably will only get assistance if you're in danger of dropping down a level.

also hesitant about including interoperability status...

Yeah, I nixed that one. The metadata suggestion is good, though can you elaborate on how we would evaluate that?

also agree on including the PyHC env installation

Yep.

package DOI should be for the software repository,

Sure, that makes sense.

PyHC standard grades should not be determined by self-evaluation...

Indeed, and thus the point of doing a pyOpenSci review process. But the self-evaluation is just step one to getting there. Shawn does also do a general review to make sure the grades are commensurate with the state of a repository.

need to specify the current PyHC env...

Sure.

like the idea of the term 'core packages'...

Same. I'll nix that once (if) this PHEP goes into place.

need to make some funding available for packages

For sure. First we need to get a good definition on what we want for PyHC-specific requirements for a pyOpenSci process to show we have the process in place and ready to go for packages.

Need to state a time frame for packages to submit the tier they best align with

For sure, I need to include some wording on this. I don't want to wait too long, so perhaps 6 months is best. I'll find out soon if that's a terrible idea by how many tomatoes are thrown my way with the next commit. :)

jibarnum commented 3 months ago

Alright, all. Tried to catch and incorporate as many comments as I could. Please review and let me know what concerns/suggestions I didn't capture or have come up with the changes. Thanks!

jibarnum commented 2 months ago

A note that this will supersede the existing project submission process is probably helpful. Also potentially a list of differences:

- Self-evaluation is now just the starting point; TSC evaluation is required
- Additional requirements beyond the main PyHC standards

Plus, of course, a commitment to update the submission process.

Sure that makes sense @jtniehof

jtniehof commented 2 months ago

Should there be an explicit closes #30 on this?

rstoneback commented 2 months ago

NASA funding is already requiring that software proposals satisfy PyHC standards. Did NASA check with us before adding that to funding announcements? Does applying PyHC standards in funding announcements comport with APA standards and U.S. agency rule making? What standards level is going to apply to NASA funding? Gold, silver, bronze, or copper?

Incidentally, my interest level in providing free labor to NASA, in the form of standards or otherwise, is quite low.

rebeccaringuette commented 2 months ago

That is a discussion to have with HDRL and NASA HQ once this gets settled. My initial thoughts are to require bronze as a minimum for software packages starting out. This sets the bar low, but still requires basic FAIR (e.g. DOI, license, pip for reusability, PyHC env for interoperability, and similar). Proposals from a bronze package (or copper) could alternatively ask for funds to improve the level to silver or gold in a detailed manner, e.g. the pyOpenSci review process.

rebeccaringuette commented 2 months ago

Also, in the table, the HSSI row needs some work. Recommended: copper = all mandatory fields; bronze = all mandatory and some recommended fields; silver = all mandatory and recommended fields; gold = all mandatory and recommended fields plus some optional fields (see the HSSI metadata schema for details). In addition to metadata fields, gold- and silver-level packages should have priority consideration in contributing to the controlled vocabularies used by HSSI. These packages would also be eligible, subject to review, to manage those controlled vocabularies in a rotating fashion under HSSI metadata leadership. Contributing to the controlled vocabularies should be a requirement for gold-level packages (e.g., flagging anything we are missing).
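A minimal sketch of how the proposed HSSI row could be checked mechanically, assuming the schema's fields can be split into mandatory, recommended, and optional sets. The field names in the usage example are hypothetical placeholders, since the HSSI metadata schema does not yet have a published URL.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SchemaFields:
    """Hypothetical grouping of HSSI metadata fields by requirement level."""
    mandatory: frozenset[str]
    recommended: frozenset[str]
    optional: frozenset[str]


def hssi_metadata_tier(provided: set[str], schema: SchemaFields) -> str | None:
    """Return the highest tier whose proposed HSSI metadata rule is met.

    copper: all mandatory fields
    bronze: all mandatory and some recommended fields
    silver: all mandatory and recommended fields
    gold:   all mandatory and recommended fields plus some optional fields
    Returns None when any mandatory field is missing.
    """
    if not schema.mandatory <= provided:
        return None
    if schema.recommended <= provided:
        # Every recommended field is present; any optional field lifts it to gold.
        return "gold" if provided & schema.optional else "silver"
    if provided & schema.recommended:
        return "bronze"
    return "copper"


# Usage with placeholder field names (not real HSSI schema fields).
schema = SchemaFields(
    mandatory=frozenset({"name", "doi"}),
    recommended=frozenset({"license", "keywords"}),
    optional=frozenset({"funding"}),
)
print(hssi_metadata_tier({"name", "doi", "license"}, schema))  # bronze
```

Note that under this reading, optional fields only count once all recommended fields are present, so a package with mandatory plus optional fields alone would still sit at copper.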

rstoneback commented 2 months ago

That is a discussion to have with HDRL and NASA HQ once this gets settled.

I disagree. If NASA wants to set the standards then they should set the standard. It should also be applied not just to heliophysics, but to the Earth, Planetary, and Astrophysics divisions. If NASA wants to use the PyHC standards then PyHC sets the standard, not NASA. I will repeat, however, that I think it is inappropriate for NASA to use the results of unfunded labor.

rebeccaringuette commented 2 months ago

I disagree. If NASA wants to set the standards then they should set the standard. It should also be applied not just to heliophysics, but to the Earth, Planetary, and Astrophysics divisions. If NASA wants to use the PyHC standards then PyHC sets the standard, not NASA. I will repeat, however, that I think it is inappropriate for NASA to use the results of unfunded labor.

The PyHC standards apply only to software relevant to Heliophysics and written in or run from Python, nothing more, and cannot be applied across NASA's divisions or even to other software in Heliophysics. Concerning the funding comment, the PyHC standards are mentioned as conditions on NASA funding opportunities, particularly the HTM call, so the requirement is not unfunded. I don't recall at the moment whether they are mentioned in other calls. Since PyHC is now moving to tiered standards, the conversation between PyHC leadership, HDRL leadership, and NASA HQ will likely be about which tier to set as a minimum standard for an updated version of those funding calls, assuming that HQ decides to change the wording of that AO and others at all. The decision of which tier a given proposal chooses to adhere to (and how they intend to adhere to it) may instead be left to the proposal submitter, and then to the scrutiny of the proposal reviewers.

jibarnum commented 2 months ago

Should there be an explicit closes #30 on this?

Yes, I'd say so!

rebeccaringuette commented 2 months ago

@jibarnum

jibarnum commented 2 months ago

Also, in the table, the HSSI row needs some work. Recommended: copper = all mandatory fields; bronze = all mandatory and some recommended fields; silver = all mandatory and recommended fields; gold = all mandatory and recommended fields plus some optional fields (see the HSSI metadata schema for details). In addition to metadata fields, gold- and silver-level packages should have priority consideration in contributing to the controlled vocabularies used by HSSI. These packages would also be eligible, subject to review, to manage those controlled vocabularies in a rotating fashion under HSSI metadata leadership. Contributing to the controlled vocabularies should be a requirement for gold-level packages (e.g., flagging anything we are missing).

Sure. I just went with what you'd said earlier for each level. I can update. I feel the HSSI metadata schema will require a URL. Do we have one at the moment?

jibarnum commented 2 months ago

@rstoneback since HTM calls often closely align with the PyHC, and to the end of not siloing efforts, NASA made the choice to include our standards in their calls (to my knowledge, this is just for HTM). I was asked about wording for this, and provided what is shown therein. NASA could, in theory, go off and write their own things, but I suppose why reinvent the wheel if not necessary?

Like @rebeccaringuette said, it will require some discussion with NASA on whether they want to update AO calls to match our new process, and if so, to what level. I'm not convinced it's appropriate to define here which level NASA funding calls will adhere to. That's outside the scope of this PHEP, and it would be wrong for us to levy that requirement on NASA since we're... not NASA.

I empathize with the funding concerns. The HTM call, albeit small at the moment, does have room for package maintenance funding requests. I strongly believe updating to better align with new PyHC tiering/PHEPs for standards would be a legitimate funding request. If enough packages are submitting those kinds of requests, that may even encourage NASA to start putting more money behind that (crosses fingers).

rebeccaringuette commented 2 months ago

Sure. I just went with what you'd said earlier for each level. I can update. I feel the HSSI metadata schema will require a URL. Do we have one at the moment?

No, and likely not for a few months. We will need some tech support before that is available.

rebeccaringuette commented 2 months ago

What is this group's opinion on shifting the conda installation requirement to the silver level? It would simplify installation in the PyHC environment, especially on Heliocloud, but would such a requirement at the silver level be too formidable a hurdle, meaning it should only be at the gold level, or is it a simple enough task to include at the silver level? Note that pip installation is required at the bronze level.

nabobalis commented 2 months ago

What is this group's opinion on shifting the conda installation requirement to the silver level? It would simplify installation in the PyHC environment, especially on Heliocloud, but would such a requirement at the silver level be too formidable a hurdle, meaning it should only be at the gold level, or is it a simple enough task to include at the silver level? Note that pip installation is required at the bronze level.

For me, this should be at the bronze level.

sapols commented 2 months ago

I'll note that I intend to submit a proposal to hire a student developer whose sole job (at first) is to help PyHC packages join conda. No promises on how soon that could happen though, of course. I could buy conda installation being a silver-level thing if enough devs agree, but bronze is too low (as much as I'd love to do that, bronze just isn't realistic).

nabobalis commented 2 months ago

Unless you have compiled code, creating a conda-forge recipe is no more difficult than setting up the Python packaging required to get on PyPI.

So for me, it should be at the same level as pip.

jibarnum commented 2 months ago

Unless you have compiled code, creating a conda-forge recipe is no more difficult than setting up the Python packaging required to get on PyPI.

So for me, it should be at the same level as pip.

There are a few PyHC core packages not yet on conda (e.g. SpacePy IIRC @jtniehof ). It'd be good to hear from them on what the blockers are before deciding to relax the requirement down to silver or bronze.

rebeccaringuette commented 2 months ago

Thanks for the comments, Nabil. Absolutely, Julie. Would like to hear comments on this from others too. @jibarnum I have added this PHEP as a suggested unconference topic for the fall meeting, but it will need some structure to the discussion or it will be all over the place.

On Thu, Sep 19, 2024 at 4:16 PM Julie Barnum @.***> wrote:

Unless you have compiled code, creating a conda-forge recipe is no more difficult than setting up the Python packaging required to get on PyPI.

So for me, it should be at the same level as pip.

There are a few PyHC core packages not yet on conda (e.g. SpacePy IIRC @jtniehof). It'd be good to hear from them on what the blockers are before deciding to relax the requirement down to silver or bronze.

nabobalis commented 2 months ago

Maybe if a package is pure Python, conda installation should be required at bronze, but for more complex packages we bump that to silver?

But that might be too in the weeds for a rule or requirement.
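If the pure-Python/compiled-code split were adopted, the rule itself would be small. This is a hypothetical sketch: `has_compiled_code` is an assumed flag that a reviewer would set by hand, not an existing PyHC or conda-forge check.

```python
# Tier ranks, lowest to highest, using the thread's medal terminology.
TIER_RANK = {"copper": 0, "bronze": 1, "silver": 2, "gold": 3}


def conda_required(tier: str, has_compiled_code: bool) -> bool:
    """Hypothetical rule: conda-forge availability is required from bronze up for
    pure-Python packages, and from silver up for packages with compiled code."""
    threshold = "silver" if has_compiled_code else "bronze"
    return TIER_RANK[tier.lower()] >= TIER_RANK[threshold]


# Usage: a bronze package with compiled extensions would not yet need a conda recipe.
print(conda_required("bronze", has_compiled_code=True))
```

Keeping the threshold as a single lookup means the breakpoint (bronze vs. silver) can be moved later without touching the rest of the rule.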

rebeccaringuette commented 1 month ago

Since the standards landscape in PyHC is in flux, I suggest removing the PyHC standards grading row and instead asking all other PHEPs to determine what compliance looks like for each package level. That way, we don't have to renegotiate this PHEP for every change. The summation of those descriptions can be added to a summary document each time a new PHEP is approved. In my opinion, the "some", "most" and "all" terms currently on this row are too squishy to really be a standard. On the other hand, it could also be desirable for packages to choose which items of a list of standards to completely comply with based on their own package needs. Or, such considerations would ideally be incorporated into the descriptions of compliance for each PHEP and package level. Maybe some combination of the two ideas would be good, but consider this a push for more concreteness for this row.

All items except the pyOpenSci review process seem easy enough that quick checks could be implemented once this PHEP is passed. Those checks seem to need a separate PHEP.

One important missing component here is the level of contribution allowed and activity supported by a given package, and how that characteristic is imagined to be different for different package levels. This will likely require a custom review per package to confirm.

It also seems that the technical steering committee should be described in more detail in another PHEP, such as how to become a member of that (election vs service requirement?), what the requirements are to be on that committee (e.g. silver level?), any desired restrictions (one member per package at a time can run for election / be required to serve), rotations (2 years? half gets re-elected one year, the other half to be reelected the next year), and so on.

It would be nice to add that the self-assessment / PR activity described in the implementation section would be supported by a hackathon at a spring/fall PyHC meeting, although that may be too much in the weeds.

One thing we should recognize here, as others have pointed out, is the likely future multiplicity of PyHC software environments. As PyHC matures and our packages grow further in complexity, it may not be possible much longer for all packages to be installable in a single environment. I find it likely that there will be a PyHC software environment purposed for the summer school that drives continued improvements towards interoperability for that purpose, while there are other 'flavors' of PyHC environments directed towards a given analysis goal (e.g. mission pipeline development vs data analysis) or even categorized by sciences (e.g. solar vs ITM). This is yet to be determined, but for now the PyHC env row could be changed to refer to the PyHC software environment designed for the summer school, since I expect that effort to be more persistent than the other ideas. These ideas also seem to call for a change in the benefits table, which may be as simple as changing "PyHC-all" to "a PyHC software environment" and allowing the package to choose which one (other than the top-tier one).

Other missing factors here are test coverage and working documentation examples, but those seem better suited to a pyOpenSci review process. However, how would we require that the documentation examples keep working over time? If someone sees that a silver package's documentation doesn't work, then their opinion of all silver-level packages will decline, so there is some level of reputation and upkeep to factor in somewhere. Is that part of the pyOpenSci process? If so, maybe that component can be used as a way to judge maintenance?

jtniehof commented 1 month ago

Maybe if a package is pure Python, conda installation should be required at bronze, but for more complex packages we bump that to silver?

But that might be too in the weeds for a rule or requirement.

I do think it's reasonable that being on conda-forge is a requirement at a higher level than being on PyPI; the appropriate breakpoint is up for discussion. PyPI is pretty much essential and conda-forge may not be more difficult but it's an additional thing.

To Rebecca's suggestion of deferring a lot of the specifics to additional PHEPs, #35 has some discussion on packaging standards. At this point we have no standards PHEPs, so in theory we could rewrite this to be "how future PHEPs declare the way things fit into this system" and not have to backfill on anything.

jtniehof commented 1 week ago

@jibarnum @rebeccaringuette can we link (and maybe summarize) the relevant fall meeting notes and miro board here so the discussion is accessible to all?

jibarnum commented 4 days ago

@jibarnum @rebeccaringuette can we link (and maybe summarize) the relevant fall meeting notes and miro board here so the discussion is accessible to all?

Hey @jtniehof, yeah, I have it on my to-do list to update this PHEP based on the fall meeting. I can summarize and link to the Miro boards and meeting report for reference.