planetarysoftware / ISIS_TC

ISIS Technical Committee
https://github.com/USGS-Astrogeology/ISIS3

ISIS As Scientific Software Release #167

Open jlaura opened 2 years ago

jlaura commented 2 years ago

Apologies that I was unable to attend the most recent ISIS TC. I was reviewing the notes and saw the following:

    - Scientific software releases impose extra overhead for contributors
      - Also limits committers
      - It is very hard to transition to a true open-source eco-system with these restrictions
        - There will need to be large contributions outside of the ASC/USGS to get away from releasing through these channels
    - Concerns about ISIS being available and supported in light of requirements
      - ASC is working very hard to make sure that ISIS is available and supported no matter the requirements from above us

Perhaps I am missing some critical context. Points where I am seeking clarity:

  1. Would it be possible for someone to explain to me specifically how the method that the USGS uses to release software is adding an undue burden on the contributor base?

  2. Would someone also explain to me what a 'true open-source eco-system' is and how that differs from the current state of the software where anyone can open a PR and, assuming they meet the project's contribution guidelines, have that merged?

I much appreciate the specificity that I hope folks respond with. That specificity means that the 'soft power' that the ISIS TC has (as per the governance documents: submit a PR or become a funding agency if you want some other type of power) can be used to help influence the direction of the project.

jessemapel commented 2 years ago

@KrisBecker raised concerns about point 1 and can probably clarify more.

For point 2, it cannot be a truly open source package because Astrogeology will be gatekeepers to all contributions until the package is no longer released solely by Astrogeology. There cannot be maintainers outside of Astrogeology who can freely approve and merge new contributions under the release and GitHub Enterprise requirements.

jlaura commented 2 years ago

Jesse, thanks for the clarity. I, of course, have more questions and hopefully a clarifying remark....

For 2, once external collaborators are approved inside of the DGEC (which is in the works, hopefully this FY), anyone that goes through that process will be able to merge external PRs. Currently, we have a number of external contributors that have merge rights on the repo, whom we implicitly expect to meet the code merging requirements. This is all mechanics though.

What makes something a 'true open-source eco-system' project? Is it the fact that it is released on GitHub? Is it the addition of contribution guidelines and the banner in the readme that says accepting contributions? Is it having more than one maintainer? Is it a requirement that subsequent maintainers need different professional affiliations? Is it a guarantee that any PR that lands will be merged?

michaelaye commented 2 years ago

Wondering if PyTorch's recent efforts with the separation of technical maintainership from the foundation are something that could help guide ISIS’ efforts? Or maybe we already have something very similar? https://pytorch.org/blog/PyTorchfoundation/

KrisBecker commented 2 years ago

I apologize if there were other discussions I was not present for that describe the benefits/consequences/impact of the USGS ISIS software release process on the ISIS community. I pursued the references given, which contain a hugely significant quantity of documentation related to USGS Software Release policies. Please clarify any misunderstandings I may have.

In the May 12, 2022 ISIS/TC meeting, the Compliance section states that all USGS repos will be migrated to USGS Enterprise Github and they must all go through an official USGS software release process [1].

The May 12 meeting notes provide a link to Types of Software Review [2] which provides guidelines for review of USGS software. There are two main software types: Scientific and Provisional/Preliminary Software. The USGS has declared ISIS as Scientific Software and therefore everyone, including external contributors/collaborators, must comply with DOI and USGS software release policy and standards [3]. This policy appears to require that the ISIS repo be moved into internal DOI/USGS Enterprise Github servers.

There are inherent risks and limitations to ISIS being released strictly as USGS Scientific Software and the repo being hosted on internal DOI/USGS servers.

It is concerning that official releases of ISIS can only be made through a process that is entirely internal to the USGS. No non-USGS/ASC employee can release ISIS, even if collaborators have acquired merge rights due to the approval process (via IPDS) required by USGS Scientific Software release policy. This is concerning because if the USGS/ASC is unavailable for any reason, ISIS cannot be released. This is particularly concerning should the US government be shut down. During a shutdown, all government employees are restricted from even logging in, and most all government servers are shut down. Events like this will completely shut down access to the ISIS repo.

It has been mentioned that external collaborators can acquire merge rights through a yet unavailable (DGEC) process. I am not 100% certain of this, but I doubt this privilege will be granted to contributors/collaborators from foreign countries because of access restrictions to internal government computer systems, which could discourage participation/collaboration from organizations such as ESA and JAXA.

It is concerning that, according to Code.gov, the DOI, which is the parent department of the USGS, is one of only two agencies (the other being the Department of Defense) that is non-compliant with the 2016 Federal Source Code Policy. I am not sure this information is current, or what the status of the USGS policy [3.2] is with regard to compliance, but perhaps ISIS governance should adopt guidelines that are compliant with the Federal policy, such as those of NASA, which is fully compliant.

The USGS can also release ISIS as Provisional or Preliminary Software. For convenience, here is the excerpt from [3.6] that describes this type of software designation by the USGS:

  1. Provisional Scientific Software Release. Scientific software is increasingly a part of sharing ideas and analytical methods during the course of a USGS project and as a regular part of the scientific workflow. Developing provisional software openly via online code repositories is a common practice for collaboration and sharing with colleagues and peers from other institutions, as these efforts can be a way to improve scientific outcomes. Release of in-progress scientific software for collaborative or informal sharing through a publicly accessible code repository requires Science Center level approval of the methods and practices used and assurance that personal, private, or otherwise sensitive information is not shared, but does not require tracking through the IPDS or approval as an information product. Collaborative and informal sharing of software not involving posting online to a publicly accessible code repository can be done in accordance with the requirements for safeguarding unpublished USGS information prior to release (refer to SM 502.5). Provisional scientific software that is released must include the appropriate disclaimer statement (refer to section 5.D.), as well as sufficient code documentation to promote usability as described in section 5.E.

This pretty much describes the ISIS open source environment that is/was hosted on the public Github system. And it is important to note that policy decisions for provisional software are delegated to the Science Center level and do not require IPDS tracking or DOI approval of how the software is released. The USGS recognizes this process and provides guidelines that support the existing ISIS collaborative environment without pulling the code base into internal USGS servers.

Hence, I think ISIS is both Scientific and Provisional Software according to USGS policy [4]. And I have a suggestion that may help provide a clear, more collaborative and cost effective ISIS development and management workflow.

The current online ISIS Github repo should remain to support a more open, inclusive and collaborative environment for all contributors. This serves as the USGS ISIS Provisional Software system where policy is set by ISIS community governance, of which the USGS currently has a significant role. ISIS release schedule and release managers can be approved by the ISIS governance/community and must be sufficiently trained to do so. The USGS can then peel off and install on their internal Github server an ISIS Long Term Support (LTS) release from the ISIS Provisional system and adjust/change the code according to its Scientific Software approval and release requirements. Once released, the USGS and ISIS community can provide support of the ISIS LTS for mission critical activities and other ISIS users. This degree of separation will allow for both environments to operate largely unencumbered by the other.

This approach clearly differentiates a quality USGS ISIS Scientific Software release, derived from the USGS ISIS Provisional, continually changing/evolving, system, that satisfies USGS policy and maintains the traditionally high degree of USGS integrity. It also minimizes the number of Scientific Software (LTS) releases that the USGS would need to make (once every 1.5 years?) thus reducing the impact on resources/costs and improving the support effort. This also limits the scope and impact of DOI/USGS policy on the ISIS Provisional system/users and isolates it within the Scientific release process, a wholly USGS function.

This management approach is a compromise toward an open source community that is all-inclusive, self-governing, self-supporting and can continue to thrive with contributions from many individuals and organizations.

References

  1. Additional Guidance on Review and Approval of Scientific Software, USGS (https://www.usgs.gov/software-management/additional-guidance-review-and-approval-scientific-software)
  2. Types of Software Review, USGS (https://www.usgs.gov/products/software/software-management/types-software-review)
  3. IM OSQI 2019-01, Review and Approval of Scientific Software for Release, USGS (https://www.usgs.gov/survey-manual/im-osqi-2019-01-review-and-approval-scientific-software-release)
  4. E.6 Software - Extended Guidance and Specific Products, USGS (https://www.usgs.gov/office-of-science-quality-and-integrity/e6-software)

jlaura commented 2 years ago

@KrisBecker Thanks for the lengthy reply. You are making a number of incorrect assumptions.

    The May 12 meeting notes provide a link to Types of Software Review [2] which provides guidelines for review of USGS software. There are two main software types: Scientific and Provisional/Preliminary Software. The USGS has declared ISIS as Scientific Software and therefore everyone, including external contributors/collaborators, must comply with DOI and USGS software release policy and standards [3]. This policy appears to require that the ISIS repo be moved into internal DOI/USGS Enterprise Github servers.

Almost correct. The code is being moved to a publicly available Github Enterprise instance, hosted through GitHub.com that meets federal code security requirements.

    There are inherent risks and limitations to ISIS being released strictly as USGS Scientific Software and the repo being hosted on internal DOI/USGS servers.

Given the above, this is an incorrect assumption.

    It is concerning that official releases of ISIS can only be made through a process that is entirely internal to the USGS. No non-USGS/ASC employee can release ISIS, even if collaborators have acquired merge rights due to the approval process (via IPDS) required by USGS Scientific Software release policy. This is concerning because if the USGS/ASC is unavailable for any reason, ISIS cannot be released. This is particularly concerning should the US government be shut down. During a shutdown, all government employees are restricted from even logging in, and most all government servers are shut down. Events like this will completely shut down access to the ISIS repo.

This is frankly a ridiculous concern. We do not work in some life-or-death, critical-to-the-moment situation. It is a struggle to get users to upgrade. The longest government shutdown in history was 3+ years ago and was significantly shorter in duration than our standard 3 month release cycle. That is a cycle that you have personally complained is too fast. If we can leave this straw man argument behind, that would be swell.

    It has been mentioned that external collaborators can acquire merge rights through a yet unavailable (DGEC) process. I am not 100% certain of this, but I doubt this privilege will be granted to contributors/collaborators from foreign countries because of access restrictions to internal government computer systems, which could discourage participation/collaboration from organizations such as ESA and JAXA.

Anyone can open a PR on the repo. It is a public repository. If people are going to have merge rights, then we have a process to go through. Honestly, no one outside the ASC has merged anything anyway, so again, this is a moot point from my perspective.

    Hence, I think ISIS is both Scientific and Provisional Software according to USGS policy [4]. And I have a suggestion that may help provide a clear, more collaborative and cost effective ISIS development and management workflow.

This is something for ASC management to worry about. Not the ISIS TC. If you want to change the way that the federal government writes software, start writing letters to your federal legislators. If you are not getting traction there, fork the project and do whatever you want with it.

    The current online ISIS Github repo should remain to support a more open, inclusive and collaborative environment for all contributors. This serves as the USGS ISIS Provisional Software system where policy is set by ISIS community governance, of which the USGS currently has a significant role. ISIS release schedule and release managers can be approved by the ISIS governance/community and must be sufficiently trained to do so. The USGS can then peel off and install on their internal Github server an ISIS Long Term Support (LTS) (https://github.com/USGS-Astrogeology/ISIS3/discussions/4691) release from the ISIS Provisional system and adjust/change the code according to its Scientific Software approval and release requirements. Once released, the USGS and ISIS community can provide support of the ISIS LTS for mission critical activities and other ISIS users. This degree of separation will allow for both environments to operate largely unencumbered by the other.

I agree with this being an open and collaborative community. I've worked for 4+ years to transition from the closed model the ASC used for 15+ years for development into the current state. Even if someone at the ASC wanted to transition to a closed model, federal law prohibits that (thankfully). The rest of this seems to be trying to find a way to not have things be internal only. Again, that is an erroneous assumption.

Oh, and thanks for taking the time to read all the software release policies! They are super interesting. ISIS is going to remain scientific software for all versions with official releases. Between releases, the dev branch is provisional software because it has not been officially released. All those requirements that we have for releasing, we need to meet them in dev so that we don't end up with a pile of un-releasable code because it lacks tests, documentation, statements of validity, etc.

None of the text above addresses my question though.

What makes something a 'true open-source eco-system' project? Is it the fact that it is released on GitHub? Is it the addition of contribution guidelines and the banner in the readme that says accepting contributions? Is it having more than one maintainer? Is it a requirement that subsequent maintainers need different professional affiliations? Is it a guarantee that any PR that lands will be merged?

jessemapel commented 2 years ago

    Jesse, thanks for the clarity. I, of course, have more questions and hopefully a clarifying remark....

    For 2, once external collaborators are approved inside of the DGEC (which is in the works, hopefully this FY), anyone that goes through that process will be able to merge external PRs. Currently, we have a number of external contributors that have merge rights on the repo, whom we implicitly expect to meet the code merging requirements. This is all mechanics though.

    What makes something a 'true open-source eco-system' project? Is it the fact that it is released on GitHub? Is it the addition of contribution guidelines and the banner in the readme that says accepting contributions? Is it having more than one maintainer? Is it a requirement that subsequent maintainers need different professional affiliations? Is it a guarantee that any PR that lands will be merged?

True, I think I have my mind set on a more community-focused contribution model, but there are open source models that have primary maintainers in them. Practically, this policy doesn't change the contribution composition of ISIS. If we really want to allow non-ASC devs to merge, then we can also look into ways of releasing ISIS outside of the ASC when we have a broader contributing base.

rbeyer commented 2 years ago

This thread has gone in a number of directions, and has significantly diverged from the questions in the original post. I also want to take this opportunity to thank everyone for engaging with the ISIS TC process, and it is important to make sure we are engaging in civil discourse that allows us to move forward and make progress together.

I think that this thread has covered, to a large extent, the original post's questions:

  1. Would it be possible for someone to explain to me specifically how the method that the USGS uses to release software is adding an undue burden on the contributor base?

The meeting notes do not include the term "undue." My interpretation (because I wasn't there either) is that the notes simply make a factual statement that there is more effort on a contributor's part to successfully submit to a "USGS Scientific Software" repo. Nobody likes more work, and that may have a practical effect in limiting contributions because the "extra effort" is too much (but it may not, and only time will tell). However, we live in a world of constraints, and the fact that ISIS is classified as "USGS Scientific Software" has implications for how submissions are made going forward.

  2. Would someone also explain to me what a 'true open-source eco-system' is and how that differs from the current state of the software where anyone can open a PR and, assuming they meet the project's contribution guidelines, have that merged?

@jessemapel made some excellent replies, but I think that the term "true open-source eco-system" was a shorthand note written during discussion. I am not aware of this phrase meaning something specific. I appreciate the arguments that @KrisBecker has brought up, and that philosophically it would be great if every part of the process was open to participation by willing parties, regardless of affiliation. However, that just isn't possible for ISIS as it is now. I'll note that ASP is similar: there are only a few of us that have commit rights and can actually build and release ASP (and we all have e-mails that end in @nasa.gov).

There is no one, true way that an open source software project should be. There are as many governing processes as there are projects. What makes a project open source is its availability, not the process by which it is made available. There are a lot of ways that human beings can operate to produce code that is made freely available, and people (and governments, and employers) have different perspectives (and requirements) on that process.

I very much appreciate the deep dive into documentation-reading that @KrisBecker performed, and his suggestion for a compromise that might allow more open participation in the process of software release. However, as @KrisBecker's post states, the USGS has already made a determination that ISIS is not "Provisional" but is instead full "Scientific Software." Based on some side conversations with several USGS personnel, it is very unlikely that USGS management would change that designation.

I think there is a great deal of nuance and complexity in all of the topics brought up in this thread, and I feel that we might make better progress by adding discussion of this Issue to the October meeting (I don't see a PR for a draft agenda yet) than more posts here.

jlaura commented 2 years ago

First off, thanks for the discussion and apologies that my response came off uncivil and snarky. Not the intention. Also, apologies for the delayed response. I've been locked out of my workstation for the better part of three days...

I am definitely frustrated with the perception that the contribution requirements for this project are out of line with contribution requirements for other open source projects. From my perspective, the contribution requirements are in line with community norms for a project that supports a community of this size. The maintainers of the software are required to support not just a single group, but the vast majority of current and past missions plus an active science user community plus non-mission data processors (e.g., the PDS and their processing tools).

It seems to have caused a real issue for potential contributors, and a lot of complaining to all levels, that code needs to be tested, documented, and vetted. My perception is that people feel these new requirements are out of line with OS community norms. From my experience as a contributor across a dozen or more active OS projects, they are not. We started a 2018 SAT review, building off a previous review, discussing the need for stable, documented, high-quality code. Tests, documentation, and vetted implementations are what result in code of that quality. Whether that code is labelled provisional, scientific, or literally anything else is, from my perspective, irrelevant (more below). The important part is that the project is releasing the highest quality code to the most inclusive user base.

Yes, supporting that scope of user base requires that every contributor meet a standard. As I stated above, that standard is no different from what I see across the OS community for projects that have a scope larger than a single research lab or small group project.

I should note that the ISIS TC is fully empowered to decide that the software project should not meet standards that maintain high quality and can make a recommendation to the maintainers of ISIS that code should be merged under a different model. The ISIS project can then determine if and how it could meet those ISIS TC proposed standards.

Much more realistically, I would love to see active participation by everyone on the TC working to improve the process documentation about how to meet what I perceive to be standard requirements. For example, by replying to this issue opened by @jessemapel. My perception is that we are getting a lot of complaining, because complaining is easy, and a lack of other engagement.

Corrections to the above:

I want to also correct some of the statements above. The dev branch of ISIS and any feature branches are classified as USGS provisional software releases. Therefore, some versions of ISIS are provisional. When the ASC creates a release, we take the provisional dev branch, perform the necessary security audits and scientific reviews, and create a release that is then classified as scientific software. The ASC has made the choice that it is significantly more efficient to have code contributions include the necessary information for domain/scientific reviews at the time of merge into dev. The alternative is deemed to be too costly, where we merge without the required elements and then follow up with contributors at release time, reverting changes if we cannot get the necessary information. (The ISIS TC could definitely recommend that the latter approach be taken, but I do not anticipate a change as the costs associated with that approach are non-trivial and fall entirely to the ASC.)

My opinionated TL;DR on this is basically that the classifications for software release are not pertinent to anything that the ISIS TC can meaningfully influence at this time. Discussions related to the requirements for submission to this project need to include meaningful engagement in concretely describing actionable issues and participation in solicitations for more information. From my perspective, the bar for contribution is low: tests, documentation, and a statement about how algorithmic contributions are vetted for correctness, because reviewers do not always have the expertise and/or data sets to independently verify all proposed changes.

rfergason commented 2 years ago

I also thank everyone for this conversation and agree that we should continue to discuss these topics in a civil and professional manner. That can sometimes be challenging when we are all very passionate about the work we do (which is a good thing!), but I think it's necessary to foster productive discussions around these topics.

I want to briefly comment on the provisional software release designation portion of this thread. Provisional software is intended to be preliminary software still under development and a means to share with collaborators in a limited capacity (see Section 4G of https://www.usgs.gov/survey-manual/im-osqi-2019-01-review-and-approval-scientific-software-release). Since we do not release any aspect of the ISIS code base in a limited capacity, we are already pushing the boundary of the provisional software definition in an effort to reduce the release burden on our community. I also don't think we can make the honest case that a 30+-year-old code base that is used as the foundation for projecting and processing authoritative planetary mission data released to the public is preliminary or provisional. The intent of this preliminary designation is not to avoid following a reasonable implementation of Federal policy.

As noted above, the ASC does provisionally release the ISIS software between major version releases. Once we increase a major version number, however, we do a full scientific release of the software following the USGS software release policy; a major version release is defined by the semantic versioning policy agreed upon by the TC (https://github.com/planetarysoftware/ISIS_TC/blob/master/Versioning.md). Again, I think this is a reasonable implementation of this policy and is aligned with the intention of the preliminary software designation.

jessemapel commented 2 years ago

I also want to state that I do not think the contribution requirements are too burdensome. They are more burdensome than I would ideally like, because of the level of documentation required by government guidelines, but I don't think it's a significant amount.

I think some level of complaining and pointing out when we're unhappy is important. I am getting concerned that this discussion is not translating into anything concrete. The number of external contributions is still very low right now. We know that there are more coming in the new fiscal year and with new instruments and data campaigns. I think it's best if we frame further discussion around specific contributions and specific concerns that we can address. As contributors make contributions and do the extra documentation steps required for the scientific software release, please point out anything that you feel was egregiously burdensome. That makes it easier for us to translate this discussion into action and ultimately a better experience for everyone.

jlaura commented 1 year ago

Hey all, just wanted to update that we are now able to support external sponsored collaborators for our repos. What does all that jargon mean? If we have non-USGS folks who want to have write permissions on USGS-hosted repositories, we can do that. We (USGS) are still a few weeks / months out from getting our repositories migrated. Having said that, this group could/should definitely discuss what the requirements are for write access on the repo and then I am happy to help anyone that wants to go through the process.