opensearch-project / .github

Provides templates and resources for other OpenSearch project repositories.

[META/Proposal] OpenSearch project should Adopt Apache Foundation's stance on the use of genAI #182

Open scrawfor99 opened 7 months ago

scrawfor99 commented 7 months ago

TL;DR: GenAI use is already common across many open source projects and websites. We (OpenSearch) should acknowledge the value of these tools but also enforce responsible use. I propose the adoption of the Apache Foundation's Generative AI Guidance (https://www.apache.org/legal/generative-tooling.html) or something similar.

**Describe the solution you'd like**
We should add a new line to the sign-off attesting that you have reported any use of generative AI tooling. To report that use, we can add a new checkbox alongside the several we already have on pull requests.
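To make the proposal concrete, here is one possible shape for the new checkbox. The file path, the existing DCO item, and the wording are illustrative assumptions, not an agreed-upon format:

```markdown
<!-- Illustrative sketch only: a possible addition to the existing
     pull request template (e.g., .github/PULL_REQUEST_TEMPLATE.md). -->
### Check List
- [ ] Commits are signed per the DCO using `--signoff`
- [ ] I have disclosed any use of generative AI tooling in this change,
      per the project's generative AI guidance
```

A disclosure like this would let maintainers weigh GenAI-assisted contributions case by case rather than guessing at their provenance.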

I think this solution gives the project leeway to address generative AI use on a case-by-case basis while also getting ahead of some of the more problematic practices genAI has led to. There has been discussion of example PRs from various projects where documentation and/or code was run through genAI tooling and PRs were then opened to make those changes.

These contributions have been referred to as "low effort" or "low value" in the context of those changes and demonstrate the need for some guidelines.

OpenSearch is millions of lines of code, and changes that improve readability or efficiency are often useful. However, there should be clear expectations about how and when genAI use is appropriate. I think having this evident on PRs will make it easier for maintainers to be judicious in reviewing pull requests.

**Describe alternatives you've considered**
Leaving things as is; banning genAI use; no restrictions whatsoever.

**Additional context**
I can share example PRs of what I am referring to, but please reach out to me directly on the public Slack, as I would prefer not to publicly use anyone's work as a "bad" example.

dblock commented 7 months ago

@scrawfor99

  1. Maybe copy paste some anonymized/generic screenshots?
  2. I am moving this issue to .github. That's where any project-wide policy would go.
dbwiddis commented 1 day ago

@scrawfor99 Thank you for opening this issue. This is definitely a subject worth discussing.

As for the specific recommendation... like everything with GenAI these days, it's complicated.

With respect to the ASF-specific language, I want to exercise a bit of caution. I respect the ASF immensely and trust their analysis of issues like this. However, their policies tend to err on the strict side, and unless we intend to become an ASF project, adopting them would incur many restrictions borne out of caution more than necessity.

Like most SDEs these days, I recognize the benefit that GenAI has brought to us. I've learned new things and improved my own code based on those learnings. But it's really hard to put in writing exactly what that means. I've asked a GenAI site to write a test case for some code, reviewed its response, excerpted the relevant parts, and implemented it myself. It's certainly not perfect, but in many cases it does better than I would, and I think a very detailed human review of the AI result is a (very) happy medium. But it's a compromise that defies codification in any written rules.

On the other side of the argument, I have contributed hundreds of answers on Stack Overflow under the CC-BY-SA license, which (among other requirements) requires attribution. Such contributions are forbidden from ASF code for good reason. And yet the same GenAI that has helped me write better test cases will also answer questions in the narrow domain I've contributed to with word-for-word responses that I wrote, without attributing me. GenAI is a legal landmine.

All this is to say: I agree that we need a policy. I am not sure I want to adopt the ASF-specific language, but I do want to make it clear that authors need to write their own code themselves, even if they learned something by getting an answer from AI.

That's hard to do. I understand why the strictest restrictions are in force when legal requirements are at stake. I just want to hold back from the "MUST" and keep the "SHOULD" here, as this is an impossible-to-really-enforce requirement.