stephen-crawford opened 1 year ago
I am going to go ahead and get this ball rolling.
Some things I would be interested in seeing are:
New tests auto-run: Keep the existing gradle check, but add a workflow that automatically scans for new test files (gradle + GitHub should be able to do this) and then runs those tests separately. This would let you know whether there was an issue with the test you just wrote or something related to how the tests interact with the older test suite. @cwperks wrote a file that does something along these lines over in the Identity project, and it has worked well for us. I think taking Craig's idea and going one step further could be helpful for a lot of repos.
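As a rough illustration of the new-tests idea (not a copy of Craig's file; the workflow name, test-file patterns, and Gradle layout below are all assumptions), such a workflow might diff the PR against its base branch for added test files and run only those:

```yaml
# .github/workflows/new-tests.yml (hypothetical name and file patterns)
name: Run newly added tests in isolation

on: [pull_request]

jobs:
  new-tests:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0   # full history so we can diff against the base branch

      - name: Find test files added by this PR
        id: new-tests
        run: |
          # --diff-filter=A restricts the diff to files added in this PR
          NEW_TESTS=$(git diff --diff-filter=A --name-only \
            origin/${{ github.base_ref }}...HEAD -- '*Tests.java' '*IT.java')
          echo "files=$(echo "$NEW_TESTS" | xargs)" >> "$GITHUB_OUTPUT"

      - name: Run only the new tests
        if: steps.new-tests.outputs.files != ''
        run: |
          # Turn each added test file into a Gradle --tests class filter
          for f in ${{ steps.new-tests.outputs.files }}; do
            cls=$(basename "$f" .java)
            ./gradlew test --tests "*${cls}" || exit 1
          done
```

Keeping this separate from the main gradle check run would mean a failure here points at the newly written tests rather than at interactions with the existing suite.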
Dashboards auto-push to functional repo: I have heard from some dashboards-minded folks that the functional tests repo does not always get individual repos' tests. I am not much of a dashboards expert myself, so I do not know what is required here, but it sounds like adding a check to make sure that all tests are added to the functional testing repo would be helpful.
Documentation check: Something else that I heard was that there are some challenges keeping documentation relevant and up to date. I think it could be helpful to add an auto-check, like the one for the CHANGELOG, that checks for documentation updates or requires you to override it by stating that your change only modified code irrelevant to feature execution, i.e. you updated a dependency or added an extra test.
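To make the documentation-check idea concrete, here is a minimal sketch modeled on the changelog-verifier pattern; the workflow name, the `docs/` path, and the `skip-doc-check` label are hypothetical placeholders, not existing conventions:

```yaml
# .github/workflows/doc-check.yml (hypothetical name, path, and label)
name: Documentation update check

on:
  pull_request:
    types: [opened, synchronize, labeled, unlabeled]

jobs:
  doc-check:
    runs-on: ubuntu-latest
    # The override label lets an author state that the change does not affect
    # feature behavior (e.g. a dependency bump or an added test).
    if: ${{ !contains(github.event.pull_request.labels.*.name, 'skip-doc-check') }}
    steps:
      - uses: actions/checkout@v3
        with:
          fetch-depth: 0

      - name: Fail if no documentation was touched
        run: |
          if git diff --name-only origin/${{ github.base_ref }}...HEAD | grep -q '^docs/'; then
            echo "Documentation updated."
          else
            echo "No documentation changes found. Update docs/ or apply the 'skip-doc-check' label."
            exit 1
          fi
```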
To help establish guidelines, I am going to start going through each repository and reviewing their testing framework and confirming their current requirements. I will then keep a running document with each of the repos' requirements listed. From there, I will synthesize a set of requirements that should be established.
Repository | Unit Tests | Integration Tests | Backwards Compatibility Tests | Additional Tests | Link |
---|---|---|---|---|---|
Alerting | | | | Security Test Workflow, Multi-node Test Workflow, Certificate of Origin | https://github.com/opensearch-project/alerting/issues/803 |
Alerting Dashboards | | | | Create Documentation Issue, Certificate of Origin | https://github.com/opensearch-project/alerting-dashboards-plugin/issues/491 |
Anomaly Detection | | | | Link checker, AD benchmark, Certificate of Origin | https://github.com/opensearch-project/anomaly-detection/issues/814 |
Anomaly Detection Dashboards | | | | Link checker, AD benchmark, PR Labeler, Certificate of Origin | https://github.com/opensearch-project/anomaly-detection-dashboards-plugin/issues/428 |
Ansible Playbook | | | | | https://github.com/opensearch-project/ansible-playbook/issues/117 |
Asynchronous Search | | | | Certificate of Origin, Multi-node test | https://github.com/opensearch-project/asynchronous-search/issues/243 |
Common Utils | | | | Certificate of Origin | https://github.com/opensearch-project/common-utils/issues/368 |
Cross Cluster Replication | | | | Certificate of Origin, Security Tests | https://github.com/opensearch-project/cross-cluster-replication/issues/721 |
Dashboards Anywhere | | | | Health Check, Deployment Tests, Functional Tests | https://github.com/opensearch-project/dashboards-anywhere/issues/153 |
~Dashboards Desktop~ | | | | | https://github.com/opensearch-project/dashboards-desktop/issues/37 |
Dashboards Maps | | | | | https://github.com/opensearch-project/dashboards-maps/issues/262 |
Dashboards Notifications | | | | Lint Checker, Documentation Auto-cut | https://github.com/opensearch-project/dashboards-notifications/issues/17 |
Dashboards Observability | | | | Link Checker, Certificate of Origin | https://github.com/opensearch-project/dashboards-observability/issues/289 |
Dashboards Query Workbench | | | | Code QL, Lint Checker, Certificate of Origin, SQL Query Workbench | https://github.com/opensearch-project/dashboards-query-workbench/issues/45 |
Dashboards Reporting | | | | Certificate of Origin, Reports Checker | https://github.com/opensearch-project/dashboards-reporting/issues/65 |
Dashboards Search Relevance | | | | Certificate of Origin, Changelog Verifier, Lint Checker | https://github.com/opensearch-project/dashboards-search-relevance/issues/152 |
Dashboards Visualizations | | | | Certificate of Origin, Link Checker | https://github.com/opensearch-project/dashboards-visualizations/issues/167 |
Data Prepper | | | | Certificate of Origin, Create Document Issue, Performance Tests Compile, App Check, Trace Analytics Tests | https://github.com/opensearch-project/data-prepper/issues/2302 |
Geospatial | | | | Certificate of Origin, Link Checker | https://github.com/opensearch-project/geospatial/issues/227 |
Helm Charts | | | | Lint Checker | https://github.com/opensearch-project/helm-charts/issues/3858 |
Index Management | | | | Certificate of Origin, Link Checker, Lint Checker, Security Test, Docker Security Test, Multi-node Test | https://github.com/opensearch-project/index-management/issues/698 |
Index Management Dashboards Plugin | | | | Link Checker, Certificate of Origin | https://github.com/opensearch-project/index-management-dashboards-plugin/issues/629 |
Job Scheduler | | | | Certificate of Origin, Link Checker | https://github.com/opensearch-project/job-scheduler/issues/327 |
k-NN | | | | Certificate of Origin, Link Checker, Documentation Request, Benchmarking Tool, Custom Benchmarking Tool | https://github.com/opensearch-project/k-NN/issues/775 |
ML-Commons | | | | Certificate of Origin, Documentation Request | https://github.com/opensearch-project/ml-commons/issues/752 |
ML-Commons Dashboards | | | | Lint Checker | https://github.com/opensearch-project/ml-commons-dashboards/issues/127 |
Neural Search | | | | Certificate of Origin, Link Checker, Benchmarking Tool (in progress) | https://github.com/opensearch-project/neural-search/issues/124 |
Notifications | | | | Link checker | https://github.com/opensearch-project/notifications/issues/622 |
Observability | | | | Link checker, Certificate of Origin, Enforce PR Labels | https://github.com/opensearch-project/observability/issues/1416 |
OpenSearch | | | | Code Hygiene, Changelog Verifier, Certificate of Origin, Label Checker, Link Checker, Create Documentation Issue, ShellCheck, Validate Gradle Wrapper | https://github.com/opensearch-project/OpenSearch/issues/6389 |
OpenSearch Benchmark | | | | | https://github.com/opensearch-project/opensearch-benchmark/issues/222 |
OpenSearch Build | | | | Certificate of Origin, Link Checker, License Header, Groovy Tests, Yaml Lint, Python Tests | https://github.com/opensearch-project/opensearch-build/issues/3241 |
OpenSearch CI | | | | | https://github.com/opensearch-project/opensearch-ci/issues/254 |
OpenSearch CLI | | | | | https://github.com/opensearch-project/opensearch-cli/issues/75 |
OpenSearch Dashboards | | | | Code Hygiene, Changelog Verifier, Certificate of Origin, Label Checker, Link Checker, Document Link Checker | https://github.com/opensearch-project/OpenSearch-Dashboards/issues/3466 |
OpenSearch Dashboards Functional Test | | | | Link Checker, Lint, Bundled Tests, tests added from other repositories | https://github.com/opensearch-project/opensearch-dashboards-functional-test/issues/546 |
~OpenSearch DSL-PY~ | | | | Changelog Verifier, Link Checker | https://github.com/opensearch-project/opensearch-dsl-py/issues/101 |
OpenSearch Go | | | | License Headers, Changelog Verifier, Link Checker, Lint Checker | https://github.com/opensearch-project/opensearch-go/issues/238 |
OpenSearch Hadoop | | | | Certificate of Origin, Link Checker, Changelog Verifier | https://github.com/opensearch-project/opensearch-hadoop/issues/123 |
OpenSearch Java | | | | Certificate of Origin, Link Checker, Lint Checker, Checkstyle | https://github.com/opensearch-project/opensearch-java/issues/378 |
OpenSearch-JS | | | | Link Checker, License, License Header, Changelog Verifier | https://github.com/opensearch-project/opensearch-js/issues/390 |
OpenSearch Net | | | | Certificate of Origin, Link Checker, Deploy Documentation, License Headers, Changelog Verifier | https://github.com/opensearch-project/opensearch-net/issues/157 |
OpenSearch PHP | | | | Link Checker, Changelog Verifier, Update Documentation | https://github.com/opensearch-project/opensearch-php/issues/127 |
OpenSearch Py | | | | Certificate of Origin, Link Checker, License Headers, Changelog, Deploy Doc | https://github.com/opensearch-project/opensearch-py/issues/304 |
OpenSearch Py-ML | | | | Certificate of Origin, Deploy Doc | https://github.com/opensearch-project/opensearch-py-ml/issues/88 |
OpenSearch-rs | | | | Changelog verifier, Link Checker, Clippy Check, Certificate of Origin | https://github.com/opensearch-project/opensearch-rs/issues/127 |
OpenSearch Ruby | | | | Link Checker, License Headers, Publish Docs | https://github.com/opensearch-project/opensearch-ruby/issues/148 |
Opensearch-SDK-Java | | | | Validate Gradle Wrapper | https://github.com/opensearch-project/opensearch-sdk-java/issues/470 |
OUI | | | | Certificate Checker, Lint checker | https://github.com/opensearch-project/oui/issues/331 |
Performance Analyzer | | | | Task List Checker, Link Checker, Certificate of Origin, Create Doc Issue | https://github.com/opensearch-project/performance-analyzer/issues/391 |
Performance Analyzer RCA | | | | Link Checker, Gauntlets Test, Certificate of Origin, Create Doc Issue | https://github.com/opensearch-project/performance-analyzer-rca/issues/298 |
Reporting | | | | Certificate of Origin, Link Checker | https://github.com/opensearch-project/reporting/issues/661 |
Search Processor | | | | Certificate of Origin, Link Checker, Create Documentation Issue | https://github.com/opensearch-project/search-processor/issues/104 |
Security | | | | Code Hygiene, Certificate of Origin, Plugin install | https://github.com/opensearch-project/security/issues/2449 |
Security Analytics | | | | Certificate of Origin, Link Checker, Security Test, Documentation Request, Benchmarking Tool | https://github.com/opensearch-project/security-analytics/issues/365 |
Security Analytics Dashboards | | | | Certificate of Origin | https://github.com/opensearch-project/security-analytics-dashboards-plugin/issues/462 |
Security Dashboards | | | | Code Hygiene, Certificate of Origin, Plugin install | https://github.com/opensearch-project/security-dashboards-plugin/issues/1337 |
Simple Schema | | | | Link checker, Certificate of Origin | https://github.com/opensearch-project/simple-schema/issues/71 |
SQL | | | | Checkstyle, Jacoco (100% coverage required), Comparison Tests, Link checker, Certificate of Origin | https://github.com/opensearch-project/sql/issues/1360 |
Follow-up from: SDK, SQL, Search Relevance, Dashboards Anywhere, OpenSearch-js, OpenSearch, OpenSearch Net, k-NN, Neural Search, OpenSearch Dashboards, Anomaly Detection, Benchmark, Build, helm-charts, opensearch-dashboards-functional-tests, OUI, ansible, Search Processor
Repositories Filed: 60
What does the outcome of this proposal look like? Is it a process, a tool, or a product feature? How would we distinguish passable testing requirements from great ones? If we had those requirements and executed on them, how would the project be better for it?
Hi @peternied, thank you for following up. The purpose of this discussion is to finalize a set of testing requirements that all development repositories will be required to maintain. This will take the form of guidelines in the Developer Guide stating that repositories must require "x, y, and z" to have their code merged. Likewise, changes to the repositories will be expected to pass those tests before they are merged. The hope is that this post will garner feedback (either here or directly) on what requirements people would like to see, and then we can go from there.
As things stand, there is a very wide range of testing requirements across the organization. If Repository A requires 85% code coverage and runs unit, integration, and backwards compatibility tests, it may expect Repository B to do the same. However, because there is no standardized bar, Repository B instead has 98% code coverage but only unit and integration tests. This non-standard quality bar creates a poor release experience and has resulted in features being released that don't meet user expectations.
You bring up a really good point about what the requirement could look like. The plan is to include it in documentation and require that the development repositories implement the requirements. That being said, a tool or GitHub workflow that checks each repository's state could be helpful in keeping everyone accountable. The idea is not to audit repositories but to help ensure we are all on the same page. I spoke with quite a few people about what they would like to see change about releases, and the quality of changes was a recurring topic.
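To sketch what such an accountability workflow could look like (the repository list, schedule, and the `changelog_verifier.yml` filename are placeholders for whatever the final requirements end up being), a scheduled job could use the GitHub CLI to check each repo for the agreed-upon workflow files:

```yaml
# Hypothetical audit workflow; the repo list and required workflow name are placeholders.
name: Quality bar audit

on:
  schedule:
    - cron: '0 6 * * 1'   # weekly, Monday 06:00 UTC

jobs:
  audit:
    runs-on: ubuntu-latest
    env:
      GH_TOKEN: ${{ secrets.GITHUB_TOKEN }}
    steps:
      - name: Check which repos define the required workflow
        run: |
          for repo in opensearch-project/alerting opensearch-project/security; do
            # List the workflow files committed on the repo's default branch
            if gh api "repos/${repo}/contents/.github/workflows" --jq '.[].name' \
               | grep -q '^changelog_verifier.yml$'; then
              echo "OK:      ${repo}"
            else
              echo "MISSING: ${repo} has no changelog_verifier.yml"
            fi
          done
```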
Thanks @scrawfor99 for driving this. To summarize, but also to make sure I understand:
- The problem is an uneven quality bar across the repositories in our organization.
- The goal is to come up with a set of global, cross-org recommendations for what that bar looks like.
- The process is to first analyze the current state of the world, use that to inform the discussion of what the bar should be, and then write it up into a global `CONTRIBUTING.md` doc that projects can reference, potentially adding their own tweaks on top. After that bar is set, we can also think about providing automations/QoL improvements.
Does this capture what you're thinking too?
Hi @davidlago, you are exactly correct. I am a bit busy this week but am hoping to file the same type of documenting issue for all repositories and then put together a small, easy-to-digest analysis. After seeing where things stand, I will take further steps along the lines of my response to Peter and what you mentioned in your comment.
Hi @scrawfor99, I see BWC confirmation being added to dashboards plugin repo issues. Do we have BWC for dashboards plugins? I checked https://github.com/opensearch-project/opensearch-plugins/blob/main/TESTING.md#backwards-compatibility-testing but it seems to cover only OpenSearch plugins.
Hi @joshuali925, thank you for following up on this discussion. You are correct: to my knowledge there are no BWC tests for dashboards plugins. The BWC section of the table can effectively be ignored for dashboards plugins, since it seems unlikely that that would ever be an expectation; it is not clear what backwards compatibility would entail for the frontend-focused repos. The column is in the table for consistency and readability.
This table provides the presence metrics for each of the tests documented in this process.
Test | Count (max is 59 overall/32 for backend only) | Percentage | Not Present In |
---|---|---|---|
Unit | 57 | 97% | Hadoop?, Ansible Playbook |
Integration | 56 | 95% | OUI, Hadoop?, Ansible Playbook |
BWC | 22 | 69% | Simple Schema, Performance Analyzer RCA, OpenSearch-rs, OpenSearch Py-ML, OpenSearch-CLI, Hadoop, Helm Charts, Geospatial, Data Prepper, Common Utils, Ansible |
Certificate of Origin | 41 | 68% | Opensearch-SDK-Java, OpenSearch Ruby, OpenSearch PHP, OpenSearch JS, OpenSearch GO, OpenSearch Dashboards Functional Test, OpenSearch CLI, OpenSearch CI, OpenSearch Benchmark, Notifications, ML-Commons Dashboards, Helm Charts, Dashboards Notifications, Dashboards Maps, Dashboards Anywhere, Ansible |
Checkstyle, Lint Check, or Code Hygiene | 16 | 27% | Too many to list |
Link Checker | 33 | 56% | Too many to list |
Create Documentation Issue | 12 | 20% | Too many to list |
Deploy Documentation | 3 | 1% | Too many to list |
TL;DR: What do you want the organization to use as testing requirements? What certifies something is "done?"
What are you proposing?
A major topic discussed at the milestone meetings hosted by @CEHENKLE has been establishing a universal quality bar. An actionable mechanism for this is using GitHub Actions and checks to assert that changes meet expectations.
This issue should be used as a discussion board for what these expectations are. Everyone across the @opensearch-project is encouraged to contribute, as this is a change that will affect us all. The final result of this discussion will be a "universal" set of expectations which all repositories will employ. This means that maintainers will have an established quality bar for merging changes, and that contributors can know all requirements before they start working.
What users have asked for this feature?
The milestone group worked with numerous repositories to learn their pain points--the things that were preventing them from releasing on time and making a great product. One of the consistent patterns was a lack of established expectations. One group wanted changes to have "a, b, and c," while another wanted "x, y, and z." This proposal seeks to establish a set of accepted expectations across all repositories.
What is the contributor experience going to be?
Under the new system, a contributor will be able to reference the `CONTRIBUTING.md` document of any @opensearch-project repository and see that repository's expectations. Most of these expectations will be standardized across the organization, but there may be additional requirements depending on each repository's use case. For example, the Security repository may require that the plugin install workflow passes, but this would not be required of the whole organization. This discussion establishes a base quality bar across the organization. If any repository wants to add further requirements specific to it, that is encouraged; it will need to list the additional requirements alongside the standardized ones in its `CONTRIBUTING.md`.
How does this impact flaky tests?
Currently, we are trying to reduce the number of flaky tests in the project. Unfortunately, this is easier said than done and the process is ongoing. Part of this discussion should be establishing expectations for handling existing flaky tests, and it is likely that the new quality bar will require that no new flaky tests are introduced. One way to do this is to run all new tests in a separate workflow that is isolated from the older tests; this prevents any currently flaky tests from impacting newly added changes.
What will it take to implement?
In order for repositories to adopt the new quality bar, they should update their `CONTRIBUTING.md` document and their testing frameworks. Since the expectations are being standardized, it should be relatively easy to port them between repositories; if tests are written as GitHub Actions, they can likely be copied directly with minimal changes. Examples of some of the types of tests that may be helpful can be found in the identity project, which has workflows for testing a wide range of functions on commit. There may be many more tests that should be mentioned, however, and expectations can also include things like updated developer guides.
Any remaining open questions?
We are still trying to determine what these expectations should be, so it is important that all opinions are heard. If you have any comments, concerns, or suggestions, please share them below.