opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0
9.29k stars 1.71k forks source link

[BUG] 1.x changes can break builds on main #2350

Open andrross opened 2 years ago

andrross commented 2 years ago

The OpenSearch build process (i.e. ./gradlew check) runs backward compatibility tests against the previous major version. For example, the main branch is currently building version 2.0, so the bwc tests check out the latest commit on the 1.x branch to build the opensearch engine artifacts to run tests that create a cluster with mixed versions and ensure compatibility. (Details on the branching strategy can be found here). There have been two recent examples (#2143 opensearch-project/OpenSearch#2334) where a change committed to 1.x caused tests to break on main. I think the basic problem is that 1.x must maintain forward compatibility with main, but there are not automated tests that verify such compatibility as a part of the PR process. This means breaking changes can get committed and result in a bit of scramble to track down the root cause. How can we catch these failures as a part of the 1.x PR process before merging?

dblock commented 2 years ago

Is there a set of tests that can be run on branches (everywhere except main) that can catch these on PR?

andrross commented 2 years ago

Using terms from the branching doc, I think the next minor version needs to run the backward compatibility tests of the next major version. That would mean today that 1.x PRs would run the bwc tests from main.

However, a common cause of this problem is that a new field is added to a transport protocol message on main, with the guard that it only expects to see that field when talking to another main version server. That feature is then backported to 1.x, which means that the guard in the main code needs to change the version check since the 1.x version now uses the field. Essentially the backport requires 2 commits to 2 separate branches, so the automated checks would likely always fail in this case.

peternied commented 6 months ago

Tests that make sure we are backward compatible are useful and should be enforced, these inherently rely on different branches of OpenSearch. Broadly speaking, this is a moving target.

Contributors who make a change are only responsible for the state of the world when they created their PR. There are two cases that come to mind that encroach on this 1) a version bump happens or 2) another PR is merged with a BWC incompatible change. If one of these cases occur the contributor needs to diagnose the source of the failure, wait for the fix to be merged, then merge from main.


It looks like we might have some of the hooks to enable this behavior or something close to it. Maybe the gradle check workflow can be updated to mitigate this case, @andrross can you help find someone to investigate this more deeply?

 * We use a time based approach to make the bwc versions built deterministic and compatible with the current hash.
 * Most of the time we want to test against latest, but when running delayed exhaustive tests or wanting
 * reproducible builds we want this to be deterministic by using a hash that was the latest when the current
 * commit was made.

https://github.com/opensearch-project/OpenSearch/blob/87ac37460c16a5b3cfa1cd85cad2cb7468b430a3/buildSrc/src/main/java/org/opensearch/gradle/internal/InternalBwcGitPlugin.java#L177-L178

andrross commented 5 months ago

@peternied Thanks! This looks promising. I did some basic testing and created an issue to discuss specifically enabling this: opensearch-project/opensearch-build-libraries#420

rishabh6788 commented 1 month ago

The feature to add bwc.checkout.align=true parameter to gradle-check jobs triggered by pull_requests have been added and running in production.

peternied commented 1 month ago

@rishabh6788 Have we been able to confirm that PRs haven't needed to rebase when a version update is rolled out in another branch?

rishabh6788 commented 1 month ago

I believe we will be able to confirm this once 2.16 branch is cut from 2.x release and bumped to 2.17. The 2.16 branch cut sholud happen by the end of this week.

peternied commented 1 month ago

Cool, once its confirmed, lets close this issue - thanks @rishabh6788