opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

OpenSearch's Unreliable Dependency - A Call for Enhanced API Compatibility #8982

Closed peternied closed 7 months ago

peternied commented 1 year ago

This issue aims to shed light on a frustrating experience that is negatively impacting contributors, including myself. The intention is not to shame or punish but rather to address the problem constructively and foster a conversation for a potential solution.

Describe the bug

OpenSearch's main branch does not have any requirements for public API compatibility. Changes often land in snapshot builds that break plugin compilation or cause plugin integration tests to fail. These issues are difficult to debug and consume considerable time across the plugin repositories.

Breaking changes are also backported to previous major version lines; these layers make it hard to know where new issues are coming from.

To Reproduce

Maintain a plugin that depends on OpenSearch.

Expected behavior

There are several options:

Keep the Status Quo:

Strictly Follow SemVer:

Increase Verification of Pull Requests:

Additional context

davidlago commented 1 year ago
peternied commented 1 year ago

Recent change that impacted all plugins:

peternied commented 1 year ago

@nknize I know you are at work refactoring components in core [1], but the breaking changes being pushed out to plugin teams are painful. Is there a quantification of how far along that work is, and of how long plugin teams will have broken builds due to these changes?

nknize commented 1 year ago

Hey @peternied. I missed this message before posting my comment on the PR to detect compatibility impact.

Is there a quantification of how far along that work is / for how much time plugin teams will have broken builds due to these changes?

I'd like to target Phase 0 of #8110 to be complete no later than 2.11.

Of course that could slip, however, depending on the downstream surface area impacted once module-info.java is added to the core modules. That effort will likely bias towards minimal exports, which means downstreams will be temporarily cut off from some classes they currently extend. We'll take each breakage on a case-by-case basis and bias towards exporting, while quickly redesigning what needs to be redesigned to properly enforce semver.
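To make the "minimal exports" point concrete, here is a hypothetical module-info.java sketch; the module and package names are invented for illustration and do not reflect OpenSearch's actual module layout:

```java
// Hypothetical sketch only: module and package names are invented.
// Under JPMS, only packages listed in `exports` are visible to
// downstream plugins; everything else is encapsulated, so a plugin
// extending an unexported class would stop compiling.
module org.opensearch.core.example {
    // The stable, semver-enforced surface area.
    exports org.opensearch.core.example.api;

    // Internal packages are deliberately NOT exported, e.g.:
    // org.opensearch.core.example.internal
}
```

This is what would temporarily cut downstreams off from classes they currently extend until those classes are either exported or redesigned.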

In the meantime, any mechanisms we can create to surface downstream impact early will help us assist downstreams in quickly opening PRs to change namespaces. Perhaps someone could put together a slackbot to post to a #build channel?

peternied commented 1 year ago

@nknize It's good to hear we have a near-term target for those changes.

Building off of this: do we know if there is a list of the planned breaking changes remaining before we can target a semver-safe API?

peternied commented 1 year ago

@nknize Do you know who would be able to help assemble that list of breaking changes that should be expected?

peternied commented 1 year ago

Five days ago, downstream pull requests [1] were needed to handle the task foundation refactor [2]. This change was backported today, and will presumably require another set of changes to be backported.

peternied commented 1 year ago

Capturing this proposal on how we could change the consumption model within OpenSearch Project

How does apache foundation handle cases like this?

It varies by project. I can only speak for Lucene and Solr (and Elasticsearch, and even OpenSearch). Solr (and Elasticsearch, and OpenSearch core) absorb upstream changes "on demand", the same way we do it here with Lucene. This is what I have been suggesting for a while.

Instead of the upstream core OpenSearch repo force-feeding per-merge SNAPSHOT artifacts to all downstreams, the downstreams should manually (or on some automated, e.g., weekly, cron period) trigger a snapshot build and update their ${opensearch_version} in build.gradle to their own created sha (e.g., 2.10.0-SNAPSHOT-4373c3b). This would keep downstream builds stable. The unpleasant side effect would depend on how often those snapshots are updated, but that would be under the control of the downstream repo community. It's not perfect (breaking changes never are), but at least then the downstream controls their own pain. This happens a lot on Solr, OpenSearch, and Elasticsearch when large changes are made to Lucene (e.g., JPMS, the Gradle migration from Ant, endianness changes).

From https://github.com/opensearch-project/OpenSearch/pull/9367#issuecomment-1679673558
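A rough sketch of what that consumption model could look like in a downstream plugin's build.gradle; the property name, repository, and pinned version below are illustrative only (the version string reuses the example sha 2.10.0-SNAPSHOT-4373c3b from the comment above):

```groovy
// Hypothetical sketch: the downstream pins core to a SHA-stamped
// snapshot it built itself, instead of consuming every per-merge
// SNAPSHOT published by the upstream repo.
buildscript {
    ext {
        // Updated manually, or by an automated (e.g., weekly) cron job
        // that triggers a fresh snapshot build and bumps this value.
        opensearch_version = "2.10.0-SNAPSHOT-4373c3b"
    }
}

repositories {
    // Wherever the downstream published its own snapshot build,
    // e.g., a local or team Maven repository.
    mavenLocal()
}

dependencies {
    implementation "org.opensearch:opensearch:${opensearch_version}"
}
```

The key point is that the pinned version only moves when the downstream community chooses to move it.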

nknize commented 1 year ago

Follow up comment:

I wonder if we could invent a "toggle" mechanism to allow downstreams to switch between the two approaches at will? Maybe there's a GHA to enable/disable "force-fed" snapshots? This could prove useful during large-scale refactors to stanch the bleeding: downstreams cut over to the manual snapshot consumption model, and switch back when the change storm has passed.

dblock commented 1 year ago

@peternied I think there are two major shortcomings with this proposal, even though it does solve the problem of letting plugins depend on a more stable core. It actually resembles the ODFE state, where plugins depended on a previously released version of core, except that every plugin would be able to depend on some random snapshot build. That did not work well.

  1. In order to produce a working distribution build, all plugins must depend on one specific snapshot build of OpenSearch - so how do you see us having any such build during the development cycle? The opensearch-build distribution builds everything from the top, from source. It cannot work without all plugins being at the same snapshot version, which would become very rare.
  2. What will prevent a plugin from simply never integrating with a newer version of core SNAPSHOT until the very last moment? This means the downstream impact of a large change is deferred to the latest, closest-to-release (aka worst) moment. This will likely slip release dates.

Maybe the problem is the breaking changes themselves? Instead of making changes with the downstream impact we're feeling, I would have first extracted a public API/SDK, declared that as a public API, and then migrated all plugins to it (similar to what we proposed with extensions, but in core).
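A minimal sketch of what "declaring a public API" could look like; the annotation and interface below are entirely hypothetical, invented here to illustrate how a marked surface could signal stability to plugins and be checked by compatibility tooling:

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class PublicApiSketch {

    // Hypothetical marker annotation: types carrying it are part of
    // the declared public API and subject to semver guarantees.
    @Retention(RetentionPolicy.RUNTIME)
    @Target({ElementType.TYPE, ElementType.METHOD})
    @interface PublicApi {
        String since();
    }

    // A hypothetical contract that plugins could rely on across
    // releases; unannotated types would remain internal and free
    // to change.
    @PublicApi(since = "2.10.0")
    interface TaskListener {
        void onTaskCompleted(String taskId);
    }

    public static void main(String[] args) {
        // Tooling (or a build-time check) can discover the declared
        // surface via reflection on the marker annotation.
        PublicApi marker = TaskListener.class.getAnnotation(PublicApi.class);
        System.out.println("TaskListener stable since " + marker.since());
    }
}
```

Compatibility checkers could then be scoped to annotated types only, so breaking an internal class would not count as a semver violation.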

peternied commented 1 year ago

Thanks @dblock and @nknize for your thoughts.

...every plugin will be able to depend on some random snapshot build. That did not work well.

That is a significant reason not to pursue this proposal. It would be different if changes sat in 3.0.0 for a significant period, but instead they are moving into the 2.x line as quickly as a day later.

Maybe the problem is the breaking changes?

If there were no breaking changes at Java's public API level, this wouldn't be an issue.

extracted a public API/SDK, declared that as a public API

This sounds like it would be very useful for signaling to plugins which contracts are safe and which are going to be in flux. Would it be worthwhile to invest in? Do we know how many breaking changes are still expected around the refactor? That number would help me weigh the investment, if it could make the difference in avoiding a number of breaking changes.

peternied commented 1 year ago

RE: CompatibilityCheck: Good news - the compatibility check is using the correct commits for the evaluation; you can confirm by looking at the SHA in the message.