opensearch-project / OpenSearch

🔎 Open source distributed and RESTful search engine.
https://opensearch.org/docs/latest/opensearch/index/
Apache License 2.0

[Feature Request] Support for Experimental Codecs #13723

Open sarthakaggarwal97 opened 1 month ago

sarthakaggarwal97 commented 1 month ago

Is your feature request related to a problem? Please describe

With the introduction of new custom codecs, it becomes essential to let them bake before marking them ready for production. This helps us avoid potential index corruption risks and allows community feedback to flow in before we make the codecs generally available.

Earlier, we had a sandbox for codecs in OpenSearch. With the introduction of the separate repository for custom codecs, it becomes important to have similar functionality available in the codecs plugin as well.

Describe the solution you'd like

We can extend the existing CodecSettings interface so that, upon index creation, we know whether the codec is experimental.

Related component

Indexing

Describe alternatives you've considered

No response

Additional context

Coming from https://github.com/opensearch-project/custom-codecs/issues/148

andrross commented 1 month ago

@sarthakaggarwal97 Could the custom-codecs plugin itself create and register a static setting like codecs.experimental.enabled and only create and add the new codec if that setting is explicitly set to true?
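
A minimal sketch of the gating idea suggested above, written in plain Java so it runs without OpenSearch on the classpath. The class name, the codec names, and the use of a plain Map for node settings are all hypothetical stand-ins; only the setting key codecs.experimental.enabled comes from this thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for plugin startup logic; not the custom-codecs API. */
final class ExperimentalCodecGate {
    // Setting key as suggested in the comment above.
    static final String EXPERIMENTAL_ENABLED = "codecs.experimental.enabled";

    // Placeholder codec names, purely illustrative.
    static final List<String> STABLE = List.of("stable_codec");
    static final List<String> EXPERIMENTAL = List.of("experimental_codec");

    /** Returns the codec names the plugin would register for the given node settings. */
    static List<String> codecsToRegister(Map<String, String> nodeSettings) {
        // A static setting is read once at startup; it defaults to false when absent.
        boolean enabled = Boolean.parseBoolean(
            nodeSettings.getOrDefault(EXPERIMENTAL_ENABLED, "false"));
        List<String> result = new ArrayList<>(STABLE);
        if (enabled) {
            // Experimental codecs are added only on explicit opt-in.
            result.addAll(EXPERIMENTAL);
        }
        return result;
    }
}
```

With this shape, a node started without the setting never sees the experimental codec names at all, which is the opt-in behavior described in the question.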

sarthakaggarwal97 commented 1 month ago

@andrross Do you mean a static cluster setting? It would be like a feature flag, right? Please correct me if I am wrong.

We might have to utilize this interface so that we can mark codecs as experimental. Another benefit of an interface change is that it gives other plugins a way to use it as well. Plugins like Security Analytics and k-NN also currently maintain their own codecs, and might have similar use cases in the future.

I feel it will be a combination of both the setting and a clean way to mark the new codecs as experimental. We can then have assertions in place to block the create index call.
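
A rough sketch of what "marking a codec as experimental and blocking index creation" could look like. Everything here is hypothetical: the ExperimentalAware interface, its isExperimental() method, and the validator class are invented for illustration and are not part of the existing CodecSettings interface or of OpenSearch.

```java
/** Hypothetical marker interface; the real CodecSettings has no such method. */
interface ExperimentalAware {
    // Codecs are treated as stable unless they override this.
    default boolean isExperimental() {
        return false;
    }
}

/** Illustrative stable codec: inherits the default (not experimental). */
final class StableCodec implements ExperimentalAware {}

/** Illustrative experimental codec: opts in to the marker. */
final class NewCodec implements ExperimentalAware {
    @Override
    public boolean isExperimental() {
        return true;
    }
}

/** Hypothetical check that could run while validating a create-index request. */
final class IndexCodecValidator {
    static void validate(String codecName, Object codec, boolean experimentalEnabled) {
        // Codecs that do not implement the marker are treated as stable.
        boolean experimental = codec instanceof ExperimentalAware
            && ((ExperimentalAware) codec).isExperimental();
        if (experimental && !experimentalEnabled) {
            throw new IllegalArgumentException(
                "codec [" + codecName + "] is experimental and experimental codecs are not enabled");
        }
    }
}
```

Because the marker is optional, codecs that never implement it are unaffected, which is also why a check like this cannot cover every codec.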

andrross commented 1 month ago

@sarthakaggarwal97 Yeah it would functionally work the same way as a feature flag, but just be scoped to the custom-codecs plugin in this case. I think in order to determine exactly how we implement this feature we'll want to clearly define the user experience. If we extend the CodecSettings interface, how do we implement the behavior? What's a system admin's experience for enabling and disabling experimental codecs?

sarthakaggarwal97 commented 1 month ago

@andrross Since the validation to select the codec happens in EngineConfig, we might need to implement the feature flag within OpenSearch itself. The CodecSettings interface would tell us whether a particular codec is experimental, while the feature flag would make experimental codecs available via CodecService.

Tagging @reta @mgodwan @backslasht for inputs.

reta commented 1 month ago

> @andrross since the validation to select the codec happens in EngineConfig, we might need to implement the feature flag within OpenSearch itself. CodecSettings interface would help us know whether the particular codec is experimental or not, while the feature flag would make experimental codecs available via CodecService.

I 100% agree with @andrross that experimental codecs should be contained in the custom-codecs plugin, including the experimental support. CodecSettings is optional and we should not expect any codec to implement it (unlike Apache Lucene's Codec, which is mandatory).

sarthakaggarwal97 commented 1 month ago

I guess we should be able to have the setting in custom-codecs. We might still have to expose the experimental status via the CodecSettings interface. If we do not validate the experimental nature of codecs in EngineConfig, and the codecs are not available in CodecService (because they are experimental), the shards would fail.

Apologies if I am missing something here, please correct me if I am wrong.

reta commented 1 month ago

> Apologies if I am missing something here, please correct me if I am wrong.

I would encourage not relying on CodecSettings at all (for this specific feature).

andrross commented 1 month ago

> If we would not validate the experimental nature of codecs in the EngineConfig, and the codecs are not available in CodecService (because they are experimental), the shards would fail.

@sarthakaggarwal97 Can you elaborate a bit on what you mean here? I get that there are sharp edges on the user experience (i.e. if you disable an experimental codec then any existing indexes using that codec will fail), but that is sort of expected for these opt-in experimental features until they are stabilized and fully supported. To me, the simplest option here is to just let custom-codecs choose whether to register experimental codecs at startup based on the existence of a static setting. What about that implementation doesn't work with the experience you have in mind?

sarthakaggarwal97 commented 1 month ago

@andrross In EngineConfig, when the validation happens upon index creation, the codec is loaded from NamedSPI itself. The codecs are registered with NamedSPI through resources.

Is it possible to load a different resource file based on an experimental setting?

reta commented 1 month ago

> Is it possible to load a different resource file based on an experimental setting?

It is not possible; the service loader mechanism does not support that (afaik).

sarthakaggarwal97 commented 1 month ago

Yeah, I am not aware of a way either. @andrross @reta what do you suggest in this case?

reta commented 1 month ago

> @andrross @reta what do you guys suggest in this case?

To me, the most logical way forward is to contain the change in custom-codecs (https://github.com/opensearch-project/custom-codecs/issues/148): it provides its own CodecService (CustomCodecService) and can filter out experimental codecs (driven by plugin-specific settings).
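
A simplified sketch of that filtering approach, assuming a plugin-owned service that holds codecs by name. The class below is a hypothetical stand-in and not the actual CustomCodecService; codecs are represented as plain strings, and the constructor parameters are invented for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical plugin-owned codec registry that hides experimental codecs unless enabled. */
final class FilteringCodecService {
    private final Map<String, String> codecs = new LinkedHashMap<>();

    FilteringCodecService(Map<String, String> allCodecs,
                          Set<String> experimentalNames,
                          boolean experimentalEnabled) {
        for (Map.Entry<String, String> entry : allCodecs.entrySet()) {
            // Keep stable codecs always; keep experimental ones only when enabled.
            if (experimentalEnabled || !experimentalNames.contains(entry.getKey())) {
                codecs.put(entry.getKey(), entry.getValue());
            }
        }
    }

    /** Looks up a codec by name and fails if it was filtered out or never registered. */
    String codec(String name) {
        String codec = codecs.get(name);
        if (codec == null) {
            throw new IllegalArgumentException("failed to find codec [" + name + "]");
        }
        return codec;
    }
}
```

In this sketch, a lookup of a filtered-out experimental codec fails the same way as a lookup of an unknown name, so any code path that resolves the codec through the service rejects it when the plugin setting is off.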

sarthakaggarwal97 commented 1 month ago

The challenge would be to make the experimental codecs unavailable in the index.codec settings validations.

reta commented 1 month ago

> The challenge would be to make the experimental codecs unavailable in the index.codec settings validations.

Not sure I get it: if an experimental codec is not enabled (in custom-codecs), any attempt to configure it (including validating settings) would fail (since this is managed by our own CodecSettings).

sarthakaggarwal97 commented 1 month ago

If the codecs are registered by NamedSPI, they would be available over here

reta commented 1 month ago

> If the codecs are registered by NamedSPI, they would be available over here

Same answer as for CodecSettings: the CodecAliases is ours, and any attempt to use it (including validating settings) would fail. We cannot do this for any out-of-the-box Apache Lucene codecs, but we could do it for any codec in the custom-codecs plugin (some code changes may be needed, of course).

peternied commented 1 month ago

[Triage - attendees 1 2 3 4 5 6 7] @sarthakaggarwal97 Thanks for creating this issue