opensearch-project / custom-codecs

OpenSearch custom lucene codecs for providing different on-disk index encoding (e.g., compression).
Apache License 2.0

Add hardware-accelerated codecs for DEFLATE and LZ4 #122

Closed mulugetam closed 1 month ago

mulugetam commented 3 months ago

Description

Adds hardware-accelerated DEFLATE and LZ4 compression codecs for stored fields. The hardware in focus is Intel(R) QAT (QuickAssist Technology), a built-in accelerator on the latest 4th and 5th Gen Intel Xeon processors. The implementation relies on the Qat-Java library.

The PR adds two additional valid values for index.codec: qat_deflate and qat_lz4. It also introduces a new setting, index.codec.qatmode, that specifies the mode of execution for QAT.

Two values are supported for index.codec.qatmode: hardware and auto. A hardware execution mode uses only the QAT hardware, while an auto execution mode may switch to software if hardware resources are not available.
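To make the new settings concrete, here is a sketch of an index-creation request body that selects the new codec. The index name and the nested layout are illustrative; only the setting names and values (`index.codec`, `index.codec.qatmode`) come from this PR.

```python
import json

# Sketch of an index-creation request body selecting the new codec.
# "my-index" is a placeholder; the setting names/values come from this PR.
settings = {
    "settings": {
        "index": {
            "codec": "qat_lz4",       # or "qat_deflate"
            "codec.qatmode": "auto",  # "hardware": QAT only; "auto": software fallback
        }
    }
}

# Body for: PUT /my-index
print(json.dumps(settings, indent=2))
```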

Closes

https://github.com/opensearch-project/custom-codecs/issues/130

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license. For more information on following Developer Certificate of Origin and signing off your commits, please check here.

sarthakaggarwal97 commented 3 months ago

@mulugetam thanks for raising the PR. Could you please share some performance numbers for these modes?

asonje commented 3 months ago

@mulugetam thanks for raising the PR. Could you please share some performance numbers for these modes?

Here are some performance numbers for indexing using the Stack Overflow workload:

|   | `qat_deflate` relative to `deflate` | `qat_lz4` relative to `default` |
| -- | -- | -- |
| Total time | -11.4% | -2.5% |
| Mean indexing throughput | 23.7% | 3.4% |
| Store size | 1.6% | -1.76% |

mulerm commented 3 months ago

Thank you @asonje. @sarthakaggarwal97 we will also share the performance numbers for search when they're ready.

sarthakaggarwal97 commented 3 months ago

Thanks @mulugetam @asonje for the initial numbers. Out of curiosity, if the underlying algorithm is still the same (in this case lz4, zlib), how are we seeing differences in store size?

asonje commented 3 months ago

Out of curiosity, if the underlying algorithm is still the same (in this case lz4, zlib), how are we seeing differences in store size?

This is expected. As you know, the store size will vary from run to run, but it is still a good approximation of compression ratio. qat_deflate and deflate will have different compression ratios because, even though the algorithm is the same, their implementations are different. The resulting compressed bytes will not be identical in the two cases, even though they are compatible.
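The point that the same algorithm can produce different bytes can be demonstrated with two stock DEFLATE encoders. In this sketch, Python's zlib at two compression levels stands in for the two implementations (an analogy, not the QAT code path):

```python
import zlib

data = b"OpenSearch stored fields compress reasonably well " * 100

# Two DEFLATE encoders stand in for the two implementations (software zlib
# vs. QAT); different compression levels are used here purely to force the
# encoders to make different choices.
fast = zlib.compress(data, level=1)
best = zlib.compress(data, level=9)

# The byte streams (and hence store sizes) differ...
assert fast != best
# ...yet both are valid DEFLATE output and decompress to the same data.
assert zlib.decompress(fast) == data
assert zlib.decompress(best) == data

print(f"level 1: {len(fast)} bytes, level 9: {len(best)} bytes")
```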

reta commented 3 months ago

@mulugetam it looks pretty cool, could you please share which arch/OS combinations it is available for? (the arch part is somewhat clear, but not the arch + OS combinations: Windows / Linux / Intel Macs, ...)

mulerm commented 3 months ago

@mulugetam it looks pretty cool, could you please share which arch/OS combinations it is available for? (the arch part is somewhat clear, but not the arch + OS combinations: Windows / Linux / Intel Macs, ...)

The QAT built-in accelerator is available on 4th and 5th Gen Intel(R) Xeon processors. This version requires amd64/Linux. On all other systems, the auto mode can be used to fall back to software-only compression.

reta commented 3 months ago

@mulugetam when you have chance, could you please resolve the conflicts? thank you

mulugetam commented 3 months ago

@mulugetam when you have chance, could you please resolve the conflicts? thank you

Looks like it's asking me to insert new lines because the existing code is not Spotless-formatted.

reta commented 3 months ago

LGTM, I have nothing else, thank you @mulugetam! Could you please create a documentation issue [1] for 2.14 to document this new codec, in particular covering these topics:

Thank you.

[1] https://github.com/opensearch-project/documentation-website

mulugetam commented 3 months ago

@reta PR for documentation: https://github.com/opensearch-project/documentation-website/pull/6841. Thanks!

sarthakaggarwal97 commented 3 months ago

@mulugetam have you tried running long-running tests with both QAT codecs? It would be good to have long-running tests to foresee any corruption or potential memory leaks. In these tests, we can keep the indexing / updates going, with optional search requests (which also access stored fields). We can use a single-data-node configuration and an index with multiple shards (so that we don't hit the 2B-document limit early on).

sarthakaggarwal97 commented 3 months ago

@asonje

As you know the store size will vary from run to run but it is still a good approximation of compression ratio.

Did we do a force merge to one segment before we captured the store size? If not, we can do it to know the approximate impact of QAT on store size.

Also, would you please help with decompression perf numbers as well, using a top_hits aggregation query? This query is infamous for surfacing regressions over stored fields.
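For reference, a `top_hits` aggregation of the kind suggested could look like the sketch below. `top_hits` fetches whole documents and therefore reads (and decompresses) stored fields; the field name `tags` is a placeholder for whatever keyword field the workload has.

```python
import json

# Hypothetical top_hits aggregation for stressing stored-fields decompression.
# The field name "tags" is a placeholder for the workload's schema.
query = {
    "size": 0,
    "aggs": {
        "by_tag": {
            "terms": {"field": "tags"},
            "aggs": {"top_docs": {"top_hits": {"size": 3}}},
        }
    },
}

# Body for: GET /<index>/_search
print(json.dumps(query, indent=2))
```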

asonje commented 3 months ago

Did we do a force merge to one segment before we captured the store size? If not, we can do it to know the approximate impact of QAT on store size.

Not initially. I have re-run the tests with a force merge (max_num_segments=1) and updated the table.
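The measurement procedure discussed here can be sketched as follows. The host and index name are placeholders, and actually issuing the requests of course requires a running cluster (e.g. via curl or urllib); the sketch only builds the endpoints involved.

```python
# Sketch: force-merge to a single segment, then read the store size from the
# stats API, so segment counts don't skew the codec comparison.

def forcemerge_url(host: str, index: str, max_num_segments: int = 1) -> str:
    """Endpoint to POST before capturing store size."""
    return f"{host}/{index}/_forcemerge?max_num_segments={max_num_segments}"

def store_stats_url(host: str, index: str) -> str:
    """Endpoint to GET; compare store.size_in_bytes across codecs."""
    return f"{host}/{index}/_stats/store"

print(forcemerge_url("http://localhost:9200", "so"))
print(store_stats_url("http://localhost:9200", "so"))
```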

mulugetam commented 3 months ago

@mulugetam have you tried running long-running tests with both QAT codecs?

@sarthakaggarwal97 We have run OpenSearch-Benchmark multiple times, with both qat_deflate and qat_lz4, on a four-data-node setup, and have seen no issues. How long a run are you suggesting?

mgodwan commented 2 months ago

@mulugetam have you tried running long-running tests with both QAT codecs? It would be good to have long-running tests to foresee any corruption or potential memory leaks. In these tests, we can keep the indexing / updates going, with optional search requests (which also access stored fields). We can use a single-data-node configuration and an index with multiple shards (so that we don't hit the 2B-document limit early on).

@sarthakaggarwal97 Can we contribute the recommended search query for such testing to https://github.com/opensearch-project/opensearch-benchmark-workloads/ It should ensure that any new feature which impacts stored fields can get tested without additional effort.

sarthakaggarwal97 commented 2 months ago

@mulugetam During the memory-leak investigation for Zstd, we observed a significant change in system memory (not JVM, since core execution is native) within a few hours. Maybe we can do a nightly run to see if we have any such effects? I am open to your opinion on this, just thought I'd highlight it so that we don't miss it!

mulugetam commented 2 months ago

@sarthakaggarwal97 @reta Thank you for bringing the issue of memory leaks to our attention! We have been running a single-node OpenSearch process for hours while monitoring system memory utilization. We do not see a significant difference in system memory utilization between best_compression and qat_deflate (same for lucene_default and qat_lz4). The screenshot below, for example, shows %memused when multiple cycles of the Stack Overflow dataset are submitted for indexing.

[screenshot: best_compression_vs_qat_deflate]

Also, and more importantly, we have been testing the qat-java library using valgrind and other static code analysis tools and have not seen memory leaks.

mulerm commented 1 month ago

@reta are we waiting for additional approvals?

reta commented 1 month ago

@reta are we waiting for additional approvals?

Yes, it would be good to have @sarthakaggarwal97's signoff, thank you @mulugetam!

mgodwan commented 1 month ago

@mulerm @sarthakaggarwal97 @reta

I've a few suggestions on this PR, and future changes in new custom codecs:

  1. Before enabling these as generally available, can we introduce a feature flag which can allow us to mark new codecs as experimental? Each new codec can implement an isExperimental method, and there can be a boolean feature flag opensearch.experimental.feature.experimental_codecs.enabled which users can choose to enable if they would like to try this out?

  2. Based on the PR, my understanding is that the storage format does not change. I think in the future a codec like lz4_qat should be inferred, and continue to work with other implementations of LZ4 in the example case. We should add tests to ensure that the storage remains compatible whenever we add changes which impact only compute (e.g., use lz4_qat for compression and lz4 for decompression).
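The gating proposed in point 1 could look roughly like the sketch below. The flag name comes from the suggestion above; everything else (the registry, the validation function) is hypothetical, not existing OpenSearch code.

```python
# Hypothetical sketch of the proposed gating: codecs report whether they are
# experimental, and index creation rejects experimental codecs unless the
# feature flag is enabled.

EXPERIMENTAL_CODECS = {"qat_deflate", "qat_lz4"}  # would come from isExperimental()

FLAG = "opensearch.experimental.feature.experimental_codecs.enabled"

def validate_codec(codec: str, flag_enabled: bool) -> None:
    """Reject experimental codecs at index creation unless the flag is set."""
    if codec in EXPERIMENTAL_CODECS and not flag_enabled:
        raise ValueError(f"codec [{codec}] is experimental; set {FLAG}=true to use it")

validate_codec("qat_lz4", flag_enabled=True)  # opted in: allowed
validate_codec("zstd", flag_enabled=False)    # not experimental here: allowed
```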

sarthakaggarwal97 commented 1 month ago

@mgodwan thanks for raising these points. I agree, we should have an experimental feature flag to let the new codecs bake for some time before making them available by default.

@mulugetam other than that, one minor comment: please check the range of the compression levels.

reta commented 1 month ago
  1. Before enabling these as generally available, can we introduce a feature flag which can allow us to mark new codecs as experimental? Each new codec can implement an isExperimental method, and there can be a boolean feature flag opensearch.experimental.feature.experimental_codecs.enabled which users can choose to enable if they would like to try this out?

@mgodwan one of the reasons why we have a custom-codecs plugin is to roll out experimental / stable codecs in an opt-in fashion. The feature flags specifically target OpenSearch core, because there it is not really feasible to alter the flows in any other way; in contrast, a plugin can be uninstalled completely to phase out functionality. I think we should not introduce any feature flags for plugins.

mgodwan commented 1 month ago

in contrast plugin could be uninstalled completely to phase out functionality

Uninstalling the plugin means not being able to read segment data written using the codec it introduced. Given that the codec changes how the data is stored, we need to ensure backwards compatibility is maintained in every future release in how we deal with the persisted format/metadata. While this is opt-in through a plugin, not marking it experimental forces us to make any change in a backward-compatible fashion, whereas keeping it experimental for, say, a couple of release cycles will let us see how it is adopted and make necessary changes without worrying about breaking existing production data for users (experimental features add that warning by design).

Hence, I was proposing to add the feature flag as suggested (The feature flag itself can lie in core if we follow the interface to allow for an implementation of codec to be an experimental through an interface).

reta commented 1 month ago

Hence, I was proposing to add the feature flag as suggested (The feature flag itself can lie in core if we follow the interface to allow for an implementation of codec to be an experimental through an interface).

Any plugin has to be backwards compatible irrespective of feature flags. If we are not sure that the plugin is stable enough, we should document it as experimental, so people will use it knowing things could break. Feature flags are not applicable to plugins; however, to your point, we could provide fine-grained control over enabling / disabling a particular codec, and this could be done using plugin settings.

mgodwan commented 1 month ago

I'm inclined towards having a forcing function in code rather than relying on documentation to mark it as experimental. If this can be achieved using plugin settings, that should be fine as well. In the end, feature flags themselves are modeled as settings :)

reta commented 1 month ago

I'm inclined towards having a forcing function in code rather than relying on the documentation to mark as experimental.

We need both, I believe. The settings could be introduced as a separate change (to cover the existing ZSTD as well); please create an issue so someone can pick it up, thank you.

mulugetam commented 1 month ago

2. Based on the PR, my understanding is that the storage format does not change. I think in the future a codec like lz4_qat should be inferred, and continue to work with other implementations of LZ4 in the example case. We should add tests to ensure that the storage remains compatible whenever we add changes which impact only compute (e.g., use lz4_qat for compression and lz4 for decompression).

Thanks. I believe some of us discussed similar ideas on Slack. An issue entry was also created: https://github.com/opensearch-project/custom-codecs/issues/130.

mulugetam commented 1 month ago

@reta do we have any pending issues that need to be resolved?

reta commented 1 month ago

@reta do we have any pending issues that need to be resolved?

@mulugetam we still need a signoff https://github.com/opensearch-project/custom-codecs/pull/122#issuecomment-2103374633

sarthakaggarwal97 commented 1 month ago

We should add tests to ensure that the storage remains compatible always if we're adding changes which impact only compute (e.g. use lz4_qat for compression, and lz4 for decompression)

@mulerm I think we can add this test to ensure the only change would be in the compute. What do you think, @reta?

reta commented 1 month ago

@mulerm I think we can add this test to ensure the only change would be in the compute. What do you think, @reta?

@sarthakaggarwal97 I am not sure we can pull it off; for lz4_qat we need real hardware support, right? I don't think it is available in GA.

mulugetam commented 1 month ago

@reta @sarthakaggarwal97

Adding the suggested test to the current implementation is not sufficient and probably will not work, as Lucene's fieldsWriter and fieldsReader store BEST_COMPRESSION and BEST_SPEED as the value of the MODE_KEY attribute.

My thinking was that we should treat qat_deflate (vs. BEST_COMPRESSION) and qat_lz4 (vs. BEST_SPEED) as separate codecs until such a time when BEST_COMPRESSION and BEST_SPEED can use the accelerator transparently, if it is present.

reta commented 1 month ago

My thinking was that we should treat qat_deflate (vs. BEST_COMPRESSION) and qat_lz4 (vs. BEST_SPEED) as separate codecs

Certainly +1 to that

mgodwan commented 1 month ago

I am not sure we could pull it off, for lz4_qat we need a real hardware support, right? I don't think it is available in GA.

If this is the case, I would highly advocate for this to be behind a plugin setting. https://github.com/opensearch-project/custom-codecs/issues/148

reta commented 1 month ago

If this is the case, I would highly advocate for this to be behind a plugin setting. #148

But it is a separate codec already, which users have to opt in to use (it is not the default one)? So users would have to pick a codec AND set a setting to use it? That sounds like an unreasonably complicated process to me.

mgodwan commented 1 month ago

But it is separate codec already, which users have to opt-in to use (not a default one)?

So was the case when Zstd was added to OpenSearch in the first iteration, but it was done via a sandbox plugin. Since the plugin is no longer sandboxed, I think it would be beneficial to have a way to denote that this is experimental, and to iterate over time on the settings, codec management, issue handling, etc. for these new codecs without worrying about breaking changes.

The only reason I'm saying this is that the codec affects how data is stored, and in case of issues, just disabling it may not get users out of trouble unless already-written data is also fixed.

That said, I'm okay with the call you and @sarthakaggarwal97 take on this.

sarthakaggarwal97 commented 1 month ago

Given the precedent we have had with Zstd, I think it's okay to keep the new QAT codecs experimental for now. Currently, there is not a clean way to mark codecs as experimental, and I have opened an issue: https://github.com/opensearch-project/OpenSearch/issues/13723.

With that, it would be nice if we could also come up with a plan to make new codecs, here QAT, generally available in the future. Let me think on it. One of the lists I created earlier for Zstd correctness is here for reference: https://github.com/opensearch-project/OpenSearch/issues/9502

reta commented 1 month ago

Given the precedent we have had with Zstd, I think its okay to keep the new QAT codecs as experimental for now.

@sarthakaggarwal97 we have https://github.com/opensearch-project/custom-codecs/issues/148 to address the problem for every custom codec.

mulugetam commented 1 month ago

@sarthakaggarwal97 we have #148 to address the problem for every custom codec.

@reta @sarthakaggarwal97 Are we on hold now until #148 is implemented? I think we should not be, as it is a separate codec that users would have to opt in to use.

reta commented 1 month ago

@sarthakaggarwal97 we have #148 to address the problem for every custom codec.

@reta @sarthakaggarwal97 Are we on hold now until #148 is implemented? I think we should not be, as it is a separate codec that users would have to opt in to use.

@mulugetam I don't think we need to wait for #148; we can address that right after (before 2.15.0).

sarthakaggarwal97 commented 1 month ago

@reta I think we would need to introduce the experimental settings in OpenSearch, since we do the validation of index codecs in EngineConfig

Ideally, we would want to block index creation only when the codec is experimental and the flag is disabled. I feel we would need two changes: a new method in CodecSettings that tells us whether the codec is experimental, and a feature flag setting that tells us whether experimental codecs should be made available.

sarthakaggarwal97 commented 1 month ago

Thank you @mulugetam for this change.

opensearch-trigger-bot[bot] commented 1 month ago

The backport to 2.x failed:

The process '/usr/bin/git' failed with exit code 128

To backport manually, run these commands in your terminal:

```shell
# Navigate to the root of your repository
cd $(git rev-parse --show-toplevel)
# Fetch latest updates from GitHub
git fetch
# Create a new working tree
git worktree add ../.worktrees/custom-codecs/backport-2.x 2.x
# Navigate to the new working tree
pushd ../.worktrees/custom-codecs/backport-2.x
# Create a new branch
git switch --create backport/backport-122-to-2.x
# Cherry-pick the merged commit of this pull request and resolve the conflicts
git cherry-pick -x --mainline 1 c8b0d80a8286459857f2db2c0e9d3c1c076ada9d
# Push it to GitHub
git push --set-upstream origin backport/backport-122-to-2.x
# Go back to the original working tree
popd
# Delete the working tree
git worktree remove ../.worktrees/custom-codecs/backport-2.x
```

Then, create a pull request where the base branch is 2.x and the compare/head branch is backport/backport-122-to-2.x.

sarthakaggarwal97 commented 1 month ago

@mulugetam would you please help with the backport as well? Thank you

reta commented 1 month ago

@reta I think we would need to introduce the experimental settings in OpenSearch, since we do the validation of index codecs in EngineConfig

@sarthakaggarwal97 it would be great, but the validation logic won't help us here, I think: the codecs are registered via the Apache Lucene SPI (the validation logic you are referring to only helps with ensuring the validity of the codec settings).

Ideally would want to stop the creation of index only if the codecs is not experimental. I feel we would need a two changes. A new method in CodecSettings can tell us if the codec is experimental or not, and a feature flag setting will tell us whether we should make the experimental codecs available or not.

That's one of the problems: we could extend CodecSettings (this is ours), but none of the Codec implementations are required to implement it. Ideally, the Apache Lucene SPI should have that feature, or alternatively we could disable some codecs through settings.

mgodwan commented 1 month ago

the codecs are registered by Apache Lucene SPI (the validation logic you are referring to only helps with ensuring the codec settings validness).

I think a check like this will definitely help to disable its usage in the write path. For the write path, the named SPI interface is not used.

mulugetam commented 1 month ago

@mulugetam would you please help with the backport as well? Thank you

@reta @sarthakaggarwal97 https://github.com/opensearch-project/custom-codecs/pull/150