Closed: sandervandegeijn closed this issue 3 months ago.
@sandervandegeijn Did you get a chance to dig into this? Might be a 2.16 showstopper.
Any suggestions on how to provide more useful info? The only thing I need to do is to create a new index with the codec and reindex the data. As soon as I hit send the node crashes.
The Dockerfile with which we extend the base image with the Azure plugin (sorry, we use S3 on the other cluster; this one uses Azure blob storage).
ARG osversion
FROM opensearchproject/opensearch:${osversion}
#RUN /usr/share/opensearch/bin/opensearch-plugin install --batch repository-s3
RUN /usr/share/opensearch/bin/opensearch-plugin install --batch repository-azure
It looks like a CI/CD packaging / path problem to me at first glance.
I hit another thing while testing the compression codecs, using zlib actually increases my index size vs no codec specified. Still investigating that one.
Hi @sandervandegeijn ,
Would you give us more details on "It looks like a CI/CD packaging / path problem to me at first glance"?
I am a bit confused here as this seems like a code issue related to the core.
Could you test again on the tarball artifact? The docker release basically runs the tarball.
Also syncing @reta @sarthakaggarwal97 into the discussion, since this looks like an opensearch-project/custom-codecs issue.
Thanks.
I was not able to reproduce on the default distribution of 2.15.
$ curl -k -u admin:$OPENSEARCH_PASSWORD -X GET https://localhost:9200/
{
"name" : "02b751c49f0f",
"cluster_name" : "docker-cluster",
"cluster_uuid" : "HA6t8-RuQQmMmWFn1SnTOw",
"version" : {
"distribution" : "opensearch",
"number" : "2.15.0",
"build_type" : "tar",
"build_hash" : "61dbcd0795c9bfe9b81e5762175414bc38bbcadf",
"build_date" : "2024-06-20T03:27:32.562036890Z",
"build_snapshot" : false,
"lucene_version" : "9.10.0",
"minimum_wire_compatibility_version" : "7.10.0",
"minimum_index_compatibility_version" : "7.0.0"
},
"tagline" : "The OpenSearch Project: https://opensearch.org/"
}
$ curl -k -u admin:$OPENSEARCH_PASSWORD -X PUT https://localhost:9200/my_index --json '{
"settings": {
"index": {
"codec": "qat_deflate"
}
}
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"my_index"}
$ curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/_reindex --json '{
"source": {
"index": "my_index"
},
"dest": {
"index": "their_index"
}
}'
{"took":6,"timed_out":false,"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}
The reason I suspect a packaging problem is the error:
libqatzip.so.3: cannot open shared object file: No such file or directory
Looks like the lib is missing or in the wrong path.
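One way to test the "missing or in the wrong path" hypothesis is to probe the standard Linux library directories inside the container for the shared object. A minimal sketch; the class name, helper, and directory list below are assumptions for illustration, not part of OpenSearch or the plugin:

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Optional;

// Probe common Linux library directories for a shared object such as
// libqatzip.so.3 and report the first directory that contains it, if any.
public class NativeLibProbe {
    // Typical system library locations (an assumption; adjust per distro).
    static final String[] LIB_DIRS = {
        "/usr/lib64", "/usr/lib", "/usr/local/lib",
        "/usr/lib/x86_64-linux-gnu", "/lib64", "/lib"
    };

    static Optional<Path> find(String libName) {
        for (String dir : LIB_DIRS) {
            Path candidate = Path.of(dir, libName);
            if (Files.exists(candidate)) {
                return Optional.of(candidate);
            }
        }
        return Optional.empty(); // not installed in any standard location
    }

    public static void main(String[] args) {
        System.out.println(find("libqatzip.so.3"));
    }
}
```

Run inside the container; an empty result would support the packaging/path theory.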
@dblock have you tried doing a reindex to that newly created index? That's when the error occurs on my cluster.
Could it otherwise be that you have the lib on your system and it's linking dynamically?
@sandervandegeijn yes, sorry, I forgot to copy-paste the last part, works on my machine
In 2.15 it does not throw the unsupported-codec error, but it does crash the node.
Should we also look at ways to disable this codec while we figure out an actual fix? Moreover, since this codec has not been working since 2.14 anyway, could we throw a 4xx instead to prevent the crash?
libqatzip.so.3
Let me do some testing directly on the docker release image.
Thanks.
I tried to add a document to my index and it crashed the node.
$ curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/my_index/_doc --json '{"x":1}'
opensearch-cluster-1 | [2024-07-25T20:58:59,634][INFO ][o.o.p.PluginsService ] [02b751c49f0f] PluginService:onIndexModule index:[my_index/cuvrKg3HReadlbWVxZfxvA]
opensearch-cluster-1 | [2024-07-25T20:58:59,637][INFO ][o.o.c.m.MetadataMappingService] [02b751c49f0f] [my_index/cuvrKg3HReadlbWVxZfxvA] create_mapping
opensearch-cluster-1 | [2024-07-25T20:58:59,660][INFO ][o.o.p.PluginsService ] [02b751c49f0f] PluginService:onIndexModule index:[security-auditlog-2024.07.25/nPokmBqBTDqMlbw8Anxscg]
opensearch-cluster-1 | [2024-07-25T20:58:59,668][INFO ][o.o.c.m.MetadataMappingService] [02b751c49f0f] [security-auditlog-2024.07.25/nPokmBqBTDqMlbw8Anxscg] update_mapping [_doc]
opensearch-cluster-1 | [2024-07-25T20:58:59,664][ERROR][o.o.b.OpenSearchUncaughtExceptionHandler] [02b751c49f0f] fatal error in thread [opensearch[02b751c49f0f][write][T#4]], exiting
opensearch-cluster-1 | java.lang.ExceptionInInitializerError: null
opensearch-cluster-1 | at com.intel.qat.QatZipper.<clinit>(QatZipper.java:97) ~[?:?]
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:34) ~[?:?]
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:166) ~[?:?]
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatCompressionMode$QatCompressor.<init>(QatCompressionMode.java:96) ~[?:?]
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatCompressionMode.newCompressor(QatCompressionMode.java:75) ~[?:?]
opensearch-cluster-1 | at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:118) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.Lucene99QatStoredFieldsFormat.fieldsWriter(Lucene99QatStoredFieldsFormat.java:124) ~[?:?]
opensearch-cluster-1 | at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:535) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:566) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483) ~[lucene-core-9.10.0.jar:9.10.0 695c0ac84508438302cd346a812cfa2fdc5a10df - 2024-02-14 16:48:06]
opensearch-cluster-1 | at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1215) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1160) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1051) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52) ~[opensearch-2.15.0.jar:2.15.0]
opensearch-cluster-1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144) ~[?:?]
opensearch-cluster-1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642) ~[?:?]
opensearch-cluster-1 | at java.base/java.lang.Thread.run(Thread.java:1583) [?:?]
opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.
opensearch-cluster-1 | at com.intel.qat.Native.loadLibrary(Native.java:68) ~[?:?]
opensearch-cluster-1 | at com.intel.qat.InternalJNI.<clinit>(InternalJNI.java:17) ~[?:?]
opensearch-cluster-1 | ... 30 more
opensearch-cluster-1 | fatal error in thread [opensearch[02b751c49f0f][write][T#4]], exiting
opensearch-cluster-1 | java.lang.ExceptionInInitializerError
opensearch-cluster-1 | at com.intel.qat.QatZipper.<clinit>(QatZipper.java:97)
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:34)
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatZipperFactory.createInstance(QatZipperFactory.java:166)
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatCompressionMode$QatCompressor.<init>(QatCompressionMode.java:96)
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.QatCompressionMode.newCompressor(QatCompressionMode.java:75)
opensearch-cluster-1 | at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsWriter.<init>(Lucene90CompressingStoredFieldsWriter.java:118)
opensearch-cluster-1 | at org.apache.lucene.codecs.lucene90.compressing.Lucene90CompressingStoredFieldsFormat.fieldsWriter(Lucene90CompressingStoredFieldsFormat.java:140)
opensearch-cluster-1 | at org.opensearch.index.codec.customcodecs.Lucene99QatStoredFieldsFormat.fieldsWriter(Lucene99QatStoredFieldsFormat.java:124)
opensearch-cluster-1 | at org.apache.lucene.index.StoredFieldsConsumer.initStoredFieldsWriter(StoredFieldsConsumer.java:50)
opensearch-cluster-1 | at org.apache.lucene.index.StoredFieldsConsumer.startDocument(StoredFieldsConsumer.java:57)
opensearch-cluster-1 | at org.apache.lucene.index.IndexingChain.startStoredFields(IndexingChain.java:535)
opensearch-cluster-1 | at org.apache.lucene.index.IndexingChain.processDocument(IndexingChain.java:566)
opensearch-cluster-1 | at org.apache.lucene.index.DocumentsWriterPerThread.updateDocuments(DocumentsWriterPerThread.java:263)
opensearch-cluster-1 | at org.apache.lucene.index.DocumentsWriter.updateDocuments(DocumentsWriter.java:425)
opensearch-cluster-1 | at org.apache.lucene.index.IndexWriter.updateDocuments(IndexWriter.java:1558)
opensearch-cluster-1 | at org.apache.lucene.index.IndexWriter.updateDocument(IndexWriter.java:1843)
opensearch-cluster-1 | at org.apache.lucene.index.IndexWriter.addDocument(IndexWriter.java:1483)
opensearch-cluster-1 | at org.opensearch.index.engine.InternalEngine.addDocs(InternalEngine.java:1281)
opensearch-cluster-1 | at org.opensearch.index.engine.InternalEngine.indexIntoLucene(InternalEngine.java:1217)
opensearch-cluster-1 | at org.opensearch.index.engine.InternalEngine.index(InternalEngine.java:1011)
opensearch-cluster-1 | at org.opensearch.index.shard.IndexShard.index(IndexShard.java:1215)
opensearch-cluster-1 | at org.opensearch.index.shard.IndexShard.applyIndexOperation(IndexShard.java:1160)
opensearch-cluster-1 | at org.opensearch.index.shard.IndexShard.applyIndexOperationOnPrimary(IndexShard.java:1051)
opensearch-cluster-1 | at org.opensearch.action.bulk.TransportShardBulkAction.executeBulkItemRequest(TransportShardBulkAction.java:625)
opensearch-cluster-1 | at org.opensearch.action.bulk.TransportShardBulkAction$2.doRun(TransportShardBulkAction.java:471)
opensearch-cluster-1 | at org.opensearch.common.util.concurrent.ThreadContext$ContextPreservingAbstractRunnable.doRun(ThreadContext.java:941)
opensearch-cluster-1 | at org.opensearch.common.util.concurrent.AbstractRunnable.run(AbstractRunnable.java:52)
opensearch-cluster-1 | at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
opensearch-cluster-1 | at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
opensearch-cluster-1 | at java.base/java.lang.Thread.run(Thread.java:1583)
opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.
opensearch-cluster-1 | at com.intel.qat.Native.loadLibrary(Native.java:68)
opensearch-cluster-1 | at com.intel.qat.InternalJNI.<clinit>(InternalJNI.java:17)
opensearch-cluster-1 | ... 30 more
That seems like a different issue:
opensearch-cluster-1 | Caused by: java.lang.UnsupportedOperationException: Unsupported OS/arch, cannot find /com/intel/qat/linux/aarch64/libqat-java.so. Please try building from source.
Confirmed this one on a Mac with Apple Silicon running the image under Docker Desktop as well. Didn't expect that error either; according to the docs, it should fall back to the software implementation instead of the hardware-accelerated one, but it should still work.
Separate bug?
I just tried using the docker images and I don't see the issue either:
% docker run -it -d -p 9200:9200 -p 9600:9600 -e "discovery.type=single-node" -e "OPENSEARCH_INITIAL_ADMIN_PASSWORD=$OPENSEARCH_PASSWORD" opensearchproject/opensearch:2.15.0
3b3758158c66e63686ab22613cdbf78b1567ad1125105bb48532db0a26265ff6
% docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
3b3758158c66 opensearchproject/opensearch:2.15.0 "./opensearch-docker…" 2 seconds ago Up 1 second 0.0.0.0:9200->9200/tcp, :::9200->9200/tcp, 9300/tcp, 0.0.0.0:9600->9600/tcp, :::9600->9600/tcp, 9650/tcp adoring_hoover
% curl -k -u admin:$OPENSEARCH_PASSWORD -X PUT https://localhost:9200/my_index --json '{
"settings": {
"index": {
"codec": "qat_deflate"
}
}
}'
{"acknowledged":true,"shards_acknowledged":true,"index":"my_index"}%
% curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/_reindex --json '{
"source": {
"index": "my_index"
},
"dest": {
"index": "their_index"
}
}'
{"took":13,"timed_out":false,"total":0,"updated":0,"created":0,"deleted":0,"batches":0,"version_conflicts":0,"noops":0,"retries":{"bulk":0,"search":0},"throttled_millis":0,"requests_per_second":-1.0,"throttled_until_millis":0,"failures":[]}%
Confirmed this one on a Mac with Apple Silicon running the image under Docker Desktop as well. Separate bug?
Yeah that probably is a separate issue because in custom-codecs we see this
[opensearch@3b3758158c66 test]$ unzip qat-java-1.1.1.jar
Archive: qat-java-1.1.1.jar
creating: META-INF/
inflating: META-INF/MANIFEST.MF
creating: com/
creating: com/intel/
creating: com/intel/qat/
creating: com/intel/qat/linux/
creating: com/intel/qat/linux/amd64/
creating: META-INF/maven/
creating: META-INF/maven/com.intel.qat/
creating: META-INF/maven/com.intel.qat/qat-java/
inflating: com/intel/qat/Native.class
inflating: com/intel/qat/QatZipper$Mode.class
inflating: com/intel/qat/QatException.class
inflating: com/intel/qat/QatZipper$QatCleaner.class
inflating: com/intel/qat/QatDecompressorInputStream.class
inflating: com/intel/qat/package-info.class
inflating: com/intel/qat/QatCompressorOutputStream.class
inflating: com/intel/qat/InternalJNI.class
inflating: com/intel/qat/QatZipper.class
inflating: com/intel/qat/QatZipper$PollingMode.class
inflating: com/intel/qat/QatZipper$Algorithm.class
inflating: com/intel/qat/linux/amd64/libqat-java.so
inflating: META-INF/maven/com.intel.qat/qat-java/pom.xml
inflating: META-INF/maven/com.intel.qat/qat-java/pom.properties
inflating: module-info.class
It seems like it does not support arm64 at this point.
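That matches the error message above: qat-java resolves its bundled native library via a classpath resource built from the running OS and architecture, and the 1.1.1 jar only ships com/intel/qat/linux/amd64/. A simplified illustration of that lookup; the helper below is hypothetical (only the /com/intel/qat/... layout comes from the jar listing and the error message), and non-Linux suffixes are glossed over:

```java
// Build the classpath resource a loader like qat-java's would look for,
// given os.name / os.arch. Since the jar only contains linux/amd64, any
// other arch (e.g. aarch64 under Docker on Apple Silicon) cannot resolve
// its native library and initialization fails.
public class QatResourcePath {
    static String resourceFor(String osName, String osArch) {
        String os = osName.toLowerCase().startsWith("windows") ? "windows" : "linux";
        return "/com/intel/qat/" + os + "/" + osArch + "/libqat-java.so";
    }

    public static void main(String[] args) {
        System.out.println(resourceFor(System.getProperty("os.name"),
                                       System.getProperty("os.arch")));
    }
}
```

On an aarch64 JVM this yields exactly the missing path from the exception.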
Related to this PR in custom-codecs repo
Transfer to there and adding @reta @sarthakaggarwal97 @andrross to take a look. Thanks.
Still, it should fall back to a pure software implementation, right? Should I open a separate issue in the custom-codecs repo?
The cluster runs on: Intel(R) Xeon(R) Gold 6242R CPU @ 3.10GHz so that should work.
These look like similar issues... the QAT codecs do not gracefully handle the case where they cannot be loaded (either due to incompatible hardware or a missing library).
I would propose the following options for the 2.16 release:
@andrross thanks for sharing the options. Do we know if any existing users can successfully upgrade their clusters with indices on the QAT codec without encountering failures, and will it work seamlessly? If yes, then can we move forward with one of, or a combination of, options 2 and 3 above, where enablement for new indices can be prevented?
I just tried using the docker images and I don't see the issue either:
Make sure to have a document in the index.
curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/my_index/_doc --json '{"x":1}'
@andrross Why isn't there a catch-all in IndexWriter, maybe initStoredFieldsWriter, on failure to initialize any of these codecs?
Able to reproduce:
fatal error in thread [opensearch[039c85055a8f][write][T#4]], exiting
java.lang.UnsatisfiedLinkError: /tmp/opensearch-18442301159217191627/libqat-java8257943536944747477.so: libqatzip.so.3: cannot open shared object file: No such file or directory
Do we know if any existing users can successfully upgrade their clusters with indices on the QAT codec without encountering failures, and will it work seamlessly?
If the qat codec is unavailable, I doubt the shards will be green. Users can change the codec before upgrading (and force-merge to 1 segment, so that every segment uses an old / stable codec), and then upgrade.
Add a setting to opt-in to use these new codecs, but make them unavailable by default
I think this is what we always wanted, but we couldn't reach a consensus on how to mark these codecs experimental. Discussion over here: https://github.com/opensearch-project/OpenSearch/pull/13992
I'm good with the 3rd option if we can come up with a mitigation plan to fix the codecs. If we do not see that happening soon, I will vote for the 2nd option.
@andrross Why isn't there a catch-all in IndexWriter, maybe initStoredFieldsWriter, on failure to initialize any of these codecs?
@dblock These are java.lang.Error instances, not exceptions. It is generally unsafe to catch Errors, as it usually indicates that the JVM is not able to continue operating properly. I suspect the right solution here is to introspect at runtime whether these codecs are available, and otherwise not register them.
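A sketch of that introspection idea: run the native-library probe once, catch only the link-time Errors that specific call can throw, and let the plugin skip codec registration when the probe fails. The class and method names here are illustrative assumptions, not the actual custom-codecs code:

```java
// One-time availability probe for a native-backed codec. Catching
// UnsatisfiedLinkError here is deliberately narrow: it is scoped to this
// probe call, not a blanket catch of java.lang.Error in the core.
public final class QatAvailability {

    static boolean probe(String libraryName) {
        try {
            System.loadLibrary(libraryName);
            return true;
        } catch (UnsatisfiedLinkError | ExceptionInInitializerError e) {
            // Missing .so, unsupported arch, or failed static init:
            // report unavailable instead of letting the Error propagate.
            return false;
        }
    }

    private static final boolean AVAILABLE = probe("qat-java");

    public static boolean isAvailable() {
        return AVAILABLE;
    }
}
```

A plugin could consult isAvailable() before advertising the qat codecs, so an unsupported platform would get a 4xx at index-creation time instead of a node crash at write time.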
adding @mulugetam to the discussion (contributor for QAT codec)
Hi @sarthakaggarwal97 @mulugetam ,
Do we know if users need to explicitly install libqatzip.so.3 on their machines?
It seems our docker image also does not have this lib, and I can't find it in any existing repositories.
If so, regardless of which option we choose, we need to add it to the docker image.
Please let me know how these should be installed, or whether you would include them with the custom-codecs plugin.
Thanks.
@andrross Why isn't there a catch-all in IndexWriter, maybe initStoredFieldsWriter, on failure to initialize any of these codecs?
@dblock These are java.lang.Error instances, not exceptions. It is generally unsafe to catch Errors as it usually indicates that the JVM is not able to continue operating properly. I suspect the right solution here is to introspect at runtime whether these codecs are available, otherwise don't register them.
But this is java.lang.UnsatisfiedLinkError. An explicit loadLibrary for codecs, along with a catch for this, should be ok, no?
1. Fix the issue so that the service gracefully fails with a proper error message at index creation time if the codec is not supported. (probably not feasible to do this right in the short time that we have, but happy to be proven wrong)
I agree, we shouldn't return the codecs if they are not supported on that platform.
3. Add a setting to opt-in to use these new codecs, but make them unavailable by default (this is a breaking change but gives a work-around to re-enable them. any user is still at risk of a node crash if they enable this setting in the wrong environment)
Not a requirement for 2.16, but we should implement this sooner; today, installing custom-codecs brings in all the codecs, causing issues like this.
But this is java.lang.UnsatisfiedLinkError. An explicit loadLibrary for codecs along with a catch for this should be ok, no?
@dblock Yes, @sarthakaggarwal97 has implemented something like this. But I don't think we should have a general catch-all in the core layer.
One thing to note here: from reading the EC2 documentation, I believe only the metal sizes of the m7i, r7i, and c7i instance types have the QAT hardware acceleration. We don't use those instance sizes in any of our testing infrastructure (to my knowledge), so I don't think we're actually testing this codec anywhere in practice. Some of the documentation suggests that this should fall back to software acceleration, but in my test isQatAvailable always seems to evaluate to false, even on a stock Ubuntu EC2 instance with an Intel processor.
Fixed by #169. Closing.
Thanks guys
Describe the bug
Trying to use the qat_deflate compression. According to the docs it should be available from 2.14 on. This is not correct, by the way: in 2.14.0 it can't be used at all and gives an instant error. I created a docs PR for that one.
In 2.15 it does not throw the unsupported-codec error, but it does crash the node.
Related component
Storage
To Reproduce
Create a new index with the codec:
curl -k -u admin:$OPENSEARCH_PASSWORD -X PUT https://localhost:9200/my_index --json '{"settings": {"index": {"codec": "qat_deflate"}}}'
Then reindex:
curl -k -u admin:$OPENSEARCH_PASSWORD -X POST https://localhost:9200/_reindex --json '{"source": {"index": "my_index"}, "dest": {"index": "their_index"}}'
Expected behavior
The node should not crash.
Additional Details
Base 2.15.0 docker image with the s3 plugin installed.
Log: