Open Jakob3xD opened 6 days ago
Previous issue linked to https://github.com/opensearch-project/security/issues/3771 which was supposed to have been fixed, but there's at least one other report of this error in 2.14 here: https://github.com/opensearch-project/security/issues/3771#issuecomment-2136941571 CC: @cwperks
Possibly is related to https://github.com/opensearch-project/performance-analyzer/issues/606
Still-open PR to fix: https://github.com/opensearch-project/performance-analyzer/pull/609
I'm seeing the same issue upgrading from 2.12 to 2.14
@Jakob3xD It would be helpful for debugging to understand the setup that produced this error. What plugins or features are enabled?
When this error was seen on upgrade from < 2.11 to 2.11, it was because PerformanceAnalyzer had wrapped an instance of a TransportChannel and was not delegating getVersion()
implementation to the wrapped channel. I opened a PR in PA to fix the wrapping issue, but also opened a different PR in the security repo that fixed the issue in the security repo and did not require the PA PR to be merged (though it should merge since its the strategic fix)
Backwards compatibility for serialization requires the receiving or transmitting node of a transport request to know the correct version of the target node. The target node being the node that the transport request is being sent to (from the transmitter end) or the node that sent the transport request (on the receiver end)
It would be helpful to know the concrete className for connection
here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/transport/SecurityInterceptor.java#L155
It would also be helpful the know the concrete className for channel
here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/ssl/transport/SecuritySSLRequestHandler.java#L96
We're also seeing this upgrading from 2.12.0
to 2.14.0
. We have the PA plugin installed but it is disabled via env variable DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI=true
. Note, @Jakob3xD didn't seem to have the opensearch-performance-analyzer
plugin but still had this error.
opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication
opensearch-custom-codecs
opensearch-flow-framework
opensearch-geospatial
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-ml
opensearch-neural-search
opensearch-notifications
opensearch-notifications-core
opensearch-observability
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-security-analytics
opensearch-skills
opensearch-sql
repository-s3
PA is disabled at startup:
Enabling OpenSearch Security Plugin
Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin
error does reference performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived
:
{"type": "console", "timestamp": "2024-06-24T23:42:24,719Z", "level": "WARN", "component": "o.o.i.e.Engine", "cluster.name": "some-cluster", "node.name": "node-1", "message": " [some-index-2024-06-24][1] failed engine [primary shard [[some-index-2024-06-24][1], node[asdf], [P], s[STARTED], a[id=fdsa]] was demoted while failing replica shard]", "cluster.uuid": "ffff-A", "node.id": "asdf" ,
"stacktrace": ["org.opensearch.OpenSearchException: java.lang.IllegalArgumentException: -84 is not a valid id",
"at org.opensearch.security.support.Base64CustomHelper.deserializeObject(Base64CustomHelper.java:136) ~[?:?]",
"at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:46) ~[?:?]",
"at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:775) ~[?:?]",
"at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:209) ~[?:?]",
"at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:164) ~[?:?]",
"at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:72) ~[?:?]",
"at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:170) ~[?:?]",
"at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:795) ~[?:?]",
"at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:114) ~[?:?]",
"at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]",
"at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:271) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:144) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:127) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.14.0.jar:2.14.0]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) [netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) [netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.109.Final.jar:4.1.109.Final]",
"at java.lang.Thread.run(Thread.java:1583) [?:?]",
"Caused by: java.lang.IllegalArgumentException: -84 is not a valid id",
"at org.opensearch.security.support.Base64CustomHelper$CustomSerializationFormat.fromId(Base64CustomHelper.java:63) ~[?:?]",
"at org.opensearch.security.support.Base64CustomHelper.deserializeObject(Base64CustomHelper.java:117) ~[?:?]",
"... 50 more"] }
@Jakob3xD It would be helpful for debugging to understand the setup that produced this error. What plugins or features are enabled?
The plugins are listed in the issue. PA is completely removed during the image build process via /usr/share/opensearch/bin/opensearch-plugin remove opensearch-performance-analyzer
. The only difference to my other clusters is the cluster configuration of remote clusters and the configuration of an awareness attribute.
To give more details on how i encountered the bug. I upgraded all my remote clusters from 2.13 to 2.14 without any issues and afterwards I wanted to upgrade the main clusters, where all the remote targets are configured. First two node reboots worked without any issue but after the third node thinks started to fail with the already named exception. I fixed this issue like the last time where the upgrade to 2.11 failed, by doing a full cluster upgrade and not the rolling upgrade. After doing the full cluster upgrade by shutting the cluster down and starting it with 2.14 the exceptions are gone.
It would be helpful to know the concrete className for
connection
here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/transport/SecurityInterceptor.java#L155It would also be helpful the know the concrete className for
channel
here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/ssl/transport/SecuritySSLRequestHandler.java#L96
Not sure how I would get those? I am not familiar with java.
Based on the stack trace it appears the issue is this:
2.12 node ------- transport request ------> 2.14 node
When the 2.12 node is serializing its opting to use JDK serialization, but the 2.14 node is trying to deserialize as custom.
The 2.12 node should be opting to use custom serialization since the target node is >= 2.11: https://github.com/opensearch-project/security/blob/2.12/src/main/java/org/opensearch/security/transport/SecurityInterceptor.java#L153
@briend Do you also have cross cluster configured?
@Jakob3xD Is the error seen on the replica cluster?
@briend Do you also have cross cluster configured?
No, we don't have cross cluster search configured on the cluster that had the issue. I was also able to recover by doing a full upgrade instead of rolling (shut all nodes down and start up with 2.14.0
).
@Jakob3xD Is the error seen on the replica cluster?
Nope, only happened on the main cluster. And they are not replica clusters. They are just configures as remote cluster to be able to query all logs via one opensearch.
I could not reproduce the error with the security plugin disabled, FWIW
Describe the bug
With the rolling upgrade from 2.13.0 to 2.14 one of my Opensearch clusters starts to fail and throwing exceptions. After restarting the third of six hardware nodes and doing the upgrade of 2.13 to 2.14 some indices went from yellow to red caused by shard failures with the following exception:
Related component
Other
To Reproduce
Not know. The only big difference to my other clusters is, that the failing cluster is doing cross cluster search and therefore has remote targets configured.
Expected behavior
The upgrade from 2.13 to 2.14 should happen without such exceptions.
Additional Details
Plugins
Host/Environment (please complete the following information):
Additional context Similar exception already existed in the past when updating from 2.10 to 2.11. https://github.com/opensearch-project/OpenSearch/issues/11491