opensearch-project / security

🔐 Secure your cluster with TLS, numerous authentication backends, data masking, audit logging as well as role-based access control on indices, documents, and fields
https://opensearch.org/docs/latest/security-plugin/index/
Apache License 2.0
180 stars 263 forks source link

[BUG] Unexpected failure while sending request, -84 is not a valid id #4494

Open Jakob3xD opened 6 days ago

Jakob3xD commented 6 days ago

Describe the bug

With the rolling upgrade from 2.13.0 to 2.14 one of my Opensearch clusters starts to fail and throwing exceptions. After restarting the third of six hardware nodes and doing the upgrade of 2.13 to 2.14 some indices went from yellow to red caused by shard failures with the following exception:

Jun 24 14:55:42 some-opensearch-n7[20523]: [2024-06-24T14:55:42,402][WARN ][o.o.c.a.s.ShardStateAction] [some-opensearch-n7] unexpected failure while sending request [internal:cluster/shard/failure] to [{some-opensearch-n10}{emE3d7CIR8qOR0fTQ9NE-w}{wZPLpHEfQTqwbBOJg2Sgcw}{some-opensearch-n10}{172.27.9.10:9300}{dimr}{node=some-opensearch-w4, data=hot, shard_indexing_pressure_enabled=true}] for shard entry [shard id [[.ds-some-index-000005][1]], allocation id [wPuHfO6rT02UsPrtPe0X7A], primary term [1249], message [mark copy as stale], markAsStale [true]]
Jun 24 14:55:42 some-opensearch-n7[20523]: org.opensearch.transport.RemoteTransportException: [some-opensearch-n10][172.27.9.10:9300][internal:cluster/shard/failure]
Jun 24 14:55:42 some-opensearch-n7[20523]: Caused by: org.opensearch.OpenSearchException: java.lang.IllegalArgumentException: -84 is not a valid id
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.support.Base64CustomHelper.deserializeObject(Base64CustomHelper.java:136) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:46) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:784) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:218) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:164) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:72) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:170) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:828) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:114) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:271) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:144) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:127) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) [netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) [netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) [netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) [netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at java.lang.Thread.run(Thread.java:1583) [?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]: Caused by: java.lang.IllegalArgumentException: -84 is not a valid id
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.support.Base64CustomHelper$CustomSerializationFormat.fromId(Base64CustomHelper.java:63) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.support.Base64CustomHelper.deserializeObject(Base64CustomHelper.java:117) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:46) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:784) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:218) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:164) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:72) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:170) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:828) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:114) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:271) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:144) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:127) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) ~[opensearch-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) ~[transport-netty4-client-2.14.0.jar:2.14.0]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) ~[netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) ~[netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) ~[netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) ~[netty-handler-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[?:?]
Jun 24 14:55:42 some-opensearch-n7[20523]:         at java.lang.Thread.run(Thread.java:1583) ~[?:?]

Related component

Other

To Reproduce

Not know. The only big difference to my other clusters is, that the failing cluster is doing cross cluster search and therefore has remote targets configured.

Expected behavior

The upgrade from 2.13 to 2.14 should happen without such exceptions.

Additional Details

Plugins

opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication 
opensearch-custom-codecs
opensearch-flow-framework  
opensearch-geospatial      
opensearch-index-management        
opensearch-job-scheduler        
opensearch-knn                  
opensearch-ml                 
opensearch-neural-search        
opensearch-notifications         
opensearch-notifications-core    
opensearch-observability         
opensearch-reports-scheduler     
opensearch-security           
opensearch-security-analytics      
opensearch-skills                  
opensearch-sql                
prometheus-exporter         
repository-s3        

Host/Environment (please complete the following information):

Additional context Similar exception already existed in the past when updating from 2.10 to 2.11. https://github.com/opensearch-project/OpenSearch/issues/11491

dbwiddis commented 6 days ago

Previous issue linked to https://github.com/opensearch-project/security/issues/3771 which was supposed to have been fixed, but there's at least one other report of this error in 2.14 here: https://github.com/opensearch-project/security/issues/3771#issuecomment-2136941571 CC: @cwperks

dbwiddis commented 6 days ago

Possibly is related to https://github.com/opensearch-project/performance-analyzer/issues/606

Still-open PR to fix: https://github.com/opensearch-project/performance-analyzer/pull/609

reshippie commented 6 days ago

I'm seeing the same issue upgrading from 2.12 to 2.14

cwperks commented 6 days ago

@Jakob3xD It would be helpful for debugging to understand the setup that produced this error. What plugins or features are enabled?

When this error was seen on upgrade from < 2.11 to 2.11, it was because PerformanceAnalyzer had wrapped an instance of a TransportChannel and was not delegating getVersion() implementation to the wrapped channel. I opened a PR in PA to fix the wrapping issue, but also opened a different PR in the security repo that fixed the issue in the security repo and did not require the PA PR to be merged (though it should merge since its the strategic fix)

Backwards compatibility for serialization requires the receiving or transmitting node of a transport request to know the correct version of the target node. The target node being the node that the transport request is being sent to (from the transmitter end) or the node that sent the transport request (on the receiver end)

It would be helpful to know the concrete className for connection here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/transport/SecurityInterceptor.java#L155

It would also be helpful the know the concrete className for channel here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/ssl/transport/SecuritySSLRequestHandler.java#L96

briend commented 6 days ago

We're also seeing this upgrading from 2.12.0 to 2.14.0. We have the PA plugin installed but it is disabled via env variable DISABLE_PERFORMANCE_ANALYZER_AGENT_CLI=true. Note, @Jakob3xD didn't seem to have the opensearch-performance-analyzer plugin but still had this error.

opensearch-alerting
opensearch-anomaly-detection
opensearch-asynchronous-search
opensearch-cross-cluster-replication
opensearch-custom-codecs
opensearch-flow-framework
opensearch-geospatial
opensearch-index-management
opensearch-job-scheduler
opensearch-knn
opensearch-ml
opensearch-neural-search
opensearch-notifications
opensearch-notifications-core
opensearch-observability
opensearch-performance-analyzer
opensearch-reports-scheduler
opensearch-security
opensearch-security-analytics
opensearch-skills
opensearch-sql
repository-s3

PA is disabled at startup:

Enabling OpenSearch Security Plugin
Disabling execution of install_demo_configuration.sh for OpenSearch Security Plugin
Disabling execution of /usr/share/opensearch/bin/opensearch-performance-analyzer/performance-analyzer-agent-cli for OpenSearch Performance Analyzer Plugin

error does reference performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived:

{"type": "console", "timestamp": "2024-06-24T23:42:24,719Z", "level": "WARN", "component": "o.o.i.e.Engine", "cluster.name": "some-cluster", "node.name": "node-1", "message": " [some-index-2024-06-24][1] failed engine [primary shard [[some-index-2024-06-24][1], node[asdf], [P], s[STARTED], a[id=fdsa]] was demoted while failing replica shard]", "cluster.uuid": "ffff-A", "node.id": "asdf" , 
"stacktrace": ["org.opensearch.OpenSearchException: java.lang.IllegalArgumentException: -84 is not a valid id",
"at org.opensearch.security.support.Base64CustomHelper.deserializeObject(Base64CustomHelper.java:136) ~[?:?]",
"at org.opensearch.security.support.Base64Helper.deserializeObject(Base64Helper.java:46) ~[?:?]",
"at org.opensearch.security.auditlog.impl.AbstractAuditLog.getUser(AbstractAuditLog.java:775) ~[?:?]",
"at org.opensearch.security.auditlog.impl.AbstractAuditLog.logMissingPrivileges(AbstractAuditLog.java:209) ~[?:?]",
"at org.opensearch.security.auditlog.impl.AuditLogImpl.logMissingPrivileges(AuditLogImpl.java:164) ~[?:?]",
"at org.opensearch.security.auditlog.AuditLogSslExceptionHandler.logError(AuditLogSslExceptionHandler.java:72) ~[?:?]",
"at org.opensearch.security.ssl.transport.SecuritySSLRequestHandler.messageReceived(SecuritySSLRequestHandler.java:170) ~[?:?]",
"at org.opensearch.security.OpenSearchSecurityPlugin$6$1.messageReceived(OpenSearchSecurityPlugin.java:795) ~[?:?]",
"at org.opensearch.indexmanagement.rollup.interceptor.RollupInterceptor$interceptHandler$1.messageReceived(RollupInterceptor.kt:114) ~[?:?]",
"at org.opensearch.performanceanalyzer.transport.PerformanceAnalyzerTransportRequestHandler.messageReceived(PerformanceAnalyzerTransportRequestHandler.java:43) ~[?:?]",
"at org.opensearch.transport.RequestHandlerRegistry.processMessageReceived(RequestHandlerRegistry.java:106) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundHandler.handleRequest(InboundHandler.java:271) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundHandler.messageReceived(InboundHandler.java:144) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundHandler.inboundMessage(InboundHandler.java:127) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.TcpTransport.inboundMessage(TcpTransport.java:770) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundPipeline.forwardFragments(InboundPipeline.java:175) ~[opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundPipeline.doHandleBytes(InboundPipeline.java:150) [opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.InboundPipeline.handleBytes(InboundPipeline.java:115) [opensearch-2.14.0.jar:2.14.0]",
"at org.opensearch.transport.netty4.Netty4MessageChannelHandler.channelRead(Netty4MessageChannelHandler.java:95) [transport-netty4-client-2.14.0.jar:2.14.0]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.logging.LoggingHandler.channelRead(LoggingHandler.java:280) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.MessageToMessageDecoder.channelRead(MessageToMessageDecoder.java:103) ~[netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.ssl.SslHandler.unwrap(SslHandler.java:1475) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.ssl.SslHandler.decodeJdkCompatible(SslHandler.java:1338) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.ssl.SslHandler.decode(SslHandler.java:1387) [netty-handler-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.decodeRemovalReentryProtection(ByteToMessageDecoder.java:529) [netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:468) [netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) [netty-codec-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKeysPlain(NioEventLoop.java:689) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:652) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) [netty-transport-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) [netty-common-4.1.109.Final.jar:4.1.109.Final]",
"at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) [netty-common-4.1.109.Final.jar:4.1.109.Final]",
"at java.lang.Thread.run(Thread.java:1583) [?:?]",
"Caused by: java.lang.IllegalArgumentException: -84 is not a valid id",
"at org.opensearch.security.support.Base64CustomHelper$CustomSerializationFormat.fromId(Base64CustomHelper.java:63) ~[?:?]",
"at org.opensearch.security.support.Base64CustomHelper.deserializeObject(Base64CustomHelper.java:117) ~[?:?]",
"... 50 more"] }
Jakob3xD commented 5 days ago

@Jakob3xD It would be helpful for debugging to understand the setup that produced this error. What plugins or features are enabled?

The plugins are listed in the issue. PA is completely removed during the image build process via /usr/share/opensearch/bin/opensearch-plugin remove opensearch-performance-analyzer. The only difference to my other clusters is the cluster configuration of remote clusters and the configuration of an awareness attribute.

To give more details on how i encountered the bug. I upgraded all my remote clusters from 2.13 to 2.14 without any issues and afterwards I wanted to upgrade the main clusters, where all the remote targets are configured. First two node reboots worked without any issue but after the third node thinks started to fail with the already named exception. I fixed this issue like the last time where the upgrade to 2.11 failed, by doing a full cluster upgrade and not the rolling upgrade. After doing the full cluster upgrade by shutting the cluster down and starting it with 2.14 the exceptions are gone.

It would be helpful to know the concrete className for connection here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/transport/SecurityInterceptor.java#L155

It would also be helpful the know the concrete className for channel here: https://github.com/opensearch-project/security/blob/9caf5cbaa90c2ccbaf8b0bf00ee8bb1fe9326919/src/main/java/org/opensearch/security/ssl/transport/SecuritySSLRequestHandler.java#L96

Not sure how I would get those? I am not familiar with java.

cwperks commented 5 days ago

Based on the stack trace it appears the issue is this:

2.12 node ------- transport request ------> 2.14 node

When the 2.12 node is serializing its opting to use JDK serialization, but the 2.14 node is trying to deserialize as custom.

The 2.12 node should be opting to use custom serialization since the target node is >= 2.11: https://github.com/opensearch-project/security/blob/2.12/src/main/java/org/opensearch/security/transport/SecurityInterceptor.java#L153

cwperks commented 5 days ago

@briend Do you also have cross cluster configured?

@Jakob3xD Is the error seen on the replica cluster?

briend commented 5 days ago

@briend Do you also have cross cluster configured?

No, we don't have cross cluster search configured on the cluster that had the issue. I was also able to recover by doing a full upgrade instead of rolling (shut all nodes down and start up with 2.14.0).

peternied commented 4 days ago

[Triage - attendees 1 2 3] Transferring to security repo

Jakob3xD commented 3 days ago

@Jakob3xD Is the error seen on the replica cluster?

Nope, only happened on the main cluster. And they are not replica clusters. They are just configures as remote cluster to be able to query all logs via one opensearch.

briend commented 3 days ago

I could not reproduce the error with the security plugin disabled, FWIW