Bug Report
We are using an AWS ElastiCache Redis cluster in production with 1 master and 2 replicas. Under high load, one of the instances connecting to Redis sees almost all of its Redis requests fail. New Relic reports the following:
Deadlocked thread: lettuce-nioEventLoop-6-30
Deadlocked thread: lettuce-nioEventLoop-6-29
Further logs are attached below. This does not happen under normal load, only when there is a sudden spike in the number of requests. When the issue occurs, we have to stop the affected instance. We have had 3 major spikes in the last week and observed the issue on only 1 instance/pod on each occasion (out of around 100-200 instances). Most of the other instances show at most a slight failure count, nothing significant. We have not been able to reproduce this scenario in our dev/load environments.
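Since the deadlock has only been seen in production, it may help to have the affected pod capture its own deadlock details before it is stopped. The following is a minimal sketch using only the JDK's `ThreadMXBean`; the class name `DeadlockProbe` and the method `describeDeadlocks` are hypothetical and not part of Lettuce, Netty, or New Relic. It assumes the JVM supports synchronizer monitoring, which HotSpot does.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of any library): reports deadlocked threads via JMX.
final class DeadlockProbe {

    /** Returns one description per deadlocked thread, or an empty list if there is no deadlock. */
    static List<String> describeDeadlocks() {
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        // Covers both monitor locks and java.util.concurrent ownable synchronizers;
        // returns null when no deadlock is detected.
        long[] ids = threads.findDeadlockedThreads();
        List<String> report = new ArrayList<>();
        if (ids == null) {
            return report;
        }
        // Request locked monitors and locked synchronizers for each deadlocked thread.
        for (ThreadInfo info : threads.getThreadInfo(ids, true, true)) {
            if (info == null) {
                continue; // the thread terminated between the two calls
            }
            StringBuilder sb = new StringBuilder();
            sb.append(info.getThreadName())
              .append(" waiting on ").append(info.getLockName())
              .append(" held by ").append(info.getLockOwnerName());
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append(System.lineSeparator()).append("    at ").append(frame);
            }
            report.add(sb.toString());
        }
        return report;
    }

    public static void main(String[] args) {
        List<String> report = describeDeadlocks();
        if (report.isEmpty()) {
            System.out.println("No deadlocked threads detected");
        } else {
            report.forEach(System.out::println);
        }
    }
}
```

Calling `describeDeadlocks()` from a scheduled task or a health endpoint and logging a non-empty result would record which locks the two `lettuce-nioEventLoop` threads hold and wait on, which the truncated trace in this report does not show.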
Current Behavior
Stack trace
```java
…newrelic.agent.Transaction.checkExpireTracedActivities(Transaction.java:2402)
com.newrelic.agent.Transaction.checkExpire(Transaction.java:2205)
com.newrelic.agent.Transaction.linkTxOnThread(Transaction.java:1462)
com.newrelic.agent.TokenImpl.link(TokenImpl.java:86)
io.lettuce.core.DefaultConnectionFuture.lambda$null$0(DefaultConnectionFuture.java:258)
….DefaultConnectionFuture$$Lambda$872/1048586462.accept(Unknown Source)
…tuce.core.DefaultConnectionFuture.lambda$thenCompose$1(DefaultConnectionFuture.java:253)
…e.DefaultConnectionFuture$$Lambda$840/306318434.accept(Unknown Source)
io.lettuce.core.AbstractRedisClient.lambda$null$3(AbstractRedisClient.java:341)
….core.AbstractRedisClient$$Lambda$860/915135338.accept(Unknown Source)
…tuce.core.PlainChannelInitializer$1.userEventTriggered(PlainChannelInitializer.java:93)
… io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
… io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
…l.AbstractChannelHandlerContext.fireUserEventTriggered(AbstractChannelHandlerContext.java:307)
….DefaultChannelPipeline$HeadContext.userEventTriggered(DefaultChannelPipeline.java:1452)
… io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:329)
… io.netty.channel.AbstractChannelHandlerContext.invokeUserEventTriggered(AbstractChannelHandlerContext.java:315)
….channel.DefaultChannelPipeline.fireUserEventTriggered(DefaultChannelPipeline.java:959)
…ce.core.protocol.CommandHandler.lambda$channelActive$0(CommandHandler.java:273)
…core.protocol.CommandHandler$$Lambda$868/253911577.run(Unknown Source)
…netty.util.concurrent.PromiseTask$RunnableAdapter.call(PromiseTask.java:38)
io.netty.util.concurrent.PromiseTask.run(PromiseTask.java:73)
…etty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:163)
….util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:404)
io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:446)
….netty.util.concurrent.SingleThreadEventExecutor$5.run(SingleThreadEventExecutor.java:884)
io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
```
Input Code
```java
@Bean
public LettuceConnectionFactory redisConnectionFactory() {
    LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .readFrom(ReadFrom.SLAVE_PREFERRED)
            .build();

    // redisHostName is the primary endpoint of the cluster
    RedisStaticMasterReplicaConfiguration redisStaticMasterReplicaConfiguration =
            new RedisStaticMasterReplicaConfiguration(redisHostName, redisPort);
    // redisHostNameReplica is the AWS ReaderEndpoint of the cluster
    redisStaticMasterReplicaConfiguration.addNode(redisHostNameReplica, redisPort);
    redisStaticMasterReplicaConfiguration.setPassword(redisPassword);

    return new LettuceConnectionFactory(redisStaticMasterReplicaConfiguration, clientConfig);
}

@Bean(name = "redisUserTemplate")
public RedisTemplate …
```
Expected behavior/code
Environment
Possible Solution
Additional context