redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.35k stars 960 forks source link

Spring boot - Lost of master and first sentinel node it causes READONLY exception #2548

Closed wuase closed 1 month ago

wuase commented 9 months ago

Lettuce doesn't seems able to recognize topology changes if the first configured sentinel node goes down with the master. Even configuring a periodic topology refresh seems to be uneffective.

I know from this discussion that looking for master by just one sentinel is a desired behavior, it exists any workaround or procedure to re-establish connection to the correct master?

Update: It seems that the affected connection are the one already established by the pool, new connections works like a charm.

Redis configuration:

Scenario: Spring boot application is connected to Redis by sentinels through lettuce with pool.

Spring boot configuration:

spring:
  redis:
    database: 0
    host: sentinel-1
    port: 26379
    timeout: 60000
    sentinel:
      master: "mymaster"
      nodes:
        - "sentinel-1:26379"
        - "sentinel-2:26380"
        - "sentinel-3:26381"
...
    @Bean
    public LettuceClientConfiguration clientConfiguration() {

        GenericObjectPoolConfig<Object> pool = new GenericObjectPoolConfig<>();
        pool.setMaxTotal(100);
        pool.setMaxIdle(100);
        pool.setMinIdle(10);
        pool.setBlockWhenExhausted(true);
        pool.setMaxWait(Duration.of(2, ChronoUnit.SECONDS));

        return LettucePoolingClientConfiguration.builder()
                .poolConfig(pool)
                .readFrom(ReadFrom.LOWEST_LATENCY)
                .clientOptions(ClientOptions.builder()
                        .pingBeforeActivateConnection(true)
                        .autoReconnect(true)
                        .socketOptions(SocketOptions.builder()
                                .keepAlive(true)
                                .build())
                        .cancelCommandsOnReconnectFailure(true)
                        .suspendReconnectOnProtocolFailure(false)
                        .disconnectedBehavior(ClientOptions.DisconnectedBehavior.REJECT_COMMANDS)
                        .build())
                .build();
    }
...

Stacktrace:

 org.springframework.data.redis.RedisSystemException: Error in execution; nested exception is io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave.
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:54) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:52) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:277) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1085) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:938) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:94) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.lettuce.LettuceKeyCommands.del(LettuceKeyCommands.java:106) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.connection.DefaultedRedisConnection.del(DefaultedRedisConnection.java:95) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at jdk.internal.reflect.GeneratedMethodAccessor11.invoke(Unknown Source) ~[na:na]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
    at org.springframework.data.redis.core.CloseSuppressingInvocationHandler.invoke(CloseSuppressingInvocationHandler.java:61) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at jdk.proxy2/jdk.proxy2.$Proxy120.del(Unknown Source) ~[na:na]
    at org.springframework.data.redis.core.RedisKeyValueAdapter.lambda$put$0(RedisKeyValueAdapter.java:235) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:178) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.core.RedisKeyValueAdapter.put(RedisKeyValueAdapter.java:230) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.keyvalue.core.KeyValueTemplate.lambda$update$1(KeyValueTemplate.java:221) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
    at org.springframework.data.keyvalue.core.KeyValueTemplate.execute(KeyValueTemplate.java:362) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
    at org.springframework.data.keyvalue.core.KeyValueTemplate.update(KeyValueTemplate.java:221) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
    at org.springframework.data.redis.core.RedisKeyValueTemplate.update(RedisKeyValueTemplate.java:178) ~[spring-data-redis-2.7.17.jar!/:2.7.17]
    at org.springframework.data.keyvalue.repository.support.SimpleKeyValueRepository.save(SimpleKeyValueRepository.java:80) ~[spring-data-keyvalue-2.7.17.jar!/:2.7.17]
    at jdk.internal.reflect.GeneratedMethodAccessor10.invoke(Unknown Source) ~[na:na]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
    at org.springframework.data.repository.core.support.RepositoryMethodInvoker$RepositoryFragmentMethodInvoker.lambda$new$0(RepositoryMethodInvoker.java:289) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.data.repository.core.support.RepositoryMethodInvoker.doInvoke(RepositoryMethodInvoker.java:137) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.data.repository.core.support.RepositoryMethodInvoker.invoke(RepositoryMethodInvoker.java:121) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.data.repository.core.support.RepositoryComposition$RepositoryFragments.invoke(RepositoryComposition.java:530) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.data.repository.core.support.RepositoryComposition.invoke(RepositoryComposition.java:286) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.data.repository.core.support.RepositoryFactorySupport$ImplementationMethodExecutionInterceptor.invoke(RepositoryFactorySupport.java:640) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.30.jar!/:5.3.30]
    at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.doInvoke(QueryExecutorMethodInterceptor.java:164) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.data.repository.core.support.QueryExecutorMethodInterceptor.invoke(QueryExecutorMethodInterceptor.java:139) ~[spring-data-commons-2.7.17.jar!/:2.7.17]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.30.jar!/:5.3.30]
    at org.springframework.aop.interceptor.ExposeInvocationInterceptor.invoke(ExposeInvocationInterceptor.java:97) ~[spring-aop-5.3.30.jar!/:5.3.30]
    at org.springframework.aop.framework.ReflectiveMethodInvocation.proceed(ReflectiveMethodInvocation.java:186) ~[spring-aop-5.3.30.jar!/:5.3.30]
    at org.springframework.aop.framework.JdkDynamicAopProxy.invoke(JdkDynamicAopProxy.java:241) ~[spring-aop-5.3.30.jar!/:5.3.30]
    at jdk.proxy2/jdk.proxy2.$Proxy87.save(Unknown Source) ~[na:na]
    at com.mycompany.redissentineltest.RedisSentinelTestApplication.writeOnRedis(RedisSentinelTestApplication.java:47) ~[classes!/:0.0.1-SNAPSHOT]
    at jdk.internal.reflect.GeneratedMethodAccessor9.invoke(Unknown Source) ~[na:na]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Method.java:568) ~[na:na]
    at org.springframework.scheduling.support.ScheduledMethodRunnable.run(ScheduledMethodRunnable.java:84) ~[spring-context-5.3.30.jar!/:5.3.30]
    at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) ~[spring-context-5.3.30.jar!/:5.3.30]
    at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:539) ~[na:na]
    at java.base/java.util.concurrent.FutureTask.runAndReset(FutureTask.java:305) ~[na:na]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:305) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1136) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:635) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:833) ~[na:na]
 Caused by: io.lettuce.core.RedisReadOnlyException: READONLY You can't write against a read only slave.
    at io.lettuce.core.internal.ExceptionFactory.createExecutionException(ExceptionFactory.java:144) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.internal.ExceptionFactory.createExecutionException(ExceptionFactory.java:116) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.protocol.AsyncCommand.completeResult(AsyncCommand.java:120) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.protocol.AsyncCommand.complete(AsyncCommand.java:111) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.protocol.CommandWrapper.complete(CommandWrapper.java:63) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.protocol.CommandHandler.complete(CommandHandler.java:747) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.protocol.CommandHandler.decode(CommandHandler.java:682) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.lettuce.core.protocol.CommandHandler.channelRead(CommandHandler.java:599) ~[lettuce-core-6.1.10.RELEASE.jar!/:6.1.10.RELEASE]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:442) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:166) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:788) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:724) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeys(NioEventLoop.java:650) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:562) ~[netty-transport-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) ~[netty-common-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) ~[netty-common-4.1.100.Final.jar!/:4.1.100.Final]
    at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) ~[netty-common-4.1.100.Final.jar!/:4.1.100.Final]
    ... 1 common frames omitted

Thanks, Angelo

wuase commented 9 months ago

The issue related to the pending pool connection seems to be related to L3 tcp timeout problem discussed in this issue, I'll update and eventually close the issue after some other tests.

xingzhang8023 commented 3 weeks ago

@wuase I have the same issuse, is there an analysis conclusion of this issuse

wuase commented 3 weeks ago

@xingzhang8023 Socket hangup were occurring, kernel fine tuning fixed the issue.

Check out these articles https://blog.cloudflare.com/when-tcp-sockets-refuse-to-die https://access.redhat.com/solutions/726753