redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.36k stars 959 forks source link

Cannot failover when timeout error occurs for sentinel node #1245

Open be-hase opened 4 years ago

be-hase commented 4 years ago

Bug Report

Current Behavior & Input Code

My product uses sentinel's master node discovery.
https://github.com/lettuce-io/lettuce-core/wiki/Redis-Sentinel#sentinel.redis-discovery-using-redis-sentinel

The other day, the redis sentinel node(VM) became unable to return a response due to a hypervisor failure, and a timeout error began to occur. I expected lettuce to do the failover. However, failover had failed. Finally, when the VM of the redis sentinel node shut down completely, failover succeeded.
(On the redis and redis sentinel logs, the failover was successful immediately.)

I did various tests. When the sentinel node is completely down and the connection is broken, the failover succeeds. However, if the timeout error occurs without the sentinel node going down completely, failover does not seem to be possible.

Take application start as an example. In the case of such a spring boot app, lettuce skips the first unreachable sentinel node and starts the application properly.

package com.example.demo;

import java.time.Duration;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import io.lettuce.core.RedisClient;
import io.lettuce.core.RedisURI;
import io.lettuce.core.api.StatefulRedisConnection;
import io.lettuce.core.api.sync.RedisCommands;

@SpringBootApplication
public class DemoApplication {

    public static void main(String[] args) {
        SpringApplication.run(DemoApplication.class, args);
    }

    @Configuration
    public static class RedisConfiguration {
        @Bean(destroyMethod = "shutdown")
        public RedisClient redisClient() {
            final var builder = RedisURI.builder()
                                        .withTimeout(Duration.ofSeconds(5))
                                        .withSentinelMasterId("xxx")
                                        .withSentinel("cannot connectable node")
                                        .withSentinel("connectable node")
                                        .withSentinel("connectable node")
                                        .withPassword("xxx");

            final var redisClient = RedisClient.create(builder.build());
            // redisClient.setOptions(ClientOptions.builder().pingBeforeActivateConnection(true).build());
            return redisClient;
        }

        @Bean(destroyMethod = "close")
        public StatefulRedisConnection<String, String> redisConnection(final RedisClient redisClient) {
            return redisClient.connect();
        }

        @Bean
        public RedisCommands<String, String> stringStringRedisCommands(
                final StatefulRedisConnection<String, String> redisConnection
        ) {
            return redisConnection.sync();
        }
    }

    @RestController
    public static class SampleController {
        private final RedisCommands<String, String> redisCommands;

        public SampleController(RedisCommands<String, String> redisCommands) {
            this.redisCommands = redisCommands;
        }

        @GetMapping("/info")
        public Object info() {
            return redisCommands.info();
        }
    }
}
2020-03-27 00:56:56.085  INFO 63174 --- [           main] com.example.demo.DemoApplication         : Starting DemoApplication on haseberyousukenoMacBook-Pro.local with PID 63174 (/Users/hasebe/Desktop/demo/build/classes/java/main started by hasebe in /Users/hasebe/Desktop/demo)
2020-03-27 00:56:56.087  INFO 63174 --- [           main] com.example.demo.DemoApplication         : No active profile set, falling back to default profiles: default
2020-03-27 00:56:56.635  INFO 63174 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2020-03-27 00:56:56.643  INFO 63174 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2020-03-27 00:56:56.643  INFO 63174 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.33]
2020-03-27 00:56:56.704  INFO 63174 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2020-03-27 00:56:56.704  INFO 63174 --- [           main] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 586 ms
2020-03-27 00:56:56.896  INFO 63174 --- [           main] io.lettuce.core.EpollProvider            : Starting without optional epoll library
2020-03-27 00:56:56.897  INFO 63174 --- [           main] io.lettuce.core.KqueueProvider           : Starting without optional kqueue library
--> THIS 2020-03-27 00:56:56.972  WARN 63174 --- [ioEventLoop-4-1] io.lettuce.core.RedisClient              : Cannot connect Redis Sentinel at RedisURI [host='cannot connectable node', port=26379]: java.util.concurrent.CompletionException: io.netty.channel.AbstractChannel$AnnotatedConnectException: Connection refused: cannot connectable node/cannot connectable node:26379
2020-03-27 00:56:57.129  INFO 63174 --- [           main] o.s.s.concurrent.ThreadPoolTaskExecutor  : Initializing ExecutorService 'applicationTaskExecutor'
2020-03-27 00:56:57.225  INFO 63174 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat started on port(s): 8080 (http) with context path ''
2020-03-27 00:56:57.228  INFO 63174 --- [           main] com.example.demo.DemoApplication         : Started DemoApplication in 1.338 seconds (JVM running for 2.028)

However, when a timeout error occurs, an error occurs and it cannot be started. Why not transfer the processing to the next sentinel node?
(Timeout error is generated by using toxiproxy.)

2020-03-27 01:01:32.439  INFO 63188 --- [           main] com.example.demo.DemoApplication         : Starting DemoApplication on haseberyousukenoMacBook-Pro.local with PID 63188 (/Users/hasebe/Desktop/demo/build/classes/java/main started by hasebe in /Users/hasebe/Desktop/demo)
2020-03-27 01:01:32.443  INFO 63188 --- [           main] com.example.demo.DemoApplication         : No active profile set, falling back to default profiles: default
2020-03-27 01:01:32.918  INFO 63188 --- [           main] o.s.b.w.embedded.tomcat.TomcatWebServer  : Tomcat initialized with port(s): 8080 (http)
2020-03-27 01:01:32.925  INFO 63188 --- [           main] o.apache.catalina.core.StandardService   : Starting service [Tomcat]
2020-03-27 01:01:32.925  INFO 63188 --- [           main] org.apache.catalina.core.StandardEngine  : Starting Servlet engine: [Apache Tomcat/9.0.33]
2020-03-27 01:01:32.967  INFO 63188 --- [           main] o.a.c.c.C.[Tomcat].[localhost].[/]       : Initializing Spring embedded WebApplicationContext
2020-03-27 01:01:32.967  INFO 63188 --- [           main] o.s.web.context.ContextLoader            : Root WebApplicationContext: initialization completed in 501 ms
2020-03-27 01:01:33.118  INFO 63188 --- [           main] io.lettuce.core.EpollProvider            : Starting without optional epoll library
2020-03-27 01:01:33.118  INFO 63188 --- [           main] io.lettuce.core.KqueueProvider           : Starting without optional kqueue library
2020-03-27 01:01:38.206  WARN 63188 --- [           main] ConfigServletWebServerApplicationContext : Exception encountered during context initialization - cancelling refresh attempt: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'demoApplication.SampleController' defined in file [/Users/hasebe/Desktop/demo/build/classes/java/main/com/example/demo/DemoApplication$SampleController.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'stringStringRedisCommands' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Unsatisfied dependency expressed through method 'stringStringRedisCommands' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'redisConnection' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.lettuce.core.api.StatefulRedisConnection]: Factory method 'redisConnection' threw exception; nested exception is io.lettuce.core.RedisConnectionException
2020-03-27 01:01:38.218  INFO 63188 --- [           main] o.apache.catalina.core.StandardService   : Stopping service [Tomcat]
2020-03-27 01:01:38.228  INFO 63188 --- [           main] ConditionEvaluationReportLoggingListener : 

Error starting ApplicationContext. To display the conditions report re-run your application with 'debug' enabled.
2020-03-27 01:01:38.236 ERROR 63188 --- [           main] o.s.boot.SpringApplication               : Application run failed

org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'demoApplication.SampleController' defined in file [/Users/hasebe/Desktop/demo/build/classes/java/main/com/example/demo/DemoApplication$SampleController.class]: Unsatisfied dependency expressed through constructor parameter 0; nested exception is org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'stringStringRedisCommands' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Unsatisfied dependency expressed through method 'stringStringRedisCommands' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'redisConnection' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.lettuce.core.api.StatefulRedisConnection]: Factory method 'redisConnection' threw exception; nested exception is io.lettuce.core.RedisConnectionException
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:798) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.autowireConstructor(ConstructorResolver.java:228) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.autowireConstructor(AbstractAutowireCapableBeanFactory.java:1358) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1204) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:557) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:882) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:878) ~[spring-context-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.context.support.AbstractApplicationContext.refresh(AbstractApplicationContext.java:550) ~[spring-context-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.boot.web.servlet.context.ServletWebServerApplicationContext.refresh(ServletWebServerApplicationContext.java:141) ~[spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
    at org.springframework.boot.SpringApplication.refresh(SpringApplication.java:747) ~[spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
    at org.springframework.boot.SpringApplication.refreshContext(SpringApplication.java:397) ~[spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:315) ~[spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1226) ~[spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
    at org.springframework.boot.SpringApplication.run(SpringApplication.java:1215) ~[spring-boot-2.2.6.RELEASE.jar:2.2.6.RELEASE]
    at com.example.demo.DemoApplication.main(DemoApplication.java:21) ~[main/:na]
Caused by: org.springframework.beans.factory.UnsatisfiedDependencyException: Error creating bean with name 'stringStringRedisCommands' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Unsatisfied dependency expressed through method 'stringStringRedisCommands' parameter 0; nested exception is org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'redisConnection' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.lettuce.core.api.StatefulRedisConnection]: Factory method 'redisConnection' threw exception; nested exception is io.lettuce.core.RedisConnectionException
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:798) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:539) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1338) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1177) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:557) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1290) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1210) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:885) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:789) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    ... 19 common frames omitted
Caused by: org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'redisConnection' defined in class path resource [com/example/demo/DemoApplication$RedisConfiguration.class]: Bean instantiation via factory method failed; nested exception is org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.lettuce.core.api.StatefulRedisConnection]: Factory method 'redisConnection' threw exception; nested exception is io.lettuce.core.RedisConnectionException
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:656) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.instantiateUsingFactoryMethod(ConstructorResolver.java:636) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.instantiateUsingFactoryMethod(AbstractAutowireCapableBeanFactory.java:1338) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBeanInstance(AbstractAutowireCapableBeanFactory.java:1177) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.doCreateBean(AbstractAutowireCapableBeanFactory.java:557) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.createBean(AbstractAutowireCapableBeanFactory.java:517) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.lambda$doGetBean$0(AbstractBeanFactory.java:323) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultSingletonBeanRegistry.getSingleton(DefaultSingletonBeanRegistry.java:222) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:321) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:202) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.config.DependencyDescriptor.resolveCandidate(DependencyDescriptor.java:276) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.doResolveDependency(DefaultListableBeanFactory.java:1290) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.DefaultListableBeanFactory.resolveDependency(DefaultListableBeanFactory.java:1210) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.resolveAutowiredArgument(ConstructorResolver.java:885) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.createArgumentArray(ConstructorResolver.java:789) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    ... 33 common frames omitted
Caused by: org.springframework.beans.BeanInstantiationException: Failed to instantiate [io.lettuce.core.api.StatefulRedisConnection]: Factory method 'redisConnection' threw exception; nested exception is io.lettuce.core.RedisConnectionException
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:185) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.beans.factory.support.ConstructorResolver.instantiate(ConstructorResolver.java:651) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    ... 47 common frames omitted
Caused by: io.lettuce.core.RedisConnectionException: null
    at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:75) ~[lettuce-core-5.2.2.RELEASE.jar:5.2.2.RELEASE]
    at io.lettuce.core.RedisConnectionException.create(RedisConnectionException.java:56) ~[lettuce-core-5.2.2.RELEASE.jar:5.2.2.RELEASE]
    at io.lettuce.core.AbstractRedisClient.getConnection(AbstractRedisClient.java:234) ~[lettuce-core-5.2.2.RELEASE.jar:5.2.2.RELEASE]
    at io.lettuce.core.RedisClient.connect(RedisClient.java:207) ~[lettuce-core-5.2.2.RELEASE.jar:5.2.2.RELEASE]
    at io.lettuce.core.RedisClient.connect(RedisClient.java:192) ~[lettuce-core-5.2.2.RELEASE.jar:5.2.2.RELEASE]
    at com.example.demo.DemoApplication$RedisConfiguration.redisConnection(DemoApplication.java:43) ~[main/:na]
    at com.example.demo.DemoApplication$RedisConfiguration$$EnhancerBySpringCGLIB$$26deeda6.CGLIB$redisConnection$1(<generated>) ~[main/:na]
    at com.example.demo.DemoApplication$RedisConfiguration$$EnhancerBySpringCGLIB$$26deeda6$$FastClassBySpringCGLIB$$b9d69ec6.invoke(<generated>) ~[main/:na]
    at org.springframework.cglib.proxy.MethodProxy.invokeSuper(MethodProxy.java:244) ~[spring-core-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at org.springframework.context.annotation.ConfigurationClassEnhancer$BeanMethodInterceptor.intercept(ConfigurationClassEnhancer.java:331) ~[spring-context-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    at com.example.demo.DemoApplication$RedisConfiguration$$EnhancerBySpringCGLIB$$26deeda6.redisConnection(<generated>) ~[main/:na]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[na:na]
    at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[na:na]
    at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:na]
    at java.base/java.lang.reflect.Method.invoke(Method.java:566) ~[na:na]
    at org.springframework.beans.factory.support.SimpleInstantiationStrategy.instantiate(SimpleInstantiationStrategy.java:154) ~[spring-beans-5.2.5.RELEASE.jar:5.2.5.RELEASE]
    ... 48 common frames omitted
Caused by: java.util.concurrent.TimeoutException: Did not observe any item or terminal signal within 5000ms in 'map' (and no fallback has been configured)
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.handleTimeout(FluxTimeout.java:289) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.publisher.FluxTimeout$TimeoutMainSubscriber.doTimeout(FluxTimeout.java:274) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.publisher.FluxTimeout$TimeoutTimeoutSubscriber.onNext(FluxTimeout.java:396) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.publisher.StrictSubscriber.onNext(StrictSubscriber.java:89) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onNext(FluxOnErrorResume.java:73) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.publisher.MonoDelay$MonoDelayRunnable.run(MonoDelay.java:117) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:68) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at reactor.core.scheduler.SchedulerTask.call(SchedulerTask.java:28) ~[reactor-core-3.3.4.RELEASE.jar:3.3.4.RELEASE]
    at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
    at java.base/java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:304) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128) ~[na:na]
    at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628) ~[na:na]
    at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]

I also tried it after the application started. Use toxiproxy so that a timeout error occurs on the sentinel node to which lettuce is connected. At this time, bring down the master node. Then, failover fails and the application tries to connect to the master node that has gone down forever.

On the other hand, when the sendinel node goes down and a connect error occurs, it seems that the master node is found from the following sentinel node and the failover succeeds.

Expected behavior/code

If a timeout error occurs for the sentinel node, use the following sentinel node. (There is no problem with connect error.)

However, it still seems to work around if we set pingBeforeActivateConnection to true. I worked around by enabling this setting.

Environment

Possible Solution

Additional context

mp911de commented 4 years ago

Thanks for report. We need to fix the timeout issue when a sentinel connection fails.

mp911de commented 4 years ago

After looking in this issue, the problem arises from the fact that the connect was successful but Sentinel failed to respond within the timeout. The client code assumes that when the connection was established, Sentinel is functional. At the time we query sentinel we no longer have access to the connection progress (i.e. which hosts were tried, which failed and so on) as we operate on an existing connection.

The entire mechanism is asynchronous and therefore it imposes a certain complexity to fix the issue properly. For now, please enable PING on connect via ClientOptions (ClientOptions.builder().pingBeforeActivateConnection(true).build()). What this does is issuing a PING command during the connect phase to ensure that Redis responds properly. We get the guarantee that at least at the time the connection gets created the Sentinel is alive. Unhealthy/unresponsive nodes are skipped and we increase the chance of hitting a sentinel node that is able to properly reply with the master address.

be-hase commented 4 years ago

Thank you for the detailed investigation. Keep pingBeforeActivateConnection enabled until it is fixed. (It seems to be difficult to fix because it is an asynchronous mechanism...)