redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.4k stars 975 forks source link

RedisSessionExpirationPolicy.cleanExpiredSessions failed with RedisCommandTimeoutException #2496

Closed huaxne closed 1 year ago

huaxne commented 1 year ago

I use Spring Session with Redis in my work. By default, SpringSession executes scheduled tasks (RedisSessionExpirationPolicy.cleanExpiredSessions) every 60 seconds, and occasionally RedisCommandTimeoutException occurs. It seems like connection problem, and it continued about 15 minutes. I tried to add cron task to validate connection, and it run well. But the exception still occurs. What should I do to fix this connection problem.

cron

@Scheduled(fixedRate = 5000) public void lettuceConnCheck() { try { lettuceConnectionFactory.validateConnection(); } catch (Exception e) { log.error("lettuceConnCheck failed with exception[{}]. cause:[{}], message:[{}]", e.getClass(), e.getCause(), e.getMessage()); log.warn("lettuceConnCheck failed with exception:", e); } }

lettuce config:

    final ClusterTopologyRefreshOptions clusterTopologyRefreshOptions = ClusterTopologyRefreshOptions.builder()
            .enablePeriodicRefresh(Duration.ofSeconds(5L))
            .enableAllAdaptiveRefreshTriggers()
            .build();

    final ClusterClientOptions clusterClientOptions = ClusterClientOptions.builder()
            .topologyRefreshOptions(clusterTopologyRefreshOptions)
            .validateClusterNodeMembership(false)
            .maxRedirects(cluster.getMaxRedirects())
            .build();

    final Duration commandTimeout = properties.getTimeout();
    final LettuceClientConfiguration clientConfig = LettuceClientConfiguration.builder()
            .commandTimeout(commandTimeout)
            .clientOptions(clusterClientOptions)
            .build();

logs: [tag:t2-sec-switch-web001] 2023-08-26 12:21:05.000 [ERROR] [pool-3-thread-1] [org.springframework.scheduling.support.TaskUtils$LoggingErrorHandler:95] Unexpected error occurred in scheduled task org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:277) at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1085) at org.springframework.data.redis.connection.lettuce.LettuceConnection.lambda$doInvoke$4(LettuceConnection.java:938) at org.springframework.data.redis.connection.lettuce.LettuceInvoker$Synchronizer.invoke(LettuceInvoker.java:665) at org.springframework.data.redis.connection.lettuce.LettuceInvoker.just(LettuceInvoker.java:94) at org.springframework.data.redis.connection.lettuce.LettuceSetCommands.sMembers(LettuceSetCommands.java:163) at org.springframework.data.redis.connection.DefaultedRedisConnection.sMembers(DefaultedRedisConnection.java:866) at org.springframework.data.redis.core.DefaultSetOperations.lambda$members$11(DefaultSetOperations.java:244) at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:224) at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:191) at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:97) at org.springframework.data.redis.core.DefaultSetOperations.members(DefaultSetOperations.java:244) at org.springframework.data.redis.core.DefaultBoundSetOperations.members(DefaultBoundSetOperations.java:162) at org.springframework.session.data.redis.RedisSessionExpirationPolicy.cleanExpiredSessions(RedisSessionExpirationPolicy.java:132) at org.springframework.session.data.redis.RedisIndexedSessionRepository.cleanupExpiredSessions(RedisIndexedSessionRepository.java:424) at org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run(DelegatingErrorHandlingRunnable.java:54) at org.springframework.scheduling.concurrent.ReschedulingRunnable.run(ReschedulingRunnable.java:95) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180) at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 5 second(s) at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59) at io.lettuce.core.internal.Futures.awaitOrCancel(Futures.java:246) at io.lettuce.core.LettuceFutures.awaitOrCancel(LettuceFutures.java:74) at org.springframework.data.redis.connection.lettuce.LettuceConnection.await(LettuceConnection.java:1083) ... 22 common frames omitted ........ ........ [tag:t2-sec-switch-web001] 2023-08-26 12:35:28.244 [INFO] [lettuce-eventExecutorLoop-1-3] [io.lettuce.core.protocol.ConnectionWatchdog:171] Reconnecting, last destination was xx.xx.xx.xx/xx.xxx.xx.xx:8001 [tag:t2-sec-switch-web001] 2023-08-26 12:35:28.247 [INFO] [lettuce-epollEventLoop-6-2] [io.lettuce.core.protocol.ReconnectionHandler:174] Reconnected to xx.xx.xx.xx:xx

Environment

mp911de commented 1 year ago

You need to find out the reason for the timeout. If you encounter a timeout period, make sure that all Redis nodes are up and reachable. Timeouts are a symptom of infrastructure issues.

huaxne commented 1 year ago

Thank you for your support.

What's strange to me is that my scheduled check task executes LettuceConnectionFactory. validateConnection() every 5 seconds; And no log of validate failure was found. This indicates that the current connection is okay.

But why does a timeout error occur when using it in SpringSession, and it lasts for 15 minutes (during which LettuceConnectionFactory. validateConnection() runs normally).

I have checked the source code of validateConnection and found it executes the ping() function. Can't this function guarantee that the current connection is valid?

huaxne commented 1 year ago

You need to find out the reason for the timeout. If you encounter a timeout period, make sure that all Redis nodes are up and reachable. Timeouts are a symptom of infrastructure issues.

I think there will always be occasional issues with the server or network. What I need is to promptly sense that the current connection is not available and quickly create new TCP connections. The log above shows that this problem occurs approximately 1-2 times a day (15 minutes each time). But LettuceConnectionFactory ValidateConnection() did not recognize that this connection is not available.

mp911de commented 1 year ago

Cluster connections cannot be reliably verified unless issuing a PING to every cluster node as the subsequent operation can target any node within the cluster. I think it is reasonable to move that requirement into Spring Data Redis.

huaxne commented 1 year ago

Cluster connections cannot be reliably verified unless issuing a PING to every cluster node as the subsequent operation can target any node within the cluster. I think it is reasonable to move that requirement into Spring Data Redis.

Yes, I think you're right(it is reasonable to move that requirement into Spring Data Redis). I solved this problem through the keepalive setting. 1428 Thank you very much for your help.