redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License
5.3k stars 947 forks source link

Application got io.lettuce.core.RedisCommandTimeoutException irregularly. #2895

Closed xianwei closed 2 weeks ago

xianwei commented 2 weeks ago

Bug Report

My application get io.lettuce.core.RedisCommandTimeoutException almost every day. After Reading Lettuce Reference, After checking code, monitor and s on, I got this:

  1. Redis server has no crashed/network partition;
  2. There is no slow command,;
  3. Timeout is configured 1000ms, all command finished in 10ms;
  4. There is no Pub/Sub listener or a RedisConnectionStateListener;
  5. The AutoFlushCommands is default value;
  6. RedisCodec is default;

Current Behavior

Application got io.lettuce.core.RedisCommandTimeoutException irregularly. Unexpeted reconnecting log is output frequently.

Stack trace ```java org.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 1 second(s) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) at org.springframework.data.redis.PassThroughExceptionTranslationStrategy.translate(PassThroughExceptionTranslationStrategy.java:44) at org.springframework.data.redis.FallbackExceptionTranslationStrategy.translate(FallbackExceptionTranslationStrategy.java:42) at org.springframework.data.redis.connection.lettuce.LettuceConnection.convertLettuceAccessException(LettuceConnection.java:273) at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.convertLettuceAccessException(LettuceStringCommands.java:799) at org.springframework.data.redis.connection.lettuce.LettuceStringCommands.get(LettuceStringCommands.java:68) at org.springframework.data.redis.connection.DefaultedRedisConnection.get(DefaultedRedisConnection.java:266) at org.springframework.data.redis.core.DefaultValueOperations$1.inRedis(DefaultValueOperations.java:57) at org.springframework.data.redis.core.AbstractOperations$ValueDeserializingRedisCallback.doInRedis(AbstractOperations.java:60) at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:228) at org.springframework.data.redis.core.RedisTemplate.execute(RedisTemplate.java:188) at org.springframework.data.redis.core.AbstractOperations.execute(AbstractOperations.java:96) at org.springframework.data.redis.core.DefaultValueOperations.get(DefaultValueOperations.java:53) ```

Lettuce's debug log is configured. It seems like that Lettuce connecting to redis periodically, and then disconnecting immediately; 38912 and 38918 is normal client port, the others is abnormal. The attchment file is the weird logs. log.txt

I inspected jvm, and I got the ConnectionWatchdog objects from memory.

The normal WatchDog object:
@ConnectionWatchdog[
    LOGGING_QUIET_TIME_MS=@Long[5000],
    logger=@NettyBridgeLogger[org.apache.rocketmq.remoting.netty.NettyLogger$NettyBridgeLogger@4d1aeb34],
    reconnectDelay=@ExponentialDelay[io.lettuce.core.resource.ExponentialDelay@5c4db9],
    bootstrap=@Bootstrap[Bootstrap(BootstrapConfig(group: EpollEventLoopGroup, channelFactory: ReflectiveChannelFactory(EpollSocketChannel.class), options: {CONNECT_TIMEOUT_MILLIS=10000, WRITE_BUFFER_HIGH_WATER_MARK=32768, SO_KEEPALIVE=false, WRITE_BUFFER_LOW_WATER_MARK=8192, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), TCP_NODELAY=false}, handler: io.lettuce.core.PlainChannelInitializer@2282d409, resolver: io.netty.resolver.DefaultAddressResolverGroup@281a8832))],
    reconnectWorkers=@DefaultEventExecutorGroup[io.netty.util.concurrent.DefaultEventExecutorGroup@1693ff90],
    reconnectionHandler=@ReconnectionHandler[io.lettuce.core.protocol.ReconnectionHandler@30f97585],
    reconnectionListener=@[io.lettuce.core.protocol.ReconnectionListener$1@576fef22],
    timer=@HashedWheelTimer[io.netty.util.HashedWheelTimer@1fe8f5e8],
    eventBus=@DefaultEventBus[io.lettuce.core.event.DefaultEventBus@1d3c3bb3],
    channel=@EpollSocketChannel[[id: 0x028535a9, L:/192.168.91.154:38912 - R:ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379]],
    remoteAddress=@InetSocketAddress[ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379],
    lastReconnectionLogging=@Long[-1],
    logPrefix=@String[[channel=0xb41af033, /192.168.91.154:38912 -> ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379, last known addr=ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379]],
    reconnectSchedulerSync=@AtomicBoolean[false],
    attempts=@Integer[0],
    armed=@Boolean[true],
    listenOnChannelInactive=@Boolean[true],
    reconnectScheduleTimeout=null,
    added=@Boolean[true],
]

The abnormal WatchDog object:
@ConnectionWatchdog[
    LOGGING_QUIET_TIME_MS=@Long[5000],
    logger=@NettyBridgeLogger[org.apache.rocketmq.remoting.netty.NettyLogger$NettyBridgeLogger@4d1aeb34],
    reconnectDelay=@ExponentialDelay[io.lettuce.core.resource.ExponentialDelay@4d1ad908],
    bootstrap=@Bootstrap[Bootstrap(BootstrapConfig(group: EpollEventLoopGroup, channelFactory: ReflectiveChannelFactory(EpollSocketChannel.class), options: {CONNECT_TIMEOUT_MILLIS=10000, WRITE_BUFFER_HIGH_WATER_MARK=32768, SO_KEEPALIVE=false, WRITE_BUFFER_LOW_WATER_MARK=8192, ALLOCATOR=PooledByteBufAllocator(directByDefault: true), TCP_NODELAY=false}, handler: io.lettuce.core.PlainChannelInitializer@777fb104, resolver: io.netty.resolver.DefaultAddressResolverGroup@281a8832))],
    reconnectWorkers=@DefaultEventExecutorGroup[io.netty.util.concurrent.DefaultEventExecutorGroup@1693ff90],
    reconnectionHandler=@ReconnectionHandler[io.lettuce.core.protocol.ReconnectionHandler@4cbe0dca],
    reconnectionListener=@[io.lettuce.core.protocol.ReconnectionListener$1@576fef22],
    timer=@HashedWheelTimer[io.netty.util.HashedWheelTimer@1fe8f5e8],
    eventBus=@DefaultEventBus[io.lettuce.core.event.DefaultEventBus@1d3c3bb3],
    channel=null,
    remoteAddress=@InetSocketAddress[ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379],
    lastReconnectionLogging=@Long[-1],
    logPrefix=@String[[channel=0xddb057ad, /192.168.91.154:43758 -> ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379, last known addr=ddlldb88.redis.rds.aliyuncs.com/192.168.91.231:6379]],
    reconnectSchedulerSync=@AtomicBoolean[false],
    attempts=@Integer[0],
    armed=@Boolean[true],
    listenOnChannelInactive=@Boolean[false],
    reconnectScheduleTimeout=null,
    added=@Boolean[true],
]

Environment

xianwei commented 2 weeks ago

image Pipelines are the key to the reconnection, not a bug.

amlyczz commented 2 weeks ago

Has it been fixed?

xianwei commented 1 week ago

Has it been fixed?

I changed the timeout to 2 seconds and it no longer had the problem, it looks like a network IO issue, and opening the connection pool may also alleviate the problem