When performing Jedis operations in our production environment, the system experiences stalls lasting several minutes. Thread dumps taken with jstack showed numerous threads in the WAITING state after calling getSlotConnection(). While examining the source code of JedisClusterInfoCache, I noticed that this class guards its state with a ReentrantReadWriteLock. That led me to suspect that some thread was holding the write lock for a long time and blocking every thread waiting for the read lock. To test this theory, I wrote a small utility that acquires the write lock on purpose:
```java
package com.nacol.redisbandwidth.component.cache;

import redis.clients.jedis.*;

import java.lang.reflect.Field;
import java.util.concurrent.locks.Lock;

/**
 * Uses reflection to expose the read/write locks inside JedisClusterInfoCache,
 * so that a test can hold the write lock on purpose.
 */
public class JedisClusterInfoCacheLockUtil {

    private final Lock writeLock;
    private final Lock readLock;

    public JedisClusterInfoCacheLockUtil(JedisCluster jedisCluster) throws Exception {
        // JedisCluster -> connectionHandler (field declared in BinaryJedisCluster)
        Field connectionHandlerField = BinaryJedisCluster.class.getDeclaredField("connectionHandler");
        connectionHandlerField.setAccessible(true);
        JedisClusterConnectionHandler connectionHandler =
                (JedisClusterConnectionHandler) connectionHandlerField.get(jedisCluster);

        // connectionHandler -> cache (the JedisClusterInfoCache instance)
        Field cacheField = JedisClusterConnectionHandler.class.getDeclaredField("cache");
        cacheField.setAccessible(true);
        JedisClusterInfoCache jedisClusterInfoCache =
                (JedisClusterInfoCache) cacheField.get(connectionHandler);

        // cache -> w / r, the write and read locks of its ReentrantReadWriteLock
        Field writeLockField = JedisClusterInfoCache.class.getDeclaredField("w");
        writeLockField.setAccessible(true);
        this.writeLock = (Lock) writeLockField.get(jedisClusterInfoCache);

        Field readLockField = JedisClusterInfoCache.class.getDeclaredField("r");
        readLockField.setAccessible(true);
        this.readLock = (Lock) readLockField.get(jedisClusterInfoCache);
    }

    public void lockWrite() {
        writeLock.lock();
    }

    public void unlockWrite() {
        writeLock.unlock();
    }

    public void lockRead() {
        readLock.lock();
    }

    public void unlockRead() {
        readLock.unlock();
    }
}
```
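The suspected mechanism can also be checked with the JDK alone, without a cluster. The sketch below assumes, as the JedisClusterInfoCache source suggests, that the cache's read path uses a plain ReentrantReadWriteLock read lock; the class name ReadBlockedByWriteDemo is hypothetical. It holds the write lock in one thread and shows that a second thread calling readLock().lock() parks in the WAITING state, the same state jstack reported.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class ReadBlockedByWriteDemo {

    /**
     * Holds the write lock in the calling thread, starts a reader thread that
     * calls readLock().lock(), and returns the reader's state once it has parked.
     */
    public static Thread.State readerStateWhileWriteHeld() throws InterruptedException {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.writeLock().lock();                 // writer: the calling thread
        Thread reader = new Thread(() -> {
            rwl.readLock().lock();              // parks until the write lock is released
            rwl.readLock().unlock();
        });
        reader.start();
        try {
            // Poll for up to ~5 s until the reader has parked on the lock.
            long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
            while (reader.getState() != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
            return reader.getState();           // evaluated before the finally block runs
        } finally {
            rwl.writeLock().unlock();           // lets the reader proceed and exit
            reader.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("reader state: " + readerStateWhileWriteHeld());
    }
}
```

On a standard JDK this should print `reader state: WAITING`.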
Then, execute the following demo:
1. Initialize JedisCluster.
2. Acquire the write lock.
3. Start a child thread to execute the get command (executing get in the same thread would simply re-enter the lock instead of blocking, so it would not reproduce the scenario).
```java
// STEP: initialize the cluster
JedisCluster cluster = JedisClient.getCluster();
// STEP: initialize the lock util
JedisClusterInfoCacheLockUtil util = new JedisClusterInfoCacheLockUtil(cluster);
// STEP: acquire the write lock
util.lockWrite();
// STEP: start a child thread to execute the get command
// (executing get in the same thread would re-enter the lock, which doesn't fit the scenario)
Executors.newFixedThreadPool(1).execute(() -> {
    // Blocks here until the main thread releases the write lock; no timeout applies.
    cluster.get("test-key");
    log.info("sub finish");
});
Thread.sleep(10000000);
util.unlockWrite();
log.info("main finish");
```
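Execution result:
The child thread's get command stays blocked until the write lock is released. Even with maxWaitMillis, connectionTimeout, soTimeout, and maxAttempts configured, nothing interrupts the wait. Presumably these settings only take effect once a connection is borrowed from the pool or used, which happens after the slot lookup under the read lock has completed.
In production, this shows up as blocking delays of several minutes.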
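Why none of the configured timeouts helps can also be illustrated with the JDK alone. A thread blocked in Lock.lock() has no deadline, and ReentrantReadWriteLock's lock() does not even return when the thread is interrupted: it records the interrupt and parks again. The hypothetical UninterruptibleReadDemo below interrupts such a reader and checks that it is still WAITING. It assumes, without verifying, that Jedis acquires the cache's read lock with lock() rather than a timed tryLock; the WAITING (rather than TIMED_WAITING) state in the thread dumps is consistent with that.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class UninterruptibleReadDemo {

    /**
     * Returns true if a reader blocked in readLock().lock() is still WAITING
     * 200 ms after being interrupted, i.e. the interrupt did not end the wait.
     */
    public static boolean lockIgnoresInterrupt() throws InterruptedException {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        rwl.writeLock().lock();                 // held by the calling thread
        Thread reader = new Thread(() -> {
            rwl.readLock().lock();              // untimed, uninterruptible acquisition
            rwl.readLock().unlock();
        });
        reader.start();
        try {
            waitForState(reader, Thread.State.WAITING);
            reader.interrupt();                 // lock() notes the interrupt, then parks again
            Thread.sleep(200);
            return reader.getState() == Thread.State.WAITING;
        } finally {
            rwl.writeLock().unlock();           // only releasing the write lock frees the reader
            reader.join();
        }
    }

    /** Polls for up to ~5 s until the thread reaches the given state. */
    private static void waitForState(Thread t, Thread.State state) throws InterruptedException {
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        while (t.getState() != state && System.nanoTime() < deadline) {
            Thread.sleep(10);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("still waiting after interrupt: " + lockIgnoresInterrupt());
    }
}
```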
Expected behavior
The command should fail with a timeout (or otherwise be interruptible) instead of blocking indefinitely while waiting on the cache lock.
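For comparison, a bounded acquisition could look like the sketch below. BoundedLockSketch and callWithLock are hypothetical names, not Jedis API; the sketch only uses the JDK's Lock.tryLock(long, TimeUnit), so that a caller gives up with a TimeoutException instead of waiting indefinitely while another thread holds the write lock.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

public class BoundedLockSketch {

    /** Hypothetical helper: runs action under lock, giving up after timeoutMillis. */
    public static <T> T callWithLock(Lock lock, long timeoutMillis, Callable<T> action) throws Exception {
        if (!lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("could not acquire lock within " + timeoutMillis + " ms");
        }
        try {
            return action.call();
        } finally {
            lock.unlock();
        }
    }

    /** Returns true if a read attempt times out while another thread holds the write lock. */
    public static boolean readTimesOutWhileWriteHeld() throws Exception {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        CountDownLatch locked = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread writer = new Thread(() -> {
            rwl.writeLock().lock();             // writer holds the lock until released
            try {
                locked.countDown();
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                rwl.writeLock().unlock();
            }
        });
        writer.start();
        locked.await();                         // make sure the write lock is held
        try {
            callWithLock(rwl.readLock(), 100, () -> "unreachable");
            return false;
        } catch (TimeoutException expected) {
            return true;                        // gave up after ~100 ms instead of hanging
        } finally {
            release.countDown();
            writer.join();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + readTimesOutWhileWriteHeld());
        ReentrantReadWriteLock free = new ReentrantReadWriteLock();
        System.out.println(callWithLock(free.readLock(), 100, () -> "ok"));
    }
}
```

Running main should print `timed out: true` followed by `ok`.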
Actual behavior
Threads calling getSlotConnection() stay in the WAITING state for as long as another thread holds the JedisClusterInfoCache write lock. The configured maxWaitMillis, connectionTimeout, soTimeout, and maxAttempts do not interrupt the wait, which in production produces stalls lasting several minutes.
ENV