redis / jedis

Redis Java client
MIT License

JedisCluster Requests Hang Indefinitely After Lock, Ignoring Timeout Configurations #4002

Open Nacol-174 opened 1 week ago

Nacol-174 commented 1 week ago

Expected behavior

The command should honor the configured timeouts and be interrupted, rather than blocking indefinitely.

Actual behavior

When performing Jedis operations in our production environment, the system stalls for several minutes at a time. Troubleshooting with jstack showed many threads entering the WAITING state after calling getSlotConnection(). Looking at the source of JedisClusterInfoCache, I noticed that this class uses a ReentrantReadWriteLock, so I suspected that a thread holding the write lock for a long time was blocking every thread waiting for the read lock. To test this, I wrote a utility that uses reflection to obtain the cache's locks so the write lock can be acquired on demand:

package com.nacol.redisbandwidth.component.cache;

import redis.clients.jedis.*;

import java.lang.reflect.Field;
import java.util.concurrent.locks.Lock;

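/**
 * Test-only helper: exposes the private read/write locks of JedisClusterInfoCache via reflection.
 * The field names used here (connectionHandler, cache, r, w) are Jedis 3.x internals.
 */
public class JedisClusterInfoCacheLockUtil {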

    private final Lock writeLock;

    private final Lock readLock;

    public JedisClusterInfoCacheLockUtil(JedisCluster jedisCluster) throws Exception {

        Field connectionHandlerField = BinaryJedisCluster.class.getDeclaredField("connectionHandler");
        connectionHandlerField.setAccessible(true);
        JedisClusterConnectionHandler connectionHandler = (JedisClusterConnectionHandler) connectionHandlerField.get(jedisCluster);

        Field cacheField = JedisClusterConnectionHandler.class.getDeclaredField("cache");
        cacheField.setAccessible(true);
        JedisClusterInfoCache jedisClusterInfoCache = (JedisClusterInfoCache) cacheField.get(connectionHandler);

        Field writeLockField = JedisClusterInfoCache.class.getDeclaredField("w");
        writeLockField.setAccessible(true);
        this.writeLock = (Lock) writeLockField.get(jedisClusterInfoCache);

        Field readLockField = JedisClusterInfoCache.class.getDeclaredField("r");
        readLockField.setAccessible(true);
        this.readLock = (Lock) readLockField.get(jedisClusterInfoCache);
    }

    public void lockWrite() {
        writeLock.lock();
    }

    public void unlockWrite() {
        writeLock.unlock();
    }

    public void lockRead() {
        readLock.lock();
    }

    public void unlockRead() {
        readLock.unlock();
    }

}
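The suspicion above (a long-held write lock parking every reader) can be checked without Jedis, using only java.util.concurrent. The sketch below is not Jedis code: the class ReadLockStallDemo is hypothetical, and it assumes, as described in this issue, that the read path behind getSlotConnection() takes the cache's read lock with a plain, untimed lock(). The main thread holds the write lock; one reader calls lock() and another calls tryLock with a 200 ms timeout.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stdlib-only model of the suspected stall; NOT Jedis code. The main thread plays the
// role of whatever holds the cache's write lock; "plain-reader" plays a caller blocked
// in getSlotConnection() (assumed to use readLock().lock()).
class ReadLockStallDemo {

    static final class Result {
        final Thread.State plainReaderState;       // state of the lock() caller while the write lock is held
        final boolean timedReaderAcquired;         // did tryLock(200 ms) succeed while the write lock is held?
        final boolean plainReaderDoneAfterRelease; // did the lock() caller finish after release?

        Result(Thread.State state, boolean acquired, boolean done) {
            plainReaderState = state;
            timedReaderAcquired = acquired;
            plainReaderDoneAfterRelease = done;
        }
    }

    static Result run() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        AtomicBoolean timedAcquired = new AtomicBoolean(false);

        // Untimed reader: parks until the write lock is released, however long that takes.
        Thread plainReader = new Thread(() -> {
            rwl.readLock().lock();
            rwl.readLock().unlock();
        }, "plain-reader");

        // Timed reader: gives up after 200 ms instead of waiting forever.
        Thread timedReader = new Thread(() -> {
            try {
                if (rwl.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    timedAcquired.set(true);
                    rwl.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "timed-reader");

        Thread.State observed;
        rwl.writeLock().lock();
        try {
            plainReader.start();
            observed = awaitState(plainReader, Thread.State.WAITING, 2000);
            timedReader.start();
            timedReader.join();
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            rwl.writeLock().unlock(); // the only thing that frees plainReader
        }

        try {
            plainReader.join(2000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        }
        return new Result(observed, timedAcquired.get(), !plainReader.isAlive());
    }

    // Polls until the thread reaches the target state or the timeout elapses.
    private static Thread.State awaitState(Thread t, Thread.State target, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (t.getState() != target && System.currentTimeMillis() < deadline) {
            Thread.sleep(10);
        }
        return t.getState();
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("plain reader while write lock held: " + r.plainReaderState);
        System.out.println("timed reader acquired: " + r.timedReaderAcquired);
        System.out.println("plain reader finished after release: " + r.plainReaderDoneAfterRelease);
    }
}
```

Running main should show the untimed reader in the WAITING state (the same state jstack reported), the timed reader giving up, and the untimed reader finishing only after the write lock is released.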

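Then run the following demo:

1.  Initialize JedisCluster.
2.  Acquire the write lock.
3.  Start a child thread to execute the get command (executing get in the same thread would simply re-enter the lock, because a ReentrantReadWriteLock lets the write-lock holder also take the read lock, which doesn't fit the scenario).
4.  Hold the write lock for a long time, then release it.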
        // STEP init Cluster
        JedisCluster cluster = JedisClient.getCluster();

        // STEP init lock util
        JedisClusterInfoCacheLockUtil util = new JedisClusterInfoCacheLockUtil(cluster);

        // STEP lock
        util.lockWrite();

        // STEP Start a child thread to execute the get command
        //    (executing get in the same thread would re-enter the lock, which doesn't fit the scenario)
        Executors.newFixedThreadPool(1).execute(() -> {
            // Execution blocks here until the write lock is released; no configured timeout fires.
            cluster.get("test-key");
            log.info("sub finish");
        });

        Thread.sleep(10000000);
        util.unlockWrite();

        log.info("main finish");

Execution result: the child thread's get command stays stalled, waiting for the write lock to be released. Even with maxWaitMillis, connectionTimeout, soTimeout, and maxAttempts configured, the operation is never interrupted: those settings bound pool borrowing, connecting, socket reads, and retries, but the wait for the cache's read lock is not covered by any of them. In production this shows up as a multi-minute blocking delay.
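The reason none of these settings help can be modeled the same way. In this stdlib-only sketch (not Jedis code; UninterruptibleWaitDemo is a hypothetical name), a caller-side Future.get(200, TimeUnit.MILLISECONDS) deadline does give up, but Future.cancel(true) only interrupts the worker thread, and ReentrantReadWriteLock's untimed lock() keeps waiting through an interrupt. By contrast, lockInterruptibly() aborts with InterruptedException. It assumes, as the issue describes, that the Jedis read path uses a plain lock().

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Stdlib-only model (NOT Jedis code) of why neither a caller-side deadline nor an
// interrupt frees a thread stuck in an untimed Lock.lock().
class UninterruptibleWaitDemo {

    static final class Result {
        final boolean callerTimedOut;        // Future.get(200 ms) gave up on the task
        final boolean plainLockStillBlocked; // the worker stayed in lock() after cancel(true)
        final boolean interruptibleAborted;  // lockInterruptibly() threw InterruptedException

        Result(boolean timedOut, boolean stillBlocked, boolean aborted) {
            callerTimedOut = timedOut;
            plainLockStillBlocked = stillBlocked;
            interruptibleAborted = aborted;
        }
    }

    static Result run() {
        ReentrantReadWriteLock rwl = new ReentrantReadWriteLock();
        ExecutorService pool = Executors.newSingleThreadExecutor();
        AtomicBoolean plainAcquired = new AtomicBoolean(false);
        AtomicBoolean aborted = new AtomicBoolean(false);
        boolean timedOut = false;
        boolean stillBlocked = false;

        rwl.writeLock().lock(); // stands in for the long-held cache write lock
        try {
            // 1) Untimed, uninterruptible lock(), like the read path described in the issue.
            Future<?> task = pool.submit(() -> {
                rwl.readLock().lock();
                try {
                    plainAcquired.set(true);
                } finally {
                    rwl.readLock().unlock();
                }
            });
            try {
                task.get(200, TimeUnit.MILLISECONDS); // caller-side deadline
            } catch (TimeoutException e) {
                timedOut = true;
            } catch (ExecutionException e) {
                throw new IllegalStateException(e);
            }
            task.cancel(true); // interrupts the worker thread...
            Thread.sleep(200);
            stillBlocked = !plainAcquired.get(); // ...which keeps waiting in lock()

            // 2) lockInterruptibly() reacts to the same kind of interrupt.
            Thread waiter = new Thread(() -> {
                try {
                    rwl.readLock().lockInterruptibly();
                    rwl.readLock().unlock();
                } catch (InterruptedException e) {
                    aborted.set(true);
                }
            });
            waiter.start();
            Thread.sleep(100);
            waiter.interrupt();
            waiter.join(1000);
        } catch (InterruptedException e) {
            throw new RuntimeException(e);
        } finally {
            rwl.writeLock().unlock(); // only this lets the lock() caller proceed
            pool.shutdown();
        }
        return new Result(timedOut, stillBlocked, aborted.get());
    }

    public static void main(String[] args) {
        Result r = run();
        System.out.println("caller timed out: " + r.callerTimedOut);
        System.out.println("lock() still blocked after interrupt: " + r.plainLockStillBlocked);
        System.out.println("lockInterruptibly() aborted: " + r.interruptibleAborted);
    }
}
```

The point the sketch illustrates: a deadline enforced by the caller frees the caller, but the pooled worker thread stays parked until the write lock is released, so threads can still pile up behind the lock.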

ENV

sazzad16 commented 1 week ago

@Nacol-174 Thank you for your work and sharing.

Nacol-174 commented 1 week ago

This issue occurs across Jedis versions 3, 4, and 5.