redis / lettuce

Advanced Java Redis client for thread-safe sync, async, and reactive usage. Supports Cluster, Sentinel, Pipelining, and codecs.
https://lettuce.io
MIT License

Huge variation in key lookup times against redis cluster #2436

Closed. menacher closed this 3 weeks ago

menacher commented 1 year ago

Bug Report

Lookup times for a single key vary enormously. Some sample timings, in milliseconds: 9790, 28617, 21474, 5104, 37, 10, 1, 97, 12957, 1868.

Input Code

```java
RedisAdvancedClusterCommands<String, String> readSync = clusterConnection.sync();
String value = readSync.get(key);
```
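The timings above could be collected with a small harness like the following. This is only a sketch using the JDK: `LatencyProbe`, `measure`, and `percentile` are hypothetical names, and the stand-in lookup in `main` would be replaced by `() -> readSync.get(key)` against a real cluster connection.

```java
import java.util.Arrays;
import java.util.function.Supplier;

public final class LatencyProbe {

    /** Runs the lookup {@code iterations} times and returns each call's latency in nanoseconds. */
    public static long[] measure(Supplier<String> lookup, int iterations) {
        long[] nanos = new long[iterations];
        for (int i = 0; i < iterations; i++) {
            long start = System.nanoTime();
            lookup.get();
            nanos[i] = System.nanoTime() - start;
        }
        return nanos;
    }

    /** Nearest-rank percentile (p in (0, 100]) of a non-empty sample array. */
    public static long percentile(long[] samples, double p) {
        long[] sorted = samples.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p / 100.0 * sorted.length);
        return sorted[Math.max(rank, 1) - 1];
    }

    public static void main(String[] args) {
        // Stand-in lookup (assumption); the real test would use () -> readSync.get(key).
        Supplier<String> lookup = () -> "works";
        long[] samples = measure(lookup, 1_000);
        System.out.printf("p50=%dns p99=%dns max=%dns%n",
                percentile(samples, 50), percentile(samples, 99), percentile(samples, 100));
    }
}
```

Reporting p50, p99, and max rather than a handful of raw samples makes it easier to tell whether the slow lookups are rare outliers or the common case.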

Additional context

I tried the same lookup with the Redisson library, and it consistently fetches the key in under 5 ms. The problem seems to occur only with the cluster connection.

mp911de commented 1 year ago

Can you provide a debug/trace log from GET where it takes 20 seconds?
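For anyone unsure how to capture such a log, here is a minimal sketch using `java.util.logging`. It assumes no SLF4J or Log4j binding is on the classpath, so that the client's logging (which, as far as I know, goes through Netty's logger factory) falls back to the JDK logger; with Logback or Log4j the same logger name, `io.lettuce.core`, would instead be set to TRACE in that framework's configuration. The class name `LettuceTraceLogging` is hypothetical.

```java
import java.util.logging.ConsoleHandler;
import java.util.logging.Level;
import java.util.logging.Logger;

public final class LettuceTraceLogging {

    // Keep a strong reference: java.util.logging only holds loggers weakly.
    private static final Logger LETTUCE = Logger.getLogger("io.lettuce.core");

    /** Raises the io.lettuce.core logger (and its children) to FINEST and prints to the console. */
    public static Logger enable() {
        ConsoleHandler handler = new ConsoleHandler();
        handler.setLevel(Level.ALL);           // the default handler level (INFO) would drop FINE/FINEST records
        LETTUCE.setLevel(Level.FINEST);        // FINEST is the JDK level that corresponds to trace
        LETTUCE.addHandler(handler);
        LETTUCE.setUseParentHandlers(false);   // avoid duplicate output through the root handler
        return LETTUCE;
    }
}
```

`LettuceTraceLogging.enable()` would need to run before the cluster connection is created so that connection setup and topology refresh are captured as well.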

menacher commented 1 year ago

Hi, I have some limitations on setting up tracing on the server I tested on, so I created a local example where the Redis cluster runs locally in Docker. It does not show as much timing variation as the server does, but I could still see times ranging from 1 ms to hundreds of ms. I have added the Brave tracing library; let me know if I have set it up correctly in the file KeycacheConfig.java.

I am attaching the example program I used to test this: keycache.zip

Usage is as follows:

  1. To execute the program, run `./redeploy.sh` in the root folder.
  2. Set a timeout: `curl -XPOST http://localhost:8080/timeout/5000`. Note this is in nanoseconds, so 5000 is actually 5 ms.
  3. Create a key: `curl -XPOST http://localhost:8080 -H 'Content-Type: application/json' --data-raw '{"key": "cluster", "value":"works"}'`
  4. Read back the key: `http://localhost:8080\?key\=cluster`
  5. Run a basic load test: `for i in {1..500}; do ((curl http://localhost:8080\?key\=cluster)&); done`
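
The load test in step 5 only prints responses. A rough Java equivalent that also records per-request latency could look like the sketch below. It uses only the JDK's `java.net.http` client; the class name `LoadTest` and the method `run` are hypothetical, and the URL is the one from the steps above with an explicit `/` path.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

public final class LoadTest {

    /** Fires {@code requests} concurrent GETs at {@code uri} and returns each request's latency in ms. */
    public static List<Long> run(URI uri, int requests) {
        HttpClient client = HttpClient.newHttpClient();
        HttpRequest request = HttpRequest.newBuilder(uri).GET().build();

        // Start every request before waiting on any of them, like the backgrounded curl calls.
        List<CompletableFuture<Long>> pending = new ArrayList<>();
        for (int i = 0; i < requests; i++) {
            long start = System.nanoTime();
            pending.add(client.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                    .thenApply(response -> (System.nanoTime() - start) / 1_000_000));
        }

        List<Long> latenciesMs = new ArrayList<>();
        for (CompletableFuture<Long> future : pending) {
            latenciesMs.add(future.join()); // throws CompletionException if the request failed
        }
        return latenciesMs;
    }
}
```

Calling `LoadTest.run(URI.create("http://localhost:8080/?key=cluster"), 500)` would mirror step 5. Note that the measured latency includes the HTTP layer of the example application, not just the Redis lookup.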

github-actions[bot] commented 1 month ago

If you would like us to look at this issue, please provide the requested information. If the information is not provided within the next 30 days this issue will be closed.

tishun commented 1 month ago

Hey @menacher,

Thanks for taking the time to put up an example, but I am afraid that replies ranging from 1 ms to hundreds of ms do not seem excessive; they depend heavily on your hardware and configuration. @mp911de requested the debug/trace log for the 20-second command because 20000 ms is obviously an excessive amount of time for a simple data fetch. To pinpoint a potential issue, we need a better way to profile the problem you are seeing, as there are many components along the way that could be causing it.