redis / go-redis

Redis Go client
https://redis.uptrace.dev
BSD 2-Clause "Simplified" License

Why is there a growing latency when getting values from redis using go-redis? #2823

Open abriejenny opened 8 months ago

abriejenny commented 8 months ago

Expected Behavior

If I get a value of the same size (e.g. 100 KB) from Redis concurrently with e.g. 500 goroutines, I would expect each GET to take roughly the same amount of time (e.g. 50 ms).

Current Behavior

The latency consistently increases the more concurrent GETs I try to do.

Furthermore, if you run multiple instances of the script (on the same client), you see the latency increasing faster. So each additional client is slowing down all other clients.

This should not happen at this small scale on Redis, right? According to redis-benchmark it can handle 100k connections at a huge throughput, and I am only doing 1-5 connections when testing with 1-5 clients.

If I set the for-loop count high enough in the reproduction script below and run 3 clients, sometimes a request takes multiple seconds...

Steps to Reproduce

  1. Run a local instance of Redis (this happens on both versions I've tested, 6.2 and 7.2).
  2. Set up the example script:
     a) `mkdir testit && cd testit`
     b) `go mod init my/test`
     c) `go get github.com/redis/go-redis/v9`
     d) add the script `script.go`:
    
    package main

    import (
        "context"
        "fmt"
        "strings"
        "time"

        "github.com/redis/go-redis/v9"
    )

    var ctx = context.Background()

    func main() {
        // Create a new Redis client
        rdb := redis.NewClient(&redis.Options{
            Addr: "localhost:6379",
            DB:   5,
        })

        // Store a 100k value in Redis
        key := "some:test:key"
        value := strings.Repeat("a", 100000)
        err := rdb.Set(ctx, key, value, 0).Err()
        if err != nil {
            fmt.Println("Failed to set value:", err)
            return
        }

        // Get the value asynchronously 1000 times, log the time it takes for each
        for i := 0; i < 1000; i++ {
            t1 := time.Now()
            go func(i int) {
                val, err := rdb.Get(ctx, key).Result()
                if err != nil {
                    fmt.Println("Failed to get value:", err)
                    return
                }
                t2 := time.Now()
                fmt.Printf("%d: time: %vms, val size: %d\n", i, t2.Sub(t1).Milliseconds(), len(val))
            }(i)
        }

        // Wait for a while to ensure all goroutines have finished
        time.Sleep(2 * time.Second)
    }

     e) `go mod tidy`
     f) `go run .`

  3. cry
  4. Test it with multiple scripts running; I do this by sleeping until the time is a multiple of 10 seconds and then starting e.g. 3 scripts at once (see the sketch below).
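
A rough sketch of the start synchronization I mean (the exact mechanism doesn't matter), dropped at the top of `main` in `script.go` and using only the standard library `time` package:

    // Sleep until the wall clock reaches the next multiple of 10 seconds, so that
    // several independently started processes begin their GET loops together.
    now := time.Now()
    next := now.Truncate(10 * time.Second).Add(10 * time.Second)
    time.Sleep(next.Sub(now))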

Example Results

    0: time: 0ms, val size: 100000
    1: time: 37ms, val size: 100000
    12: time: 37ms, val size: 100000
    13: time: 37ms, val size: 100000
    38: time: 37ms, val size: 100000
    133: time: 97ms, val size: 100000
    145: time: 97ms, val size: 100000
    77: time: 97ms, val size: 100000
    ...
    979: time: 235ms, val size: 100000
    914: time: 236ms, val size: 100000
    957: time: 232ms, val size: 100000
    980: time: 235ms, val size: 100000
    958: time: 235ms, val size: 100000
    954: time: 236ms, val size: 100000
    913: time: 236ms, val size: 100000
    921: time: 236ms, val size: 100000
    961: time: 236ms, val size: 100000
    976: time: 236ms, val size: 100000


## Context (Environment)

Without going into too much detail, this high load only happens intermittently on our production servers. We need to be able to have 100 clients, all reading values from Redis concurrently (at least 30 reads each) and consistently (over e.g. 30 minutes), where the values can be updated at any time. This works fine when the values are smallish, but the larger they get, the more relevant this problem becomes.

sgp commented 4 weeks ago

I think the test code's performance can be explained by the interaction between goroutine scheduling and connection pooling.

PoolSize defaults to 10 * the number of CPUs, so the extra latency in the early requests seems to come from establishing those connections, which happens lazily inside the first rdb.Get calls. The later requests seem to be close together in terms of latency.
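
As a rough sketch of what I mean (PoolSize and MinIdleConns are the relevant go-redis Options fields; the numbers here are purely illustrative, and the Addr/DB values just mirror the reproduction script), you can size the pool up front and pre-dial connections so the handshake cost doesn't land inside the first Get calls:

    package main

    import (
        "context"
        "fmt"

        "github.com/redis/go-redis/v9"
    )

    func main() {
        ctx := context.Background()
        // Illustrative numbers only: raise the pool limit and keep some idle
        // connections dialed ahead of time instead of inside the first Gets.
        rdb := redis.NewClient(&redis.Options{
            Addr:         "localhost:6379",
            DB:           5,
            PoolSize:     100, // max concurrent connections (default is 10 * number of CPUs)
            MinIdleConns: 20,  // pre-dial connections so early Gets don't pay for the handshake
        })
        // A throwaway command forces at least one dial before any benchmark loop starts.
        if err := rdb.Ping(ctx).Err(); err != nil {
            fmt.Println("ping failed:", err)
        }
    }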

Another thing that might explain this: because goroutines yield on I/O calls like io.Read, it's possible that with 1000 outstanding goroutines you are switching goroutines more often for large results, since a 100k reply may not fit in a single read buffer. This is why goroutines, while cheap, are not free.
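
If scheduler pressure is a factor, one way to probe it (just a sketch I haven't run against this repro; boundedGets is a hypothetical helper reusing the client and key from script.go) is to cap how many Gets are in flight with a buffered channel used as a semaphore and see whether the tail latency flattens:

    // bounded.go (same package as script.go): cap the number of in-flight GETs
    // with a buffered channel used as a counting semaphore.
    package main

    import (
        "context"
        "fmt"
        "sync"
        "time"

        "github.com/redis/go-redis/v9"
    )

    func boundedGets(ctx context.Context, rdb *redis.Client, key string, total, limit int) {
        sem := make(chan struct{}, limit)
        var wg sync.WaitGroup
        for i := 0; i < total; i++ {
            wg.Add(1)
            sem <- struct{}{} // blocks once `limit` goroutines are in flight
            go func(i int) {
                defer wg.Done()
                defer func() { <-sem }()
                t1 := time.Now()
                if _, err := rdb.Get(ctx, key).Result(); err != nil {
                    fmt.Println("Failed to get value:", err)
                    return
                }
                fmt.Printf("%d: time: %vms\n", i, time.Since(t1).Milliseconds())
            }(i)
        }
        wg.Wait()
    }

Calling it with e.g. boundedGets(ctx, rdb, key, 1000, 50) instead of the unbounded loop would help separate scheduler/pool contention from actual server latency.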

I'd be interested in the results of running this test and timing the code inside the goroutine itself rather than measuring from the time of goroutine creation.
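
Concretely, that would just mean taking the timestamp inside the goroutine body, so the time a goroutine spends waiting to be scheduled is excluded from the measurement; roughly:

    // Same loop as in script.go, but t1 is taken inside the goroutine, so the
    // time a goroutine spends waiting to be scheduled is not counted.
    for i := 0; i < 1000; i++ {
        go func(i int) {
            t1 := time.Now()
            val, err := rdb.Get(ctx, key).Result()
            if err != nil {
                fmt.Println("Failed to get value:", err)
                return
            }
            fmt.Printf("%d: time: %vms, val size: %d\n", i, time.Since(t1).Milliseconds(), len(val))
        }(i)
    }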