redis / go-redis

Redis Go client
https://redis.uptrace.dev
BSD 2-Clause "Simplified" License
19.61k stars 2.31k forks source link

Node is not MarkAsFailing when network error happens in connecting phrase or writing phrase for pipeline command #2960

Closed singular-seal closed 2 months ago

singular-seal commented 3 months ago

After fixing issue #2959 and change the route strategy to 'random', I still got some connection errors occasionally. My test case is below,

package redis

import (
    "context"
    "fmt"
    "testing"
    "time"
)

func ConnectRedisTarget() *ClusterClient {
    return NewClusterClient(&ClusterOptions{
        Addrs: []string{
            "127.0.0.1:6666",
        },
        RouteRandomly: true,
        //ReadOnly: true,
    })
}

func testOne(t *testing.T) {
    c := ConnectRedisTarget()
    ctx := context.Background()
    for i := 0; i < 10000; i++ {
        p := c.Pipeline()
        p.Get(ctx, "c")
        cs, err := p.Exec(ctx)
        if err != nil {
            fmt.Println("err from c " + fmt.Sprint(err))
        } else {
            for _, cmder := range cs {
                fmt.Println(cmder.String())
            }
        }
        p.Get(ctx, "a")
        cs, err = p.Exec(ctx)
        if err != nil {
            fmt.Println("err from a " + fmt.Sprint(err))
            fmt.Println(err)
        } else {
            for _, cmder := range cs {
                fmt.Println(cmder.String())
            }
        }
        time.Sleep(time.Second * 1)
    }
}

Expected Behavior

It always prints

err from c redis: nil
get a: 1

Current Behavior

It prints

err from a dial tcp 127.0.0.1:6666: connect: connection refused
dial tcp 127.0.0.1:6666: connect: connection refused

occasionally and the probability is about 0.1.

Possible Solution

For pipeline commands, go-redis only mark connection as failing during reading phrase but in most cases a connection error is returned during connecting phrase, so we should add the MarkAsFailing logic to all these phrases.

Steps to Reproduce

  1. Add a patch fix for issue #2959
  2. Setup a redis cluster with one master and one slave and run 'set a=1' to set value for key 'a'
  3. Run the testing code and it will print the expected messages
  4. Shutdown the master node
  5. You can see the error message occasionally

Context (Environment)

Detailed Description

Possible Implementation