aws elasticache and radix

georlav commented 5 years ago

I'm trying to use radix with AWS ElastiCache in cluster mode. The setup requires a TLS connection and a secret. The client appears to connect and work, but I constantly get an error when I try to SET a new key/value pair. The error is: cluster action redirected too many times

Here is the code I use. I am using radix v3.3.0 and have tried Redis versions 5.0.0 and 5.0.4.

    cfg := redis.ClusterOpts{
        Protocol: "tcp",
        Address:  []string{"aws.cluster.host"},
        Port:     6379,
        Secret:   "mysecret",
        PoolSize: 10,
        PemFile:  "./tls-ca-bundle.pem",
    }

    customConnFunc := func(network, addr string) (radix.Conn, error) {
        // Set up tls config
        tlsCfg, err := getTLSConfig(cfg.PemFile)
        if err != nil {
            return nil, err
        }

        // Set up connection using default dialer
        conn, err := tls.Dial(network, addr, tlsCfg)
        if err != nil {
            return nil, err
        }

        c := radix.NewConn(conn)
        if err := radix.Cmd(nil, "AUTH", cfg.Secret).Run(c); err != nil {
            cErr := c.Close()
            if cErr != nil {
                return nil, cErr
            }
            return nil, err
        }

        return c, nil
    }

    poolFunc := func(network, addr string) (radix.Client, error) {
        return radix.NewPool(
            cfg.Protocol,
            fmt.Sprintf(`%s:%d`, cfg.Address[0], cfg.Port),
            cfg.PoolSize,
            radix.PoolConnFunc(customConnFunc),
        )
    }

    cluster, err := radix.NewCluster(cfg.Address, radix.ClusterPoolFunc(poolFunc))
    if err != nil {
        log.Fatal("Cluster error ", err)
    }

    if err := cluster.Do(radix.Cmd(nil, "SET", "key1", string("some val for key1"))); err != nil {
        log.Fatal(err)
    }

    var entry string
    if err := cluster.Do(radix.Cmd(&entry, "GET", "key1")); err != nil {
        log.Fatal(err)
    }

    fmt.Println(entry)

Am I doing something wrong? Does radix work with AWS ElastiCache?
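The snippet above calls getTLSConfig without showing it. For readers who want to run it, here is a minimal sketch of what such a helper might look like, assuming the PEM file is a CA bundle to trust. The body of getTLSConfig, the tlsConfigFromPEM helper, and the TLS 1.2 minimum are assumptions for illustration, not the original poster's code.

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"os"
)

// tlsConfigFromPEM builds a TLS config that trusts only the CA
// certificates contained in pemBytes. (Hypothetical helper.)
func tlsConfigFromPEM(pemBytes []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(pemBytes) {
		return nil, errors.New("no valid certificates found in PEM data")
	}
	// ServerName is left empty: tls.Dial fills it in from the dialed address.
	return &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12}, nil
}

// getTLSConfig is a stand-in for the helper referenced in the snippet;
// the real one was not posted, so this is only a guess at its shape.
func getTLSConfig(pemFile string) (*tls.Config, error) {
	pemBytes, err := os.ReadFile(pemFile)
	if err != nil {
		return nil, fmt.Errorf("reading %s: %w", pemFile, err)
	}
	return tlsConfigFromPEM(pemBytes)
}

func main() {
	if _, err := getTLSConfig("./tls-ca-bundle.pem"); err != nil {
		fmt.Println("TLS config error:", err)
	}
}
```

Splitting the file read from the PEM parsing keeps the parsing step easy to exercise without touching the filesystem.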

mediocregopher commented 5 years ago

Hi @georlav! Your code looks fine to me. The only thing I would note is that instead of using NewConn and doing the TLS and AUTH manually, you could use Dial with DialAuthPass and DialUseTLS. But that shouldn't affect the problem you're having.

"cluster action redirected too many times" would only be returned if radix keeps getting back MOVED or ASK errors; the fact that your cluster is returning them so often suggests that the cluster is in some strange state, and your connection logic is likely fine.

I haven't personally used ElastiCache, so I can't give much specific advice, but it might be worth double-checking with something like redis-cli --cluster check that the cluster is actually healthy. If it is, then we'll have to figure out some way to debug further.
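To make the redirect limit described above concrete, here is a small self-contained simulation. It is not radix's implementation: parseRedirect, doWithRedirects, maxAttempts, and the node addresses are hypothetical names chosen for illustration, and only the MOVED/ASK error text format ("MOVED <slot> <host:port>") comes from the Redis Cluster protocol. It shows that the limit is reached whenever following a redirect never lands on the owning node. One way that can happen even with a healthy cluster is a client that always reaches the same node regardless of the address it is handed. In the snippet posted above, poolFunc builds its address from cfg.Address[0] and does not use its addr parameter, so that may be worth ruling out.

```go
package main

import (
	"errors"
	"fmt"
	"strconv"
	"strings"
)

// redirect is a parsed MOVED or ASK error.
type redirect struct {
	Kind string // "MOVED" or "ASK"
	Slot int
	Addr string
}

// parseRedirect parses error text such as "MOVED 3999 127.0.0.1:6381".
func parseRedirect(msg string) (redirect, bool) {
	parts := strings.Fields(msg)
	if len(parts) != 3 || (parts[0] != "MOVED" && parts[0] != "ASK") {
		return redirect{}, false
	}
	slot, err := strconv.Atoi(parts[1])
	if err != nil {
		return redirect{}, false
	}
	return redirect{Kind: parts[0], Slot: slot, Addr: parts[2]}, true
}

// maxAttempts mirrors the doAttempts = 5 value quoted in this thread.
const maxAttempts = 5

var errTooManyRedirects = errors.New("cluster action redirected too many times")

// doWithRedirects calls send(addr) and follows redirects, giving up after
// maxAttempts. send returns "" on success, or the error text otherwise.
func doWithRedirects(addr string, send func(addr string) string) error {
	for i := 0; i < maxAttempts; i++ {
		reply := send(addr)
		if reply == "" {
			return nil
		}
		r, ok := parseRedirect(reply)
		if !ok {
			return errors.New(reply) // some other error, not a redirect
		}
		addr = r.Addr // retry against the node named in the redirect
	}
	return errTooManyRedirects
}

func main() {
	owner := "10.0.0.2:6379" // simulated node that owns slot 3999
	node := func(addr string) string {
		if addr == owner {
			return ""
		}
		return "MOVED 3999 " + owner
	}

	// Honoring the redirect address succeeds on the second attempt.
	fmt.Println(doWithRedirects("10.0.0.1:6379", node)) // prints "<nil>"

	// Always reaching the same node, whatever address is requested,
	// yields the same MOVED reply every time and exhausts the attempts.
	stuck := func(string) string { return node("10.0.0.1:6379") }
	fmt.Println(doWithRedirects("10.0.0.1:6379", stuck)) // prints the error text
}
```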

georlav commented 5 years ago

Hi @mediocregopher, I do not think there is an issue with the cluster. I'm using the Redis Desktop Manager client with no issues. Also, after reaching a dead end, I tried another Go client with Redis cluster support, and everything works as expected (with the exact same settings).

PS: I see you have a hardcoded value for the retry limit, const doAttempts = 5. It would be great to be able to control that value, even if that is not the issue. I tried with fewer attempts in the other client and it again worked as expected.

nussjustin commented 5 years ago

The master branch has an experimental trace sub-package, which can be used via the ClusterWithTrace method. Using this, you could check what kind of redirect you are getting (-ASK or -MOVED) and whether the cluster is syncing properly.

I just tried to set up an ElastiCache cluster myself, but it seems AWS decided that my account should not be valid anymore, so I can't test this myself right now.

mediocregopher commented 5 years ago

> I see you have a hardcoded value for the retry limit, const doAttempts = 5. It would be great to be able to control that value, even if that is not the issue. I tried with fewer attempts in the other client and it again worked as expected.

If that value is not an issue then why should it need to be controlled? :P

I also spent a good hour trying to set up an ElastiCache cluster, only to get completely lost in Amazon's insane VPC networking rules, and gave up. I might try again in the near future.

If you could use the trace package to find out the exact redirect errors that are happening, and post that along with the response from CLUSTER SLOTS or CLUSTER INFO, I think that'd be helpful, so we can maybe understand what is happening.
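When comparing redirect errors against CLUSTER SLOTS output, it helps to know which slot a key maps to. The sketch below follows the key-to-slot rule from the Redis Cluster specification (CRC16, XMODEM variant, modulo 16384, with hash-tag handling). The function names crc16 and keySlot are my own for this example; this is not radix code.

```go
package main

import (
	"fmt"
	"strings"
)

// crc16 computes CRC-16/XMODEM (polynomial 0x1021, initial value 0),
// the variant named in the Redis Cluster specification.
func crc16(data []byte) uint16 {
	var crc uint16
	for _, b := range data {
		crc ^= uint16(b) << 8
		for i := 0; i < 8; i++ {
			if crc&0x8000 != 0 {
				crc = (crc << 1) ^ 0x1021
			} else {
				crc <<= 1
			}
		}
	}
	return crc
}

// keySlot returns the cluster hash slot (0..16383) for key. If the key
// contains a non-empty {hash tag}, only the tag contents are hashed.
func keySlot(key string) int {
	if start := strings.IndexByte(key, '{'); start >= 0 {
		if end := strings.IndexByte(key[start+1:], '}'); end > 0 {
			key = key[start+1 : start+1+end]
		}
	}
	return int(crc16([]byte(key)) % 16384)
}

func main() {
	fmt.Println(keySlot("foo"))  // prints 12182
	fmt.Println(keySlot("key1")) // slot for the key used in this thread
}
```

The slot printed for a key can then be matched against the ranges in the CLUSTER SLOTS reply to see which node should own it, and whether the MOVED target agrees.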

mediocregopher commented 5 years ago

@georlav have you looked at this again at all? I've tried again at various times, but I guess I'm just dumb and can't figure out AWS anymore (it used to be so easy!). But if you could use the tracing feature, it might help us narrow down the problem.

mediocregopher commented 5 years ago

Gonna go ahead and close this for now. @georlav, if you're still having problems, feel free to open it back up and we can help you diagnose further.

mediocregopher / radix

aws elasticache and radix #132