mna / redisc

A Go redis cluster client built on top of redigo.
BSD 3-Clause "New" or "Revised" License
227 stars 35 forks source link

The slot is UNWRITEABLE when migrate slot #18

Closed yuelng closed 5 years ago

yuelng commented 5 years ago

When migrate slot,the cluster return ‘ASK’ err,but I found the client not process this situation correctly, help wanted, thanks @mna

yuelng commented 5 years ago
    // forceDial doesn't require locking (immutable)
        conn, addr, err := cluster.getConnForSlot(re.NewSlot, rc.c.forceDial, readOnly)
        if err != nil {
            // could not get connection to that node, return that error
            return nil, err
        }

https://github.com/mna/redisc/blob/master/retry_conn.go#L86 should replace by?

conn, err := cluster.getConnForAddr(re.Addr, rc.c.forceDial)
mna commented 5 years ago

Please clarify the issue, at a minimum it should clearly state (with minimal code to reproduce):

From what I understand of your issue, you should make sure you're using a RetryConn if you want to automatically follow the ASK responses, see https://godoc.org/github.com/mna/redisc#hdr-Redirections.

yuelng commented 5 years ago

thanks your response,absolutely,I am using a RetryConn,I found When migrate slot,the cluster return ‘ASK’ err,the client should take the redirection addr returned by err response,but from the client logic

conn, addr, err := cluster.getConnForSlot(re.NewSlot, rc.c.forceDial, readOnly)
if err != nil {
    // could not get connection to that node, return that error
    return nil, err
}

you still take the address from the cache mapping,this is work for MOVED err response, but the retry logic take the wrong address repeatedly,So I suggest that you replace the above code,with this

conn, err := cluster.getConnForAddr(re.Addr, rc.c.forceDial)

for example:the slot 8903 is mapping the node 10.1.171.237:7008, when migrate slot 8903 to node 10.1.171.238,we have operate on this slot, then the client will got err (error) ASK 8903 10.1.171.238:7008, so the client should connect the node 10.1.171.238,not 10.1.171.237。but from the source code,you still use the mapping relation slot-address 8903 - 10.1.171.237。

mna commented 5 years ago

Thanks for the clarification - that makes sense, I'll try to find some time to look into that.