mna / redisc

A Go redis cluster client built on top of redigo.
BSD 3-Clause "New" or "Revised" License
228 stars 35 forks source link

Handle redis cluster via redis load balancer URL #13

Closed zhangruiskyline closed 6 years ago

zhangruiskyline commented 6 years ago

Hi

I am using Microsoft Azure redis and it dose not provide each cluster node individual IP address, rather it will only provide the redis cluster URL(which is actually a load balancer, not real redis nodes). Can this lib handle this? if so, I should put this redis URL in startupNodes or in dialoption's address?

Thanks Rui

mna commented 6 years ago

Hello Rui,

I'm not familiar with Azure's redis cluster support, but reading this page: https://docs.microsoft.com/en-us/azure/redis-cache/cache-how-to-premium-clustering , it looks like it should work. They do mention using a redis client with cluster support, so that means it should expose the necessary commands to query a cluster. I'm a bit concerned by the Can I directly connect to the individual shards of my cache? question in the FAQ, where they say this is not officially supported, because that's basically what a redis cluster client must do (e.g. to follow MOVE replies, direct the commands to the right node, etc.).

If you do try it out, I'd be very interested in hearing about your experience, what worked, what didn't and how I could update the package to better support azure!

Thanks, Martin

zhangruiskyline commented 6 years ago

Thanks Martin @mna

I have tried and it seems working. although connect to individual shard is not officially support, but I can check the redis node's IP via cluster nodes and get back information.

so in cluster setup, I use all internal nodes' address in "StartupNodes" and external URL for dial. One follow up questions, Azure Redis support master/slave mode and master and slave will have different ports, should I put master and slave in StartupNodes or only need to put master node address/port is enough?

Also Dose this redisc follow MOVE command and redirect?

Thanks Rui

mna commented 6 years ago

Hello,

You can put either masters or replicas in StartupNodes, the client will automatically figure out the topology based on cluster nodes responses. I'm not sure what you're doing is correct for Dial, though. If I understand correctly, you use a redigo.DialNetDial option to specify a dial function that ignores the address it receives as parameter and always uses the external URL instead?

If so, I believe that would defeat the goal of using redisc. The dial function should always try to connect to the address provided, that's the one the client identified as the right address to connect to the required node.

Regarding your last question, to follow MOVE replies, redisc gives you complete control - you can either use a basic connection that will return a RedirError for MOVE replies, or you can wrap that basic connection in a RetryConn that will automatically follow those redirections (see https://godoc.org/github.com/mna/redisc#RetryConn).

Hope this helps! Martin

zhangruiskyline commented 6 years ago

Hi Martin @mna

regarding "The dial function should always try to connect to the address provided", you mean it will automatically fetch {IP:port} from StartupNodesand connect?

I define a CreatePool func which will be put like

        cluster := &redisc.Cluster{
            StartupNodes: redisClusterStartupNode,
            DialOptions:  redisDialOptions,
            CreatePool:   createPool,
        }
func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
    return &redis.Pool{
        MaxActive:   redisMaxActive,
        MaxIdle:     redisMaxIdle,
        IdleTimeout: time.Duration(redisIdleTimeout) * time.Second,
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", redisURL, opts...)

            if err != nil {
                log.Errorf(err.Error())
                return nil, err
            }
            return c, nil
        },
    }, nil
}

but I indeed find can not pass the into createPoolso use some hack solution to directly use redisURL in createPool , is that not what we should do? what do you suggest the best way to use this lib for cluster connection purpose

Thanks Rui

mna commented 6 years ago

Ok, thanks for the details, the Dial function in createPool should try to connect to the addr parameter passed to createPool, so that it returns connections connected to the right node.

zhangruiskyline commented 6 years ago

@mna ,in my case, The Addrshould be load balancer URL for Azure redis, not startupnodes IP addresses for real redis node?

Thanks Rui

zhangruiskyline commented 6 years ago

@mna , let me refine my question in this way. what is the difference between StartupNodesin Cluster data structure, and addrin dail function of a connection pool? should they be same? or different?

Thanks Rui

mna commented 6 years ago

Okay so here's how the redis cluster works. The startup nodes are used to build the topology of the cluster, figure out the masters and replicas. This may change during the execution of your app - e.g. maybe a node will go down, another one will replace it, etc. So that's why those are called "startup" nodes, those are the nodes to use when the app (server, typically) starts, but after that it may change. The client stays up-to-date with those changes.

Another thing that the client does is keeping track of which node in the cluster serves which key hash. There are 16384 hash values, and each hash value has a corresponding node. When you get a connection from the cluster and you execute a command, the client will identify which node should be contacted to execute this command based on the key passed to the command (e.g. GET a will contact the node that serves the hash of key "a").

Now, about the addr that is passed to the Cluster.CreatePool function. When you try to get the value of key "a", the Cluster will try to get a connection from the corresponding node's pool. If there is no pool yet for this node, it will call the Cluster.CreatePool function to create a pool that manages connections to that specific node. So as you can see, it is important that the Dial function of this pool really returns connection to that specific node, not the generic load balancer URL, otherwise you lose a lot of the optimization of the redis cluster client (the "smart routing" part).

In a stable cluster - typically, the huge majority of the time - the client will always select the right node to contact, so you won't receive MOVE replies and will avoid having to do multiple network calls to get the response from the correct node.

In other words (in code), you should do this:

func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
    return &redis.Pool{
                // other fields omitted...
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", addr, opts...) // <<-- IMPORTANT! use "addr" here!
            if err != nil {
                log.Errorf(err.Error())
                return nil, err
            }
            return c, nil
        },
    }, nil
}

Hope that makes it clearer! Martin

zhangruiskyline commented 6 years ago

Thanks @mna for your detailed information.

follow up on " it is important that the Dial function of this pool really returns connection to that specific node, not the generic load balancer URL". for cluster case, I will have bunch of different {IP:ports}of different redis nodes, which one should I use to pass into Addrfield?

Right now, in Azure redis, In createPoolI always Dial in external URL(which is actually a load balancer in Azure redis case), and in StartupNodes in list all {IP:port}s(I got theseIP:port by issuing cluster nodes in redis command line). And looks like it worked

but according to your suggestion, this sounds to be a Hack. not recommended. But I can not find another way, since Azure want to hide all cluster information and only expose one external URL to outside, and let redis client to manage all cluster/pool itself.

Can you provide a sample code what is your recommend way to implement the cluster init stage?

Thanks Rui

mna commented 6 years ago

Hm I think I see where the confusion comes from, you never call createCuster directly yourself, redisc will call it when needed, with the right address as parameter. You can take a look at example_test.go in the redisc package for some example usage.

In your code, you simply call Custer.Get() to retrieve a pooled connection, e.g.:

    // create the cluster
    cluster := redisc.Cluster{
        StartupNodes: ..., // your startup nodes
        DialOptions: ..., // whatever dial options you want to use
        CreatePool:   createPool,
    }
    defer cluster.Close()

    // initialize its mapping before use, recommended
    if err := cluster.Refresh(); err != nil {
        // handle error
    }

        // And then elsewhere, when you need a connection
    // grab a connection from the pool
    conn := cluster.Get()
    defer conn.Close()

So you never actually call createPool.

Hope this helps! Martin

zhangruiskyline commented 6 years ago

Thanks @mna , If redisc calls the createPool instead of myself handling it, then where should I pass the redis URL at starting point? Another question, is it possible for redisc to handle all internal nodes IP:port without specifying them in startupNodes section(this seems to be what Azure redis recommend, so application should not know internal redis node's IP)

Thanks Rui

zhangruiskyline commented 6 years ago

FYI, @mna , I have contacted Azure redis dev team,

and their response is like:

Most Redis clients that understand clustering do the following:

  1. Connect to the host name for the cache using port 6379 (non-ssl) or 6380 (SSL).
  2. Ask Redis if it is clustered. If so, ask for the configuration of the cluster. This includes the IP Addresses and ports of all nodes (masters and slaves) in the cluster. This also includes the slot assignments handled by each node.
  3. When getting or setting a key from redis, the client will calculate the hash of the key, then lookup which node owns the slot that hash, then send the request to the specific node for that key.

Another dev points out:

Normally, you can just put redis URL as startupNode. But you should be aware that our redis URL is actually a load balancer, rather than a real node. Whether this will cause issues depend on concrete implementation. As I know, there is one issue caused by passing load balancer as startup node in Java Lettuce client. You can refer this. https://github.com/lettuce-io/lettuce-core/issues/712

So right now it seems to me only the hack way works in Azure redis

  1. to put redisURL in createPool addr field
  2. use "cluster nodes" in redis command and get back {ip:port}, then assign these {ip:port} into StartupNodes,

I am not sure whether this is best what I can do with redisc, or some other piece I should improve. please let me know your comments, that will be great helpful

Thanks Rui

mna commented 6 years ago

Hello,

Yep, that's pretty much what a redis cluster client will do. It's a good recommendation to pass the redis URL as startup node, as it probably ensures that it can connect to a valid node in the cluster, and then redisc will discover the actual addresses of each active node.

However, in createPool, as I said, you have to pass the addr argument you received, as-is, not the redis URL. That's for the point 3 that the azure devs told you about:

When getting or setting a key from redis, the client will calculate the hash of the key, then lookup which node owns the slot that hash, then send the request to the specific node for that key.

So the createPool has to create a connection to the specific node that it requested in the addr parameter.

Martin

zhangruiskyline commented 6 years ago

Thanks for reply @mna , but it dose not work in my case, I always get "redisc: failed to get a connection",here is my connection type

        cluster := &redisc.Cluster{
            StartupNodes: redisClusterStartupNode,  //here I pass: "xxx..redis.cache.windows.net:6380" as startup nodes
            DialOptions: redisDialOptions,
            CreatePool: createPool,
        }

and in createPool I use what you reommended

func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
    return &redis.Pool{
        MaxActive:   redisMaxActive,
        MaxIdle:     redisMaxIdle,
        IdleTimeout: time.Duration(redisIdleTimeout) * time.Second,
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", addr, opts...)

            if err != nil {
                log.Errorf(err.Error())
                return nil, err
            }
            return c, nil
        },
    }, nil
}

is there any way I can completely get rid of startup nodes or any internal information, just provide the external URL(Load Balancer URL) and make it work? I guess it is because redisc need to have a real redis node as startupnodes? while I provide only load balancer?

Thanks Rui

mna commented 6 years ago

Where is your app running from? Does it have access to the internal nodes (IP address and port, not the entrypoint URL of the load balancer)? E.g. if you try to connect from one of the internal node addresses from the server where the app runs using redis-cli, can you successfully connect and run commands?

My guess is that your app doesn't have access to the cluster nodes' addresses from where it runs.

zhangruiskyline commented 6 years ago

I am running it in my local machine, but it should have cert and SSL configured, I am able to connect to redisURL:6380. and the hack works(put this redisURL:6380 in creatPool addr)

but when putting redisURL: into startupNodes, it dose not work, I try both 6380 or 1500x ports, neither work

Thanks Rui

zhangruiskyline commented 6 years ago

Hi @mna

some more update on debug: if I only put redisURL:6380in startupnode, and use addr in createPool, seems to me the correct redis node address is passed(according to debug), but somehow dial function can not establish the connection due to SSL handle error.

I tried two different ways:

  1. Put the auth in createpool:
func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
    return &redis.Pool{
        MaxActive:   redisMaxActive,
        MaxIdle:     redisMaxIdle,
        IdleTimeout: time.Duration(redisIdleTimeout) * time.Second,
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", addr, opts...)

            if err != nil {
                log.Errorf(err.Error())
                return nil, err
            }
                          //Auth here in Dial, which is what I did for redigo and worked
            if _, err := c.Do("AUTH", redisAuth); err != nil {
                log.Errorf(err.Error())
                c.Close()
                return nil, err
            }
            return c, nil
        },
    }, nil
}
  1. Another one is use redis.DialPassword(redisAuth) in dial option,

but both returns "redisc: failed to get a connection", Not sure what is best way I should dial in SSL case for cluster.

It looks like to me problem is:

When redisc first establish SSL connection, it will call redisURL:6380, which is a load balancer as indicated before. so it can establish security connections.

but later when it tries to find other nodes in redis cluster to connect, and when createpool try to establish the SSL connection directly to new redis node, it fails. probably all security handle is on load balancer side for Azure redis, rather than each cluster node level.

Is there any solution to this?

Thanks Rui

zhangruiskyline commented 6 years ago

Update again,

if I disable the Azure redis cache SSL and expose none SSL port, only use PWD to Auth, it works!! so looks like my former assumption of SSL auth problem is correct

Thanks Rui

mna commented 6 years ago

This is handled by redigo (used by redisc to manage connections and pools). See https://godoc.org/github.com/garyburd/redigo/redis#DialTLSConfig and https://godoc.org/github.com/garyburd/redigo/redis#DialUseTLS. You should specify those DialOptions on Cluster.DialOptions. On Sat, Apr 14, 2018 at 03:52 Rui Zhang notifications@github.com wrote:

Update again,

if I disable the Azure redis cache SSL and expose none SSL port, only use Passport to Auth, it works! so looks like my former assumption of SSL auth problem is correct

Thanks Rui

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/mna/redisc/issues/13#issuecomment-381311431, or mute the thread https://github.com/notifications/unsubscribe-auth/AAqQoE9sRn4nohIiVM-m3PvFWAmJVA8cks5toarHgaJpZM4TQZqn .

zhangruiskyline commented 6 years ago

Yes, I am aware of using redigo to do SSL, but the question I have is when we createpool to some other redis node rather than Loadbalancer(redisURL:6380), it seems redisc will try to create same SSL session pool directly to redis node, but will be rejected since only load balancer is able to handle the SSL stuff. even if I change redisTLSconfig.ServerName = redisURL:6380 in createpool, it dose not work.

any better way I can force SSL connection to load balancer instead of addr field when I try to createpool?

Thanks Rui

mna commented 6 years ago

Ah I see, yeah so you don't need SSL in createPool on the internal nodes, only for the load balancer URL... There's no obvious way to do this, one thing you could do is not set the DialTLSConfig and DialUseTLS on Cluster.DialOptions, and only set it in createPool when the addr is the load balancer's address, e.g.:

func createPool(addr string, opts ...redis.DialOption) (*redis.Pool, error) {
        if addr == redisURL { // or redisURL + ":6380", don't know exactly what you've set redisURL to
            opts = append(opts, redis.DialUseTLS(true)) 
            opts = append(opts, redis.DialTLSConfig(...))
        }
    return &redis.Pool{
                // ...
        Dial: func() (redis.Conn, error) {
            c, err := redis.Dial("tcp", addr, opts...)
                        // ...
        }
}
zhangruiskyline commented 6 years ago

I think this dose not work also, when redisc dial into some other redis nodes. it still need to go to SSL connection, juts it won't SSL directly to redis node, but load balancer. in your code, it will not establish the SSL at all

mna commented 6 years ago

I'm afraid I don't understand what is your issue then. You either need TLS and thus have it configured in the dial options, or you don't and thus you set it on/off in the dial options when needed. I'm not familiar with Azure's redis, but I'd expect the external URL to require TLS config so that the endpoint is not publicly accessible from outside the VPN, but the internal nodes to be accessed from inside the VPN (where your binary should run) without TLS.

zhangruiskyline commented 6 years ago

Let me double check again. My question is when dial into some redis cluster node, we still need SSL, but not SSL directly to node, rather it needs to dial to Redis load balancer URL. In your former approach, it will directly dial into redis node without SSL, right?

zhangruiskyline commented 6 years ago

Hi @mna

I think I got some feedback from Azure redis team, so client can directly connect to redis nodes(not necessarily load Balancer), but in Azure redis, different nodes and load balancer shares same IP, but different ports.(via a NAT way)

but in dial, seems we split the ports, so we lost port in SSL creation? anyway, it could be a question for redigo.

        if tlsConfig.ServerName == "" {
            host, _, err := net.SplitHostPort(address)
            if err != nil {
                netConn.Close()
                return nil, err
            }
            tlsConfig.ServerName = host
        }

Thanks Rui

zhangruiskyline commented 6 years ago

I think the problem is that Azure redis uses NAT in cluster mode, so every node has same IP but different ports, somehow redisc can not bind rc, _, err := c.bind(cmdSlot(cmd, args)), find some strange slot and can not find correct node.

investigating more, but if you have some insight, please share

Thanks Rui

zhangruiskyline commented 6 years ago

One question, the createPool will be called only in startup stage? basically if I only provide load balancer node address as startupNodes option, will it run "cluster nodes" command in redis cache and get back all nodes and create SSLs for all of them at starting point?

Thanks Rui

mna commented 6 years ago

No, createPool is created whenever a pool is needed for one node (i.e. at least once per node in the cluster + startup nodes).

zhangruiskyline commented 6 years ago

@mna

I got some feedback again from Azure redis team about the recommendation how the cluster client should work.

Quote as:

I am not sure whether redisc did same? If not, how can I use redisc to minic the Azure redis behavior?

Thanks Rui

zhangruiskyline commented 6 years ago

OK, finally I got the client works in SSL mode. here is configuration

in StartupNodes, only put the redisURL:6380(Loadbalancer URL), in dial option, use redis.DialUseTLS(true), and redis.DialTLSSkipVerify(true), The key point put redis.DialTLSSkipVerify(true), I am still not 100% sure why put this works.

Thanks Rui

mna commented 6 years ago

Ok, looks like you don't have a trusted certificate. Glad you got it working, closing as this is a connection configuration issue.