mna / redisc

A Go redis cluster client built on top of redigo.
BSD 3-Clause "New" or "Revised" License
227 stars 35 forks source link

Clarification regarding startup nodes #11

Closed abraithwaite closed 6 years ago

abraithwaite commented 6 years ago

We're using elasticache with redis cluster mode enabled and I'm curious which nodes I should be passing in for the StartupNodes.

Here's the situation: AWS assigns one record which contains all the nodes in the cluster. Leaders and replicas alike.

12:46:23 $ dig +short <super-secret>.usw2.cache.amazonaws.com
10.57.88.13
10.57.148.73
10.57.3.211
10.57.74.34
10.57.2.144
10.57.159.224
10.57.28.193
10.57.151.204
10.57.157.217
10.57.75.158
10.57.1.36
10.57.95.50

Each node of course has it's own address, but of course the master nodes can vary over time as there are outages and failures. However, there are no records to represent the current set of master nodes.

Reading the code, I'm particularly confused about these few lines:

https://github.com/mna/redisc/blob/master/cluster.go#L67-L69

It appears to me that we get the cluster mappings as many times as there are masters, proceeding forward only if there's no error when connecting to said master. Would this not mean that we can pass a list of all nodes into StartupNodes relying on the failure mechanic to ensure that the correct mapping eventually gets populated?

Related to the above, would it make sense to resolve the list of A records for a particular DNS address and set the startup nodes to that list?

Thanks and great work on the package!! 💯🎉

mna commented 6 years ago

Hello Alan,

Thanks for raising this, I wasn't familiar with how AWS handles Redis cluster. I guess the source of the confusion is this comment that mentions that only master nodes should be specified:

// StartupNodes is the list of initial nodes that make up
// the cluster. The values are expected as "address:port"
// (e.g.: "127.0.0.1:6379"). Only master nodes should be
// specified.

That strikes me as odd and I can't remember why I added that. I don't think it's reasonable to expect the caller to know what are the current master nodes, and I can't see why specifying replicas wouldn't work. I'll give it a closer look and come back at you.

I took a look at AWS doc, and it seems like each node in the cluster would get its own endpoint? That's from that page: https://docs.aws.amazon.com/AmazonElastiCache/latest/UserGuide/Endpoints.html#Endpoints.Find.RedisCluster

That would seem more reliable to me than resolving the IP beforehand, if that's for a long-running process (e.g. in case AWS takes a node down then back up with a different IP).

Martin

mna commented 6 years ago

That would seem more reliable to me than resolving the IP beforehand, if that's for a long-running process (e.g. in case AWS takes a node down then back up with a different IP).

Hmm scratch that as Redis will return the IP addresses when requesting the CLUSTER SLOTS anyway, so yeah what you proposed would make sense.

mna commented 6 years ago

I added a test of a Cluster started with a replica address, it works fine as expected. I also removed the erroneous mention in StartupNodes documentation.

abraithwaite commented 6 years ago

Neat. Now that I understand the lib better, I think that should be sufficient. We're just passing in the primary endpoint with the list of 12 nodes in our cluster (4shard x 3way replication) and it's doing fine.

I think one issue I see is if it fails to dial the first result in the DNS response, but we can solve that with a custom dialer if it becomes a problem (I think).

Thanks!

mna commented 6 years ago

Cool, I'll close this issue but let me know if you think of anything that could be done to improve the support for the AWS scenario.