hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Isolated WAN pools #882

Closed jsternberg closed 8 years ago

jsternberg commented 9 years ago

We've been using Consul fairly successfully to tie our infrastructure together. Unfortunately, our infrastructure is heavily segmented, and by design certain sections aren't able to talk to others. We've found that this causes problems with the gossip protocol.

Consider the following infrastructure with three subnets: A, B, and C. A can communicate with B, and A can communicate with C, but B and C are isolated from each other and cannot communicate.

When this is the case, C seems to report to A that B is dead. A then removes B from the pool, but B rejoins on its own after being told it was kicked out. This leads to intermittent failures in requests that rely on forwarding to other datacenters.
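To illustrate why a fully connected assumption matters here, the sketch below models the A/B/C topology with a deliberately naive failure detector in which a node considers a peer alive only if it can reach that peer directly. The reachability table and the functions `can_reach`, `direct_probe_view`, and `conflicting_reports` are hypothetical; memberlist's real detector is more involved (it also uses indirect probes through other members and a suspicion phase), so this only shows how partial connectivity produces disagreeing views, not the exact sequence of events inside Consul.

```python
# Hypothetical reachability model of the A/B/C topology described above:
# A<->B and A<->C work, B<->C is blocked by design.
REACHABLE = {
    frozenset({"A", "B"}),
    frozenset({"A", "C"}),
}

NODES = ["A", "B", "C"]


def can_reach(src, dst):
    """Return True if src can open a connection to dst."""
    return src == dst or frozenset({src, dst}) in REACHABLE


def direct_probe_view(observer):
    """Naive detector: a peer is 'alive' only if the observer can probe it directly.

    Simplification for illustration only; not memberlist's algorithm.
    """
    return {
        peer: ("alive" if can_reach(observer, peer) else "dead")
        for peer in NODES
        if peer != observer
    }


def conflicting_reports():
    """Find peers that one observer thinks are alive and another thinks are dead."""
    views = {node: direct_probe_view(node) for node in NODES}
    conflicts = []
    for peer in NODES:
        # Collect every other node's opinion about this peer.
        opinions = {obs: view[peer] for obs, view in views.items() if peer in view}
        if len(set(opinions.values())) > 1:
            conflicts.append((peer, opinions))
    return views, conflicts


if __name__ == "__main__":
    views, conflicts = conflicting_reports()
    for node, view in views.items():
        print(node, view)
    for peer, opinions in conflicts:
        print("conflicting view of", peer, opinions)
```

Running it shows A seeing both B and C as alive while C reports B as dead (and B reports C as dead), which is the kind of disagreement a protocol that assumes full connectivity has no clean way to resolve.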

What I'd like is a way to isolate the WAN pools. I want A and B to form their own WAN pool. If I ask A about B, A will forward the request. If I ask B about A, B will forward the request. If I ask B about C, B won't know what I'm talking about, because A never passes any information about C to B.

With this configuration, it would be possible to have a central hub in A that can communicate with all of the other datacenters without mixing them.
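The proposed hub-and-spoke behaviour can be sketched as a small model in which each datacenter only learns about members of the pools it belongs to. The pool names and the functions `known_datacenters` and `forward` are hypothetical illustrations of the request above, not an existing Consul feature, API, or configuration format.

```python
# Hypothetical model of the proposed isolated WAN pools; not an existing
# Consul feature or configuration format. A is the hub in both pools.
WAN_POOLS = {
    "pool-ab": {"A", "B"},
    "pool-ac": {"A", "C"},
}


def known_datacenters(dc):
    """Datacenters that dc learns about: members of every pool dc belongs to."""
    known = set()
    for members in WAN_POOLS.values():
        if dc in members:
            known |= members
    known.discard(dc)
    return known


def forward(src, dst):
    """Describe what src does with a request destined for datacenter dst."""
    if src == dst:
        return f"{src} handles the request locally"
    if dst in known_datacenters(src):
        return f"{src} forwards the request to {dst}"
    return f"{src} does not know datacenter {dst}"


if __name__ == "__main__":
    print(forward("A", "B"))
    print(forward("B", "A"))
    print(forward("B", "C"))
```

Because membership is only shared within a pool, A's participation in pool-ac never becomes visible to B, so B simply has no route to C while A can reach both.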

Maybe something like consul join -wan -isolate A.

armon commented 9 years ago

This is an issue we are aware of, but it presents a number of complex challenges. Our gossip protocol assumes a fully connected network, so more complex setups (hub and spoke, chained, etc.) cause issues like this. We are trying to think of better ways to support it, but almost all of them will require deep changes to the underlying protocols.

jsternberg commented 9 years ago

Is this a problem with the Serf library or with the underlying Raft protocol? I'd like to take a look at it myself if possible, although I can't dedicate too much time to it.

ryanuber commented 9 years ago

@jsternberg this involves both serf and the even lower-level memberlist package. As @armon mentioned, the changes required will be deep and involved, so before starting any work it would be best to brainstorm the design in a Google Doc, as there are certainly a very large number of considerations to make here.

jsternberg commented 9 years ago

This feature has become essential for something we want to do in our infrastructure; previously we were able to work around it. I've been exploring the memberlist code and I think I have an idea for the design.

For the Google Doc exploring a solution to this, who should I make sure to include when I share it?

armon commented 9 years ago

Please include me (@armon), @ryanuber and @slackpad.

slackpad commented 8 years ago

Closing this out as a duplicate of the newer #1871.