wizzard0 opened this issue 9 years ago
Thanks for the suggestions. The limiting factor is that Serf still requires a fully connected network. So in the example, even if machine C is capable of speaking to machine B, it also needs a direct connectivity path to machine A in order for the network to be functional. Serf would definitely benefit from supporting a wider variety of network topologies, but doing so would require a significant overhaul of much of the internals.
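To make the "fully connected" requirement concrete, here is a minimal sketch that models the three-machine example from this issue and checks whether every pair of machines has a direct path. The names `MACHINES`, `NETWORKS`, `can_reach` and `is_full_mesh` are hypothetical, and the code assumes two machines can talk directly only when they share a subnet; it is not how Serf itself checks anything.

```python
from ipaddress import ip_address, ip_network
from itertools import combinations

# Hypothetical model of the example topology from this issue.
NETWORKS = [ip_network("10.10.1.0/24"), ip_network("10.10.2.0/24")]
MACHINES = {
    "A": ["10.10.1.1"],
    "B": ["10.10.1.2", "10.10.2.1"],  # multihomed
    "C": ["10.10.2.2"],
}


def networks_of(machine):
    """Return the set of known networks that a machine has an address in."""
    return {
        net
        for net in NETWORKS
        for addr in MACHINES[machine]
        if ip_address(addr) in net
    }


def can_reach(m1, m2):
    # Assumption: two machines have a direct path only if they share a subnet.
    return bool(networks_of(m1) & networks_of(m2))


def unreachable_pairs(machines):
    """List every pair of machines with no direct path between them."""
    return [(x, y) for x, y in combinations(machines, 2) if not can_reach(x, y)]


def is_full_mesh(machines):
    """True if every pair of machines can talk directly."""
    return not unreachable_pairs(machines)


print(is_full_mesh(MACHINES))       # False
print(unreachable_pairs(MACHINES))  # [('A', 'C')]
```

Even though B links both networks, the A/C pair has no direct path, which is exactly the situation a full-mesh gossip design cannot tolerate.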
I think we’ve stared down this rabbit hole a number of times in the past, and just imagining the kinds of network topologies that would need to be supported, and how they would affect the gossip rate and routing complexity (think tertiary or further disjoint networks), has always made us turn away. It would be awesome to be able to support this kind of thing reliably, but for now I don’t think we have plans to prioritize it.
Oh, I see, so you still require direct links even given that the gossip protocol is, in general, capable of relaying. I'm very aware of the problems it creates with the convergence properties, too.
Then... maybe it's possible to build an "overlay" node that connects to both clusters and relays queries/events? I'm fine with losing tags and member lists for the time being.
I see you've done something similar in Consul with the "LAN-serf" and "WAN-serf" clusters, but using all of Consul seems like huge overkill for my needs.
For example, are there any specific requirements to follow so that replies still flow back correctly?
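One way to think about getting replies back through a relay is an ID translation table: the relay re-issues a query on the far side under its own ID and remembers which cluster and original ID it came from. The sketch below assumes exactly that; `QueryRelay`, `Response` and the integer query IDs are hypothetical and do not correspond to Serf's query API.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    query_id: int
    from_node: str
    payload: bytes


class QueryRelay:
    """Hypothetical relay that maps re-issued queries back to their origin."""

    def __init__(self):
        self._ids = itertools.count(1)
        self._pending = {}  # relayed id -> (origin cluster, original id)

    def forward(self, origin_cluster, original_id):
        """Record a query before re-issuing it; return the ID to use."""
        relayed_id = next(self._ids)
        self._pending[relayed_id] = (origin_cluster, original_id)
        return relayed_id

    def route_response(self, resp):
        """Return (origin cluster, rewritten response), or None if unknown."""
        entry = self._pending.get(resp.query_id)
        if entry is None:
            return None  # unknown or already expired query
        cluster, original_id = entry
        return cluster, Response(original_id, resp.from_node, resp.payload)

    def expire(self, relayed_id):
        """Drop the mapping once the original query's deadline has passed."""
        self._pending.pop(relayed_id, None)
```

For instance, `rid = relay.forward("net1", 42)` followed by `relay.route_response(Response(rid, "C", b"ok"))` yields `("net1", Response(42, "C", b"ok"))`. A design choice worth noting: the relayed query would need a shorter deadline than the original one, otherwise replies from the far side could arrive after the asker has stopped listening.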
I think eventually we'd like to support running Serf on top of an overlay network which can mask the various network topologies. Currently it's not possible to do with Serf, even if you are willing to accept a degraded mode of operation.
With Consul, we have custom application logic to bridge the two gossip rings together. You could always do something similar, but both of the gossip rings are still expected to be fully connected.
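The bridging idea can be sketched as a process that belongs to both rings, re-broadcasts each event from one ring into the other, and remembers what it has already forwarded so an event does not bounce back and forth. This is a hypothetical illustration: `FakeRing` is an in-memory stand-in rather than a Serf API, and the `(origin, seq)` deduplication key is an assumption. It also does nothing about the constraint mentioned above; each ring must still be fully connected on its own.

```python
from collections import OrderedDict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    origin: str    # node that first emitted the event
    seq: int       # hypothetical per-origin sequence number
    name: str
    payload: bytes


@dataclass
class FakeRing:
    """In-memory stand-in for one gossip ring; not a Serf API."""
    name: str
    delivered: list = field(default_factory=list)

    def broadcast(self, event):
        self.delivered.append(event)


class Bridge:
    """Relays events between two rings, forwarding each event at most once."""

    def __init__(self, left, right, max_seen=10_000):
        self.rings = {left.name: left, right.name: right}
        self.seen = OrderedDict()  # (origin, seq) keys, oldest first
        self.max_seen = max_seen

    def on_event(self, from_ring, event):
        key = (event.origin, event.seq)
        if key in self.seen:
            return False  # already relayed; drop the echo to avoid loops
        self.seen[key] = True
        if len(self.seen) > self.max_seen:
            self.seen.popitem(last=False)  # forget the oldest key
        for name, ring in self.rings.items():
            if name != from_ring:
                ring.broadcast(event)
        return True
```

Calling `bridge.on_event("net1", ev)` delivers `ev` to the `net2` ring; when the bridge later hears the same event gossiped back on `net2`, the second call returns `False` and nothing is re-sent. The bounded `seen` table trades memory for a small risk of re-relaying a very old event.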
TLDR: I want to assemble a Serf cluster from machines on multiple disjoint networks, where some nodes can reach both networks.
Assume 2 networks, Net1: 10.10.1.0/24, Net2: 10.10.2.0/24, and 3 machines: A (10.10.1.1), B (10.10.1.2, 10.10.2.1), C (10.10.2.2)
Currently, when the Serf node on machine B reports its IP as 10.10.1.2, the node on machine C (in Net2) is unable to connect to it.
Even if I run 2 Serf nodes on machine B, bound to different interfaces, they're unable to see each other.
Expected Behavior: A Serf node accepts a set of IPs/interfaces, binds to all of them, and announces all of them. Other nodes try to connect to each announced IP, optionally starting with the IP on their own subnet and afterwards defaulting to the last one that succeeded, until the node is considered failed.
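The proposed address selection could look roughly like the sketch below. It assumes each peer announces a list of addresses and that the caller supplies a hypothetical `try_connect(addr)` callback; `order_candidates` and `connect_any` are illustrative names, not existing Serf functions. The ordering puts the last successful address first, then addresses on one of the caller's own subnets, then the rest.

```python
from ipaddress import ip_address, ip_interface


def order_candidates(local_ifaces, peer_addrs, last_ok=None):
    """Order a peer's announced addresses for connection attempts.

    local_ifaces: this node's addresses in CIDR form, e.g. "10.10.2.2/24".
    peer_addrs:   addresses the peer announced, in announcement order.
    last_ok:      the address that most recently worked for this peer, if any.
    """
    local_nets = [ip_interface(i).network for i in local_ifaces]

    def on_local_subnet(addr):
        return any(ip_address(addr) in net for net in local_nets)

    # Sort key: last successful address first, then same-subnet addresses,
    # then everything else. sorted() is stable, so ties keep announcement order.
    return sorted(peer_addrs, key=lambda a: (a != last_ok, not on_local_subnet(a)))


def connect_any(local_ifaces, peer_addrs, try_connect, last_ok=None):
    """Try each candidate in order; return the first that works, else None."""
    for addr in order_candidates(local_ifaces, peer_addrs, last_ok):
        if try_connect(addr):
            return addr
    return None  # every address failed: only now treat the peer as failed


# Machine C working out how to reach multihomed machine B.
print(order_candidates(["10.10.2.2/24"], ["10.10.1.2", "10.10.2.1"]))
# ['10.10.2.1', '10.10.1.2']
```

Returning the working address lets the caller store it as `last_ok` for the next attempt, which matches the "default to the last one that succeeded" behavior described above.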
It is okay to require one node per network on the multihomed machine, as long as broadcasts and queries continue to flow to all networks.
It is also okay to require manual configuration of the networks each node belongs to (to prevent nodes in Net1 from declaring all nodes in Net2 as failed).
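The manual network configuration could be used to scope failure detection, as in this hypothetical sketch: each member carries a set of configured network labels, and a node only probes (and can therefore only declare failed) peers that share at least one label with it. `Member`, `probe_targets` and the `"net1"`/`"net2"` labels are illustrative assumptions, not an existing Serf option.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Member:
    name: str
    networks: frozenset  # manually configured labels, e.g. frozenset({"net1"})


def shares_network(a, b):
    return bool(a.networks & b.networks)


def probe_targets(me, members):
    """Peers this node may probe and therefore declare failed.

    Assumption: failure detection only runs between members that share at
    least one configured network, so a Net1-only node never declares a
    Net2-only node failed just because it cannot route to it.
    """
    return [m.name for m in members if m.name != me.name and shares_network(me, m)]


A = Member("A", frozenset({"net1"}))
B = Member("B", frozenset({"net1", "net2"}))
C = Member("C", frozenset({"net2"}))
cluster = [A, B, C]
print(probe_targets(A, cluster))  # ['B']
print(probe_targets(B, cluster))  # ['A', 'C']
```

Under this scheme the multihomed machine B is the only member that watches both sides, so the health of C as seen from A would have to come from B rather than from A's own probes.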