hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Support separate RPC interfaces for multiple regions #1484

Open jshaw86 opened 8 years ago

jshaw86 commented 8 years ago

Issue: There is currently only one interface for RPC. In a multi-DC setup with federation we have to specify public IPs. This is a bad tradeoff for intra-region scaling of Nomad clients, because we would have to punch security group rules for each new Nomad client added to a DC. Intra-region RPC should use private IPs instead.

Proposal: Per discussion on IRC with diptanu, add "wan_advertise" for cross-region federation and allow "rpc" to be used for intra-region RPC.

Possibly related: https://github.com/hashicorp/nomad/issues/304
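
A rough sketch of what the proposal could look like in an agent configuration file. The `advertise` block with its `rpc` and `serf` keys is an existing Nomad option. The `wan_advertise` block is hypothetical: it is taken only from the name suggested above and is commented out because it is not an option Nomad is known to accept. All IP addresses are placeholders.

```hcl
# Existing option: private addresses, used for intra-region RPC and gossip.
advertise {
  rpc  = "10.0.1.15:4647"
  serf = "10.0.1.15:4648"
}

# HYPOTHETICAL, per the proposal: public addresses that would be advertised
# only to servers in other regions. Not a real Nomad setting.
# wan_advertise {
#   rpc  = "203.0.113.10:4647"
#   serf = "203.0.113.10:4648"
# }
```

With a split like this, security group rules for the public addresses would only need to cover the servers that federate across regions, not every new client.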

evan2645 commented 8 years ago

There are additional considerations for a similar case, which exists between datacenters of the same region.

Consul has a concept of WAN address translation, in that we can use the WAN address of a node residing in a remote datacenter. This functionality is also required in Nomad. The assumption seems to be that translation is not required between datacenters in a single region, but for our deployment this is not true.

Am I missing something here? Would it be reasonable to push for a feature similar to Consul's translation feature? I am unsure how that would fit in with a similar translation layer for multi-region support.

jshaw86 commented 8 years ago

@evan2645 I'm confused by your question. The proposal is to add a wan_advertise config setting for cross-region gossiping (in AWS, public IPs would go here). Then for intra-region communication you would use the standard rpc and serf config settings (in AWS, you would put private IPs here). Would that not address your question?

evan2645 commented 8 years ago

@jshaw86 Within a single Nomad Region, we have several Nomad Datacenters. Each of these Nomad Datacenters has its own network, with NAT in between them. For instance, we have a us-west Nomad Region, which includes AWS west-1 and AWS west-2. We want this group of datacenters to be a single scheduling domain, and thus to belong to a single Nomad Region.

While we require the LAN/WAN separation between Nomad Regions (as this PR requests), we also require LAN/WAN separation/translation between Nomad Datacenters in a single region. Does that make sense? I can try to sketch something out if not.

jshaw86 commented 7 years ago

@dadgar @diptanu did this make it into 0.5.0? This was something we discussed in our on-site and I thought we had agreed on 0.5.0.

dadgar commented 7 years ago

@jshaw86 This did not make it into 0.5, I apologize. It is high on the priority list. We are beginning the design, which is part of a larger networking refactor.

jshaw86 commented 7 years ago

@dadgar ok, thx for the update. Is there an ETA on it, or is it soon™? I only ask because it is really difficult to add nodes to our cluster, and only a couple of people can do it. So this would allow for easier automation and more democratic cluster administration for us.

dadgar commented 7 years ago

Yeah, I can imagine! I don't quite have an ETA, since it depends on how the design doc settles. It is hard to give an estimate before I know the full scope of what is involved, but for a sense of scale I believe we are talking about months, not weeks.

jshaw86 commented 7 years ago

@dadgar ok thx!

jshaw86 commented 7 years ago

@dadgar I noticed this still isn't assigned a milestone. Is this being worked on or still back-burnered?

dadgar commented 7 years ago

@jshaw86 Will be tackling in 0.6.0

roman-cnd commented 6 years ago

Same here. The ability to advertise a WAN address would greatly simplify networking and config management for single-region, multi-DC setups.

ole-lukoe commented 6 years ago

I hope it would be enough to make it possible to set an empty string in the ServiceAddress field when Nomad registers a service in Consul. In that case, DNS service name resolution would rely on Consul's translation features. The only way I have found to do this is with static, manually defined Consul services.

pashinin commented 4 years ago

Any update on this? Consul can be configured like this:

advertise_addr_wan="..."
# or to disable:
ports {
    serf_wan = -1
    ...
}

but there is nothing like this in Nomad.

stuartmaxwell commented 3 years ago

I just ran into this issue in the homelab I'm using to test out Nomad. I have a three-node Nomad server cluster, with an additional three Nomad clients running a variety of jobs. I then tried to add a new Nomad client that lives in the cloud, but was unable to make this work, since my homelab is behind a NAT router. The docs seem to imply that this is a supported scenario (https://www.nomadproject.io/docs/install/production/requirements#network-topology): "This allows having a set of Nomad servers that service clients that can be spread geographically over a continent or even the world in the case of having a single "global" region and many datacenter."

I am able to get the Nomad client to join the cluster initially, but after joining it receives the list of Nomad server IP addresses, which are all internal and not routable from the internet. I have managed to work around this issue by adding the WAN address of my home network to the advertise stanza as follows:

advertise {
  rpc  = "x.x.x.x"
}

With this in place, the Nomad client in the cloud is able to participate properly in the cluster and can receive jobs. The downside is that all RPC traffic between clients and servers on the local network goes via my public IP address instead of directly to the servers. I have a TCP router and load balancer configured in Traefik to direct the RPC traffic on port 4647 to the three Nomad servers.

As suggested above, having separate "wan" and "lan" advertise addresses would solve this issue, or perhaps if you allowed using DNS names for the advertised addresses then I could use split-horizon DNS to resolve them differently for different environments? Anyway, hope this feedback helps.
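
The split-horizon idea above could be sketched as follows. This is an assumption-laden illustration, not a tested configuration: the hostname `nomad-rpc.example.com` is a placeholder, and the sketch assumes that Nomad would accept a DNS name here and hand it to clients unresolved, which is exactly the behavior being asked for rather than something this thread confirms.

```hcl
advertise {
  # HYPOTHETICAL: a DNS name instead of an IP address.
  # Inside the home network, the name would resolve to a private server IP.
  # From the internet, it would resolve to the public (NAT) address.
  rpc = "nomad-rpc.example.com:4647"
}
```

Local clients would then reach the servers directly, while the cloud client would still go through the public address and the Traefik TCP router.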

jmwilkinson commented 2 years ago

Given that this request is now well over five years old, is it safe to assume it is not a feature that will be supported within the next few years at a minimum?

elqueffo commented 1 year ago

I just wanted to add that we would also like this feature ASAP. We have multiple cloud providers, and for "local" Nomad clients we want to use the internal IP range, while for cross-provider traffic we need to join via the public IPs.

This is partly a cost optimization question (since traffic to the public IP tends to count as egress/ingress) and partly a security consideration.