nellshamrell commented 5 years ago

Redesign Sys IP Determination and Reporting

This design doc was inspired by the investigation and discussion on this issue.

We currently have a bug where a service will occasionally start reporting it's sys.ip as 127.0.0.1. There may be a few causes, but one we have confirmed is that a service will fall back to reporting 127.0.0.1 as it's ip when Google's DNS (8.8.8.8) is not routable.

This is due to these two sections of the code:

https://github.com/habitat-sh/habitat/blob/4527c32e25d60d77ece99cb7ac8cd1182cb85279/components/core/src/util/sys.rs#L9-L16

This code will use the kernel's routing tables to return the IP address of the interface that can route to '8.8.8.8'

(this is a technique that is detailed in Section 8.14, "Determining Outgoing Interface with UDP", of UNIX Network Programming, Volume 1, 3rd Edition by Stevens et al., the relevant text to which is available here.)

If 8.8.8.8. is not routable, the function will return an error, and we will recover from it later in the code.

https://github.com/habitat-sh/habitat/blob/4527c32e25d60d77ece99cb7ac8cd1182cb85279/components/sup/src/manager/sys.rs#L38-L45

This IP address appears to only be used in templating contexts - through the top-level sys.ip key, as well as the sys.ip key of any svc_member structs (detailed at https://www.habitat.sh/docs/reference/#template-data).

(Thank you to @christophermaier for the initial code tracing and analysis!).

I tried a few approaches to returning the IP address of a host through Rust.

The most viable approach was to use the pnet crate to return the ip address of all network interfaces, though we would need to do some filtering of the network interfaces to determine which IP address we should return, likely similar to the way Ohai does this.

However, a couple of people also pointed out that there may be situations where a service instance will not have an externally resolvable IP address. This happens when a service cluster is running on an internal network - with VMs or containers.

@sirajrauff shared this real world example

Vagrant + Virtualbox multi-machine environment using NAT with the host for internet access, and a private network to communicate with other VMs. In this case the first network adapter (NAT) is what the Supervisor will resolve as sys.ip. This IP address, being proxied, does not allow ingress and in fact is the same IP address across all the machines. This breaks bindings in this scenario, unless I modify the package to use sys.gossip_ip which will resolve to the IP associated with the second network adapter attached to the private network.

In order to account for situations where Google DNS is not routable and/or the service instance has no IP address, we need a way for a user to specify which network interface they want to source their IP from. There should also be a sane default if the user does not specify a network interface - for example, the default route or the route that gets us to the gateway for a default route (taking a similar approach to Ohai's approach.

Motivation

As a Habitat user I want to be able to specify which network interface my service should use to determine its exposed IP address So that I can rely on the IP address to be accurate Including when a service cluster is running on an internal network Additionally, if I do not specify a network interface Habitat will select a sane default network interface For my service to use to determine its exposed IP address

Specification

When a user runs the Habitat supervisor with

hab sup run

They will be able to pass an additional option to select a specific network interface for the Supervisor to use to draw its IP from.

hab sup run --network-interface "eth0"

In this case, the sys.ip for any service running under this Supervisor will be the IP for eth0.

If that interface cannot be found, the supervisor will fail with an error (rather than default to 127.0.0.1).

If they do not pass that option when running that command, a default network will be selected to determine the IP address displays as sys.ip. (Perhaps the first interface with external internet access?).

If no interfaces can be found, the supervisor will return an error (rather than default to 127.0.0.1)

Downstream Impact

Some users may see their service's IP address change when they update Habitat to a new version with this functionality in it.

nellshamrell commented 5 years ago

This is now ready for review!

christophermaier commented 5 years ago

@nellshamrell the IP address is actually that of the Supervisor itself, so it's not anything that package authors can influence. Instead, I think we'd need to have some kind of CLI option to allow a user to (optionally) specify an interface to use for the Supervisor's communication.

There is probably some unfortunate conflation of system and service information in the code, which just ends up confusing things :frowning_face:

If there's a desire to specify which interfaces individual services can use for their communication, that would likely be a separate issue.

sirajrauff commented 5 years ago

When a user creates a plan, they have the option to set a $pkg_network_interface variable.

I think this may not be the way to go, mirroring @christophermaier's thoughts. This couples machine-specific configuration with Habitat Packages, and I think would overall make it less portable. At the very least I think having this at the build-time/plan level may not be ideal.

A quick thought on this would draw on my example again, as I was testing a package that would require sys.ip to resolve to the Vagrant (Virtualbox) VM's second network interface to allow other VMs to communicate with it. However, this package was eventually deployed on an EC2 instance on AWS which had a single network interface.

At the very least this introduces another scenario:

$pkg_network_interface is set and the network interface does not exist.

raskchanky commented 5 years ago

Quoting @christophermaier :

Instead, I think we'd need to have some kind of CLI option to allow a user to (optionally) specify an interface to use for the Supervisor's communication.

I think this is the way to go.

nellshamrell commented 5 years ago

Good points, all, reworking it to be a CLI option to use for the Supervisor's communication.

nellshamrell commented 5 years ago

This has been updated and is ready for review again!

christophermaier commented 5 years ago

I see that the pnet crate is cross platform, which is great. It would be nice to explicitly call out that this solution should work for Windows as well (using whatever naming conventions are standard on Windows for network interfaces).

A few testing scenarios that prove the functionality on Linux and Windows will be great to have.

Aside from that, this is looking pretty nice. I like the idea of failing if we can't come up with a sensible address, as opposed to falling back to 127.0.0.1 :+1:

sirajrauff commented 5 years ago

If that interface cannot be found, the supervisor will fail with an error (rather than default to 127.0.0.1).

👍

If they do not pass that option when running that command, a default network will be selected to determine the IP address displays as sys.ip. (Perhaps the first interface with external internet access?).

Last thought I have left is if it might make more sense to default to sys.gossip_ip if it is distinct from the first interface with external access? This IP address is already one that is known to allow communication within the "network", with the proof of this being the successful peering that would have resulted in the IP to begin with. This also then relies on another user configured variable, the --peer flag which might be more indicative of the network setup than just external access.

I might be overthinking this, and this would also result in a much greater downstream impact. Perhaps someone with more of a background on the use cases for gossip_ip can speak up here - I honestly am not sure why it exists considering you can setup the Supervisor to listen on every interface and have it resolve the IP on an incoming request.

christophermaier commented 5 years ago

@sirajrauff sys.gossip_ip is the listen address for gossip, and actually defaults to 0.0.0.0, which would unfortunately be just as bad a default as 127.0.0.1 for sys.ip.

It's a little difficult to trace in the code, but I think that sys.ip is only really used to expose the IP address of the Supervisor to other services through the template rendering system. (thinking further, I'm not 100% sure what benefit there is to exposing what address a remote Supervisor is listening on to services running on another Supervisor, so I'm not clear on why sys.gossip_ip is included in template rendering... perhaps this is fodder for some future cleanups I alluded to in https://github.com/habitat-sh/habitat/pull/4823)

If you're able to confirm or deny any of this while you're in this code, @nellshamrell, this distinction would be nice to add to the relevant documentation (both user-facing and developer-facing :sweat_smile: )

As the distinction and use of these addresses is refined, a potential future enhancement might be to allow an operator to specify which interface / IP address a given service should advertise as its own (e.g., you want to get gossip on one interface, and service traffic on another). This would be something specified at hab svc load time, and not something baked into a package. That would be a more invasive change, though, and would definitely need to be driven by consumer desire and need. It's also definitely not something for this current work.

baumanj commented 5 years ago

Another thing to consider is that even for a given network interface, there can be multiple IP addresses.

Something that's not entirely clear to me from reading the design (or the docs) is the intended use of sys.ip. I wonder if it's being used to implement functionality that would be more appropriately handled through service discovery mechanisms. The idea that packages need to be concerned with the details like network interface names (which will almost certainly change across development/deployment platforms) seems contrary to the habitat ethos.

Additionally, there's the question of whether this information belongs in the sys namespace, as opposed to pkg, cfg or svc. But I think that depends a lot on how the data is intended to be used.

nellshamrell commented 5 years ago

Thank you for the comments, all!

@baumanj - while I had originally thought that the interface should be defined on the package level, it makes much more sense to define it on the supervisor level for the reasons you mention - network interface names will almost certainly change across development/deployment platforms. I updated the design doc with this in mind.

I am also unsure of the intended use of sys.ip, but given the headaches that users have expressed in #5079, it certainly seems like they are depending on it. I'm happy to rethink this on a larger level, but am current hesitant at the idea of changing it drastically and making these headaches far worse.

You are also right that a given network interface may have multiple IP addresses. Not sure how to incorporate that in, but I'm thinking on it (and welcome suggestions :) )

baumanj commented 5 years ago

I am also unsure of the intended use of sys.ip, but given the headaches that users have expressed in #5079, it certainly seems like they are depending on it. I'm happy to rethink this on a larger level, but am current hesitant at the idea of changing it drastically and making these headaches far worse.

I think this indicates a need to understand how sys.ip is being used and why before moving forward with changes. It seems possible that we can provide a more appropriate and maintainable mechanism to achieve the desired goal. Once we know what we do want to provide and why, then I think we can decide on the how.

sirajrauff commented 5 years ago

@christophermaier so I may not have phrased it correctly, but I was referring to the resolved value of gossip_ip as opposed to the "listener" value, i.e. using a sample-node-app configured with binds, I installed the Supervisor on CentOS 7 using hab 0.83:

machine-1 # journalctl -fu hab-sup -n 50
...
hab[4843]: hab-sup(MR): Supervisor Member-ID bc7b935d0f694e3b9d7a57c574275cdf
hab[4843]: hab-sup(MR): Starting gossip-listener on 0.0.0.0:9638
hab[4843]: hab-sup(MR): Starting ctl-gateway on 127.0.0.1:9632
hab[4843]: hab-sup(MR): Starting http-gateway on 0.0.0.0:9631

After loading sample-node-app on machine 1, I can query the API and get the following:

machine-1 # curl --silent localhost:9631/butterfly | jq ".services[\"sample-node-app.default\"][\"$(cat /hab/sup/default/MEMBER_ID)\"].sys"
{
  "ip": "10.0.2.15",
  "hostname": "localhost.localdomain",
  "gossip_ip": "0.0.0.0",
  "gossip_port": 9638,
  "http_gateway_ip": "0.0.0.0",
  "http_gateway_port": 9631,
  "ctl_gateway_ip": "127.0.0.1",
  "ctl_gateway_port": 9632
}

Note this is in the Vagrant multi-machine example I have given before, and after loading my second Supervisor configured as follows:

ExecStart=/bin/hab sup run --peer 192.168.1.131

I then get the following (note gossip_ip has changed!)

machine-1 # curl --silent localhost:9631/census | jq -r '.census_groups["sample-node-app.default"].population.bc7b935d0f694e3b9d7a57c574275cdf.sys'
{
  "ip": "10.0.2.15",
  "hostname": "localhost.localdomain",
  "gossip_ip": "192.168.1.131",
  "gossip_port": 9638,
  "http_gateway_ip": "0.0.0.0",
  "http_gateway_port": 9631,
  "ctl_gateway_ip": "127.0.0.1",
  "ctl_gateway_port": 9632
}

Network interfaces on machine 1:

# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    inet 127.0.0.1/8 scope host lo
    ...
2: enp0s3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.2.15/24 brd 10.0.2.255 scope global noprefixroute dynamic enp0s3
    ...
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.1.131/24 brd 192.168.1.255 scope global noprefixroute dynamic enp0s8
    ...
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0

What I find odd here is that if I then use an haproxy package with the configuration set to bind to sys.ip, it fails (naturally) resolving to 10.0.2.15, but using sys.gossip_ip it works just fine as it resolves to 192.168.1.131. Not even sure why gossip_ip would need to be in the template data, but it's saved my skin in cases like this where the issues that cause sys.ip to eventually resolve to the local loopback don't affect gossip_ip.

So what I meant by my original comment above is that gossip_ip actually performs how I would have thought sys.ip should have, namely that it reports back the IP address most likely to be able to reach other Supervisors/services running on them.

sirajrauff commented 5 years ago

A quick secondary thought (i'm so sorry for these walls of text); the use case I detailed above uses bindings and concerns network traffic between machines, but I've also seen cases where sys.ip is used to set a listening address for a server, in which case this may be an entirely different case as now supervisor reachability within the ring is not an issue, but being able to accept traffic is important.

raskchanky commented 5 years ago

@sirajrauff No need for apologies here. The information you've provided so far is very helpful.

nellshamrell commented 5 years ago

Briefly discussed this in our morning stand up - here's a summary:

We don't know the original intent of sys.ip and sys.gossip_ip - it's not documented and the people who would know have left the project.

We do have some insight into how our users are using the two values (as @sirajrauff mentions in his examples in previous comments).

I believe we are at the point of needing to decide how users should use the two values - it's only going to get more confusing if we don't have a documented intent/convention. In order to do that, I believe we're at the point of needing help from our product team in defining this intent/convention before we can move to implementation. Sending this link to our product team today.

christophermaier commented 5 years ago

My impression (which may be revised with further research!) is that sys.ip is intended to be the IP address at which a given Supervisor and its services are reachable. That is, if you know that Redis is running on port 1234, and its reported sys.ip address is 192.168.5.6, then you should be able to interact with it at 192.168.5.6:1234.

sys.ip is resolved as detailed above when the Supervisor starts up, and the same value is given out to via gossip for all services that are running on the Supervisor.

This falls down with the current implementation because Google's DNS isn't routable in all environments, leading to a fallback value of 127.0.0.1, which doesn't work. There's an additional wrinkle if there are multiple interfaces available on the machine, which is not accounted for anywhere in the current implementation.

I'm really not clear why sys.gossip_ip is useful to expose to users in the form of templating data (despite the fortunate nature of that decision for @sirajrauff, detailed above). However, the templating data was initially essentially a JSON serialization of internal Supervisor data structures, so I'm not at all surprised that there are less-than-useful data in there.

That's my brain dump for now; I'll add more as I think of things :smile:

sirajrauff commented 5 years ago

Just want to double back here on this comment where I referenced the core-plans Jenkins package:

I've also seen cases where sys.ip is used to set a listening address for a server

I modified the plan to listen on 0.0.0.0 and it ended up listening on every network interface as I suspected it would. This is then more a possible improvement for the core-plan than it is a requirement for sys.ip, and can be excluded as a use-case.

nellshamrell commented 5 years ago

Re: network interfaces having >1 IP address, we could also allow users to specify an IP address when they specify a network interface. If one is not specified, we can select a default (and return an error if no IP addresses are available for the selected network interface).

fnichol commented 5 years ago

The sys.ip was a refactor/introduced in 5c2f3237bb1b1341585e491fb623a6fb763ec0e8 (discussed at length at the time and refactored by @adamhjk) to set up some namespaced configuration, each of which had a function and mutability (cfg: mutable user config read from TOML files, pkg: immutable data from the service package, svc: mutable service config, read from gossip, sys: immutable "system" info, provided from the host, bldr: immutable supervisor info such as the version of bldr/Habitat Supervisor). This was long before running multiple services on one Supervisor was considered, so originally sys.ip would have been used to bind 1) the singular service to run (such as redis), 2) the http gateway service (originally called the "sidecar" in this code), and 3) the gossip service.

The sys.gossip_ip (and sys.sidecar_ip) were introduced in habitat-sh/habitat#552 (added a few months later thanks to @metadave) to allow for more flexible deployments of the Supervisor by binding the gossip and http-gateway services to potentially different network interfaces/networks/IP addresses. This meant that we doubled down on the sys.ip as being used as the service's IP address. Typically when computing a bind to another service, a package's config templating might use a member's sys.ip and the service's port together. Now, some services might not use the sys.ip which muddied the waters (you can see that the Redis package in 5c2f3237bb1b1341585e491fb623a6fb763ec0e8 ignores sys.ip in the config template and uses config instead).

There were several strategies for determining sys.ip over the years, with the current one using a UDP datagram to a (hopefully) routable IP which @metadave and I paired on in mid-2016. Dave called out some of the details in https://github.com/habitat-sh/habitat/pull/1143#issuecomment-239502363 to add more color which @nellshamrell mentions at the top.

Even at the time we suspected that this would be an 80% solution/get it working approach because, as folks on this thread have discovered, you don't always have a route to the public internet. Personally, I thought that a key to getting this sys.ip right and fully customizable would be to look at what Ohai from the Chef project did and think hard about how to mimic the solution to node['ipaddress'] in a chef-client run. The challenge is when, as a Chef user, I thought about how often node['ipaddress'] was what I actually wanted vs. depending on a cloud provider's Ohai hints such as node['ec2']['public_ipv4']. In short, that single bit of data often made Chef cookbooks that much more complex (note: this may or may not still be the case in cookbooks, think Chef circa 2015/2016). Also in the modern Supervisor, there are multiple services to consider, each with their own potential "sys.ip" requirement (i.e. maybe the singular value is not enough, or needs to be a list, or needs to be configurable/selectable per-service?). In the interest of shipping our idea of Habitat to world in ~2016, we shipped the 80% solution and moved on to the next challenge. This is one of those solution-pending items that needs a resolution.

Hope this helps, or at least provides folks with a bit of project archeology along the way. Thanks to @eeyun for pointing me this way and letting me walk down memory lane 🥃

nellshamrell commented 5 years ago

Hello all!

Had a great zoom meeting about this tomorrow. Course of action we agreed on for now is:

1) Allow a user to specify an IP address when they start the supervisor (using a command flag - to be discussed with UX) 2) If a user does not specify an IP address, continue using our Google DNS trick. If Google DNS is not routable, return an error to the user with some remediation suggestions (rather than defaulting to 127.0.0.1)

raskchanky commented 5 years ago

Closed by https://github.com/habitat-sh/habitat/pull/6960

habitat-sh / habitat

Design Doc to redesign sys IP determination and promotion #6793

Redesign Sys IP Determination and Reporting

Motivation

Specification

Downstream Impact