@iamlittle If you're concerned about this happening, then you need to pass in -node-id to set each instance of Consul to have a unique node ID. I'd argue (strongly) that this is in fact the correct behavior: if you want to allow duplicate Consul nodes on the same host, you need to explicitly disambiguate them. Consul doesn't detect anything about running inside of a Kube environment, so if there is a better ID to pull from when Consul is running under Kube, lmk.
@sean- Thanks! I was looking for something like that in the docs. Guess I missed it.
We will add a note to the docs and maybe even the error message to help people find the -node-id option.
Something like -node-id=$(uuidgen | awk '{print tolower($0)}') added to the command line should get you a unique node ID.
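For example (a sketch; the data dirs and other flags here are just placeholders):

# each invocation generates its own lowercase UUID, so two agents on one host no longer collide
consul agent -server -data-dir=/tmp/consul-a -node-id=$(uuidgen | awk '{print tolower($0)}')
consul agent -server -data-dir=/tmp/consul-b -node-id=$(uuidgen | awk '{print tolower($0)}')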
@slackpad That sounds good, but I believe an option in consul to force the generation of the node id from another source would be very useful. I'm facing the same issue as @iamlittle. Your solution sounds reasonable, but it requires uuidgen to be available in the container. This is not the case when using the official consul docker images, for instance.
@mgiaccone cat /proc/sys/kernel/random/uuid will give you a UUID and is available in the Docker container.
@iamlittle Thanks, I just solved it with the same command
@mgiaccone that's fair - depending on how many people bump into this we may need to add an option to generate a uuid internally - we've got the code in there, it's just a tradeoff on adding more config complexity.
Is it me, or does -node-id=$(cat /proc/sys/kernel/random/uuid) not work yet in version 0.8.0?
Tried the following commands:
docker run -d --name consul-01 -e 'CONSUL_LOCAL_CONFIG={"skip_leave_on_interrupt": true}' consul agent -server -bind=0.0.0.0 -client=0.0.0.0 -retry-join=172.17.0.2 agent ip -bootstrap-expect=3 -node-id=$(cat /proc/sys/kernel/random/uuid)
docker run -d --name consul-02 -e 'CONSUL_LOCAL_CONFIG={"skip_leave_on_interrupt": true}' consul agent -server -bind=0.0.0.0 -client=0.0.0.0 -retry-join=172.17.0.2 agent ip -bootstrap-expect=3 -node-id=$(cat /proc/sys/kernel/random/uuid)
docker run -d --name consul-03 -e 'CONSUL_LOCAL_CONFIG={"skip_leave_on_interrupt": true}' consul agent -server -bind=0.0.0.0 -client=0.0.0.0 -retry-join=172.17.0.2 agent ip -bootstrap-expect=3 -node-id=$(cat /proc/sys/kernel/random/uuid)
This doesn't work for me.
[update] Sorry, my bad. I still had the words "agent ip" in the command-line. [/update]
@slackpad After having upgraded from 0.7.5 to 0.8.0, I have hit a bug of identical node ids. My use case is that I have LXD (LXC) containers and the dmidecode output from inside all the LXD containers is the same as that of the physical host.
These are long running LXD containers which can stop and start over time.
If I were to pass the -node-id parameter to the Consul startup, the node-id would be different on each startup. In such a case, would it matter? Or would it use the saved (persisted) node-id from the previous run?
For now, I have reverted to v0.7.5
Thanks and Regards, Shantanu
@slackpad
... answering my own question ... 😄
As expected, the changing node-id (as specified on the command line) does matter, but only until the health checks pass and the nodes de-register and re-register successfully.
For testing, if I restart the nodes (lxc containers) within a short span of time, I do see the message: "consul.fsm: EnsureRegistration failed: failed inserting node: node ID..." and then ... "member XXXXX left, deregistering"
The node joins in successfully after the health checks, so for me, things are working fine with v0.8.0 for now.
Regards, Shantanu
A "better" IMO way to set the node-id is with something like this:
cat /proc/sys/kernel/random/uuid > "$CONSUL_DATA_DIR"/node-id
and then start your consul agent/server as per usual (pre-0.8) practice. Reading the code, you can see that Consul first checks for the existence of this file before trying to generate a new node-id. That way, if you restart your container, it will keep a stable node-id.
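A minimal sketch, assuming $CONSUL_DATA_DIR points at your agent's -data-dir:

# only generate a node-id if one doesn't exist yet, so restarts keep the same ID
if [ ! -f "$CONSUL_DATA_DIR"/node-id ]; then
  cat /proc/sys/kernel/random/uuid > "$CONSUL_DATA_DIR"/node-id
fi
consul agent -server -data-dir="$CONSUL_DATA_DIR"

The exec of the agent is just illustrative; the point is the guard around the file creation.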
@mterron thanks! I have Ubuntu 14.04/16.04 LXD (LXC) containers.
I will have to come up with startup logic of "execute only once, if the node-id file doesn't exist" in the init script and the systemd equivalent, so that the node-id file gets generated only once!
It's straightforward for the 14.04 upstart script; I will check how to achieve it easily with the systemd equivalent 😦
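Maybe something like this untested sketch (unit name, paths, and binary location are illustrative):

# /etc/systemd/system/consul.service (fragment)
[Service]
# create the node-id file only on first start; later starts see it and skip
ExecStartPre=/bin/sh -c 'test -f /var/lib/consul/node-id || cat /proc/sys/kernel/random/uuid > /var/lib/consul/node-id'
ExecStart=/usr/local/bin/consul agent -server -data-dir=/var/lib/consul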
Thanks and Regards, Shantanu
Changing this to enhancement - I think we should add a configuration option to disable the host-based ID; Consul will then generate a random one internally if needed and save it to the data dir for persistence. This will make life easier for people trying to do this in Docker.
Thanks @slackpad Eagerly awaiting the next release! 🙌
What's the scenario where you want consul to use the boot_id as node id? Generating a random node id by default seems more intuitive but I'm sure I'm missing something here.
I mean, instead of having the -disable-host-node-id flag, I'd just add a -enable-host-node-id for the people that specifically need that behaviour.
@mterron Nomad uses the same host-based IDs, so it's nice to have the two sync by default (you can see where a job is running and go to the corresponding Consul node).
I've never used Nomad, so boot_id seemed like an arbitrary choice for a random identifier, but it sort of makes sense from a HashiCorp ecosystem point of view.
Two lines in the documentation should be enough to explain the default behaviour so that users are not surprised. Something like: "By default, Consul will use the machine boot_id (/proc/sys/kernel/random/boot_id) as the node-id. You can override this behaviour with the -disable-host-node-id flag or pass your own node-id using the -node-id flag."
Thanks for replying to a closed issue!
Hi @mterron we ended up adding something like that to the docs - https://www.consul.io/docs/agent/options.html#_node_id:
-node-id - Available in Consul 0.7.3 and later, this is a unique identifier for this node across all time, even if the name of the node or address changes. This must be in the form of a hex string, 36 characters long, such as adf4238a-882b-9ddc-4a9d-5b6758e4159e. If this isn't supplied, which is the most common case, then the agent will generate an identifier at startup and persist it in the data directory so that it will remain the same across agent restarts. Information from the host will be used to generate a deterministic node ID if possible, unless -disable-host-node-id is set to true.
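With that flag (a sketch; the data dir and other flags are placeholders, and the flag shipped in a release after 0.8.0):

# skip the host-derived ID; the agent generates a random one on first start
# and persists it in the data directory across restarts
consul agent -server -data-dir=/consul/data -disable-host-node-id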
consul version for both Client and Server:
Consul v0.8.0
consul info for both Client and Server:
Operating system and Environment details
Ubuntu 16.04.1 LTS, Kubernetes 1.5
Description of the Issue (and unexpected/desired result)
Trying to join containerized Consul servers on the same machine will throw an error due to /proc/sys/kernel/random/boot_id being identical across all containers on a host.
Reproduction steps
Running Consul 0.8.0 in a 3-pod replica set on a single-node Kubernetes cluster (development machine). Deployment definition
consul agent join X.X.X.X
throws the error described above. I believe this to be a result of #2700. In any case, 0.8.0 could cause serious problems in Kubernetes clusters if two Consul pods were to be scheduled on the same machine. This may not occur immediately.
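To confirm the root cause, run any two containers on the same host and compare their boot_id; both print the same value (alpine here is just a convenient throwaway image):

docker run --rm alpine cat /proc/sys/kernel/random/boot_id
docker run --rm alpine cat /proc/sys/kernel/random/boot_id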