hashicorp / consul

Consul is a distributed, highly available, and data center aware solution to connect and configure applications across dynamic, distributed infrastructure.
https://www.consul.io

Consul agent generated the same node ID for a couple of hosts #9467

Open dreadushka opened 3 years ago

dreadushka commented 3 years ago

The Consul agent generated the same node ID for the following hosts: MYUGF-DBS302P (created about 1 year ago) and MYUGF-DBS311P (created two days ago).

Should I use the -disable-host-node-id parameter on all my agents to prevent situations like this?
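To confirm which hosts share an ID, the IDs can be collected per host and grouped. The following is a minimal Python sketch, not an official tool. It assumes each agent persists its ID in a file named node-id inside its data directory (/var/consul in this setup); the helper names read_node_id and find_duplicate_node_ids are hypothetical.

```python
from collections import defaultdict
from pathlib import Path


def read_node_id(data_dir="/var/consul"):
    """Read the persisted node ID on one host.

    Assumption: the agent stores its ID in <data_dir>/node-id.
    """
    return (Path(data_dir) / "node-id").read_text().strip()


def find_duplicate_node_ids(ids_by_host):
    """Group hosts by node ID and keep only IDs used by more than one host.

    ids_by_host maps a hostname to the node ID collected from that host.
    IDs are compared case-insensitively, ignoring surrounding whitespace.
    """
    hosts_by_id = defaultdict(list)
    for host, node_id in ids_by_host.items():
        hosts_by_id[node_id.strip().lower()].append(host)
    return {
        node_id: sorted(hosts)
        for node_id, hosts in hosts_by_id.items()
        if len(hosts) > 1
    }
```

Run read_node_id on each host (for example over SSH), build the hostname-to-ID mapping, and pass it to find_duplicate_node_ids. Any ID that comes back was claimed by more than one host.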

pierresouchay commented 3 years ago

Are those bare-metal hosts or VMs? Which operating system?

dreadushka commented 3 years ago

Are those bare-metal hosts or VMs? Which operating system?

VMs. RHEL 7

jsosulska commented 3 years ago

Hello @dreadushka - Apologies for the late reply!

We found this issue and fixed it in this PR, and the fix was introduced in Consul v1.7.3. The larger discussion around this issue can be found here. If this is occurring on a version newer than v1.7.3, please include the version numbers of your Consul clients and servers, as well as your configuration files and the commands used to run Consul. -disable-host-node-id can also be used to partially mitigate this. Please let me know your results with that flag here. In the meantime, I'll hold this open until 2/22 as "waiting-reply".

Thank you for submitting your issue!

dreadushka commented 3 years ago

The cluster started on 1.4.2. I upgraded two times: from 1.4.2 to 1.6.2, and then from 1.6.2 to 1.8.4. Version 1.8.4 is now running on both servers and agents.

Server config:

{
    "advertise_addr": "10.15.29.191",
    "bind_addr": "10.15.29.191",
    "domain": "consul",
    "bootstrap_expect": 3,
    "server": true,
    "datacenter": "K",
    "data_dir": "/var/consul",
    "encrypt": "",
    "dns_config": {
        "allow_stale": true,
        "max_stale": "15s"
    },
    "retry_join": [
        "10.15.29.191",
        "10.15.29.185",
        "10.15.29.186",
        "10.15.29.164",
        "10.15.29.165"
    ],
    "retry_interval": "10s",
    "retry_max": 100,
    "skip_leave_on_interrupt": true,
    "leave_on_terminate": false,
    "ports": {
        "dns": 53,
        "http": 8500
    },
    "recursors": [
        "10.15.40.124",
        "172.24.40.3",
        "172.24.40.4",
        "10.15.40.7",
        "10.15.40.8"
    ],
    "rejoin_after_leave": true,
    "addresses": {
        "http": "0.0.0.0",
        "dns": "0.0.0.0"
    }
}

Server runs using systemd:

[Service]
EnvironmentFile=-/etc/sysconfig/consul
Environment=GOMAXPROCS=2
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/server -rejoin -ui -data-dir=/var/consul

Client config:

{
    "bind_addr": "10.206.159.155",
    "domain": "consul",
    "datacenter": "k",
    "data_dir": "/var/consul",
    "encrypt": "",
    "retry_join": [
        "10.15.29.191",
        "10.15.29.185",
        "10.15.29.186",
        "10.15.29.164",
        "10.15.29.165"
    ],
    "retry_interval": "10s",
    "retry_max": 100,
    "skip_leave_on_interrupt": true,
    "leave_on_terminate": false,
    "ports": {
        "dns": 53,
        "http": 8500
    },
    "recursors": [
        "10.15.40.124",
        "172.24.40.3",
        "172.24.40.4",
        "10.15.40.7",
        "10.15.40.8"
    ],
    "rejoin_after_leave": true,
    "addresses": {
        "http": "0.0.0.0",
        "dns": "0.0.0.0"
    }
}

Client's part of systemd config:

[Service]
EnvironmentFile=-/etc/sysconfig/consul
Environment=GOMAXPROCS=2
ExecStart=/usr/local/bin/consul agent -config-dir=/etc/consul.d/agent -data-dir=/var/consul

ilhaan commented 3 years ago

I'm seeing the same thing happen with Consul 1.7.13 clients.

ilhaan commented 3 years ago

I had to set --disable-host-node-id=true to get past this. There is a PR to allow setting this via values.yaml here.
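For agents driven by JSON config files like the ones posted above, the same setting can go in the config file instead of on the command line. This is a rough Python sketch that assumes disable_host_node_id is the config-file equivalent of the flag; the function name with_host_node_id_disabled is hypothetical.

```python
import json


def with_host_node_id_disabled(config_text):
    """Return agent config JSON with host-based node IDs turned off.

    Assumption: "disable_host_node_id" is the config-file counterpart of
    the -disable-host-node-id command-line flag.
    """
    config = json.loads(config_text)
    config["disable_host_node_id"] = True
    # All other keys are kept as they were; only the one setting is added.
    return json.dumps(config, indent=4)
```

Note that a node ID already persisted in the data directory may still be reused after this change, so it is worth checking the agent's reported ID after a restart.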