hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad binds to unexpected interface #10179

Open sebastiankr opened 3 years ago

sebastiankr commented 3 years ago

Hi there,

After upgrading Nomad to 1.0.4, starting Nomad via systemd fails to elect a leader. Starting it via nomad agent -config /etc/nomad.d continues to work fine. Stopping and starting after a reboot via systemctl stop nomad && systemctl start nomad works as well. Why does leader election fail only after a system restart?

I am running Nomad in single-server mode. Nomad was installed via apt.

Nomad version

Nomad v1.0.4 (9294f35f9aa8dbb4acb6e85fa88e3e2534a3e41a)

nomad.hcl

root@server-0:~# cat /etc/nomad.d/nomad.hcl            
datacenter = "mode"                                    
data_dir = "/opt/nomad/data"                           
bind_addr = "0.0.0.0"

server {                                               
  enabled = true                                       
  bootstrap_expect = 1                                 
}                                                      

client {     
  enabled = true                                       
  servers = ["127.0.0.1:4646"]                         
  network_interface = "ens10"                          

  template {                                           
    disable_file_sandbox = true                        
  }                                                    

  host_volume "rabbitmq" {                             
    path      = "/opt/rabbitmq"                        
    read_only = false                                  
  }                                                    

  host_volume "arangodb" {                             
    path      = "/opt/arangodb"                        
    read_only = false                                  
  }                                                                             
  options {                                            
    "docker.auth.config" = "/root/.docker/config.json" 
  }                                                                                        
}

acl {                                                  
  enabled = true                                       
} 

nomad.service

root@server-0:/etc/systemd/system/multi-user.target.wants# cat nomad.service     
[Unit]                                                                           
Description=Nomad                                                                
Documentation=https://nomadproject.io/docs/                                      
Wants=network-online.target                                                      
After=network-online.target                                                      

# When using Nomad with Consul it is not necessary to start Consul first. These  
# lines start Consul before Nomad as an optimization to avoid Nomad logging      
# that Consul is unavailable at startup.                                         
#Wants=consul.service                                                            
#After=consul.service                                                            

[Service]                                                                        
ExecReload=/bin/kill -HUP $MAINPID                                               
ExecStart=/usr/bin/nomad agent -config /etc/nomad.d                              
KillMode=process                                                                 
KillSignal=SIGINT                                                                
LimitNOFILE=65536                                                                
LimitNPROC=infinity                                                              
Restart=on-failure                                                               
RestartSec=2                                                                     
StartLimitBurst=3                                                                
StartLimitInterval=10                                                            
TasksMax=infinity                                                                
OOMScoreAdjust=-1000                                                             

[Install]                                                                        
WantedBy=multi-user.target                                                       

Operating system and Environment details

Ubuntu 20.04.2 LTS (GNU/Linux 5.4.0-66-generic x86_64)

Logs after reboot

root@server-0:/etc/systemd/system/multi-user.target.wants# journalctl -xef -u nomad
Mar 13 18:24:26 server-0 nomad[526]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
Mar 13 18:24:26 server-0 nomad[526]: ==> Loaded configuration from /etc/nomad.d/nomad.hcl
Mar 13 18:24:26 server-0 nomad[526]: ==> Starting Nomad agent...
Mar 13 18:24:28 server-0 nomad[526]: ==> Nomad agent configuration:
Mar 13 18:24:28 server-0 nomad[526]:        Advertise Addrs: HTTP: 10.0.1.1:4646; RPC: 10.0.1.1:4647; Serf: 10.0.1.1:4648
Mar 13 18:24:28 server-0 nomad[526]:             Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Mar 13 18:24:28 server-0 nomad[526]:                 Client: true
Mar 13 18:24:28 server-0 nomad[526]:              Log Level: INFO
Mar 13 18:24:28 server-0 nomad[526]:                 Region: global (DC: mode)
Mar 13 18:24:28 server-0 nomad[526]:                 Server: true
Mar 13 18:24:28 server-0 nomad[526]:                Version: 1.0.4
Mar 13 18:24:28 server-0 nomad[526]: ==> Nomad agent started! Log data will stream in below:
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.356+0100 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/opt/nomad/data/plugins
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.364+0100 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.364+0100 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.364+0100 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.364+0100 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.364+0100 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.364+0100 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.399+0100 [INFO]  nomad.raft: restored from snapshot: id=5-139292-1615545712266
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.521+0100 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:172.17.0.1:4647 Address:172.17.0.1:4647}]"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.521+0100 [INFO]  nomad.raft: entering follower state: follower="Node at 10.0.1.1:4647 [Follower]" leader=
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.523+0100 [INFO]  nomad: serf: EventMemberJoin: server-0.global 10.0.1.1
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.523+0100 [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=[service, batch, system, _core]
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.523+0100 [WARN]  nomad: serf: Failed to re-join any previously known node
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.524+0100 [INFO]  client: using state directory: state_dir=/opt/nomad/data/client
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.524+0100 [INFO]  nomad: adding server: server="server-0.global (Addr: 10.0.1.1:4647) (DC: mode)"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.525+0100 [INFO]  client: using alloc directory: alloc_dir=/opt/nomad/data/alloc
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.537+0100 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.547+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens10
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.548+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.550+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.554+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens10
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.570+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.571+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:26.572+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=device
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:27.624+0100 [WARN]  nomad.raft: not part of stable configuration, aborting election
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.367+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: dial tcp 127.0.0.1:4646: connect: connection refused" rpc=Node.Register server=127.0.0.1:4646
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.367+0100 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: failed to get conn: dial tcp 127.0.0.1:4646: connect: connection refused" rpc=Node.Register server=127.0.0.1:4646
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.367+0100 [ERROR] client: error registering: error="rpc error: failed to get conn: dial tcp 127.0.0.1:4646: connect: connection refused"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.403+0100 [WARN]  client.driver_mgr.docker: failed to reattach to docker logger process: driver=docker error="failed to reattach to docker logger process: Reattachment process not found"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.423+0100 [ERROR] client.driver_mgr.docker.docker_logger.nomad: log streaming ended with terminal error: driver=docker @module=docker_logger error="open /opt/nomad/data/alloc/2b0a535b-31b5-536a-5f1f-fce28ab9039d/alloc/logs/.arangodb.stdout.fifo: no such file or directory" timestamp=2021-03-13T18:24:28.423+0100
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.449+0100 [INFO]  client: started client: node_id=20843003-7c06-2b05-8e29-090a1a5e118f
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.454+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=1dd29296-2043-7845-7fd9-ad9c6401338d
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=3e4cd59e-8021-deea-7bef-145c08b916d0
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=9ef77798-f8de-f31f-d786-89378fb82b45
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=e6c2f195-fed5-0758-3d5c-4d4dc4e31c54
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=fb6c61e9-bea7-5371-ddde-9737856c6939
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=0e0513cc-1ea3-2b0a-dbaf-14730684a0d4
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=14cd9970-f200-8893-d712-11d6741db11b
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=167462ef-95cd-82fd-3c18-ce18be9f2c9d
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.455+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=2264563b-b2a6-86fc-0d2c-8f6c193c05f3
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.456+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.456+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=62cdb29f-9bfc-0aef-c629-eca29da4fda7
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.456+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=7e010a27-a485-5ac9-8b61-73cd3997bbf2
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.458+0100 [WARN]  client.alloc_runner.task_runner.task_hook: failed to reattach to logmon process: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d task=arangodb error="Reattachment process not found"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.472+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.472+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=3e4cd59e-8021-deea-7bef-145c08b916d0 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=62cdb29f-9bfc-0aef-c629-eca29da4fda7 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=9ef77798-f8de-f31f-d786-89378fb82b45 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=fb6c61e9-bea7-5371-ddde-9737856c6939 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=2264563b-b2a6-86fc-0d2c-8f6c193c05f3 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=1dd29296-2043-7845-7fd9-ad9c6401338d reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=7e010a27-a485-5ac9-8b61-73cd3997bbf2 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.473+0100 [INFO]  client.gc: garbage collecting allocation: alloc_id=0e0513cc-1ea3-2b0a-dbaf-14730684a0d4 reason="forced collection"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.485+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d task=arangodb @module=logmon path=/opt/nomad/data/alloc/2b0a535b-31b5-536a-5f1f-fce28ab9039d/alloc/logs/.arangodb.stdout.fifo timestamp=2021-03-13T18:24:28.484+0100
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.485+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d task=arangodb @module=logmon path=/opt/nomad/data/alloc/2b0a535b-31b5-536a-5f1f-fce28ab9039d/alloc/logs/.arangodb.stderr.fifo timestamp=2021-03-13T18:24:28.485+0100
Mar 13 18:24:28 server-0 nomad[526]:     2021/03/13 18:24:28.498590 [INFO] (runner) creating new runner (dry: false, once: false)
Mar 13 18:24:28 server-0 nomad[526]:     2021/03/13 18:24:28.499534 [INFO] (runner) creating watcher
Mar 13 18:24:28 server-0 nomad[526]:     2021/03/13 18:24:28.501957 [INFO] (runner) starting
Mar 13 18:24:28 server-0 nomad[526]:     2021/03/13 18:24:28.505304 [INFO] (runner) rendered "/root/env/arangodb.env" => "/opt/nomad/data/alloc/2b0a535b-31b5-536a-5f1f-fce28ab9039d/arangodb/arangodb.env"
Mar 13 18:24:28 server-0 nomad[526]:     2021-03-13T18:24:28.534+0100 [WARN]  client.driver_mgr.docker: RemoveImage on non-referenced counted image id: driver=docker image_id=sha256:972479e15e2d87f60f97b430bdb424f3a85a0504965b1f1d6ef1b0ff09581920
Mar 13 18:24:32 server-0 nomad[526]:     2021-03-13T18:24:32.548+0100 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d task=arangodb @module=logmon timestamp=2021-03-13T18:24:32.548+0100
Mar 13 18:24:32 server-0 nomad[526]:     2021-03-13T18:24:32.549+0100 [WARN]  client.alloc_runner.task_runner.task_hook.logmon.nomad: timed out waiting for read-side of process output pipe to close: alloc_id=2b0a535b-31b5-536a-5f1f-fce28ab9039d task=arangodb @module=logmon timestamp=2021-03-13T18:24:32.548+0100
Mar 13 18:24:32 server-0 nomad[526]:     2021/03/13 18:24:32.554773 [INFO] (runner) stopping
Mar 13 18:24:32 server-0 nomad[526]:     2021/03/13 18:24:32.554955 [INFO] (runner) received finish
Mar 13 18:24:33 server-0 nomad[526]:     2021-03-13T18:24:33.495+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.Register server=0.0.0.0:4647
Mar 13 18:24:33 server-0 nomad[526]:     2021-03-13T18:24:33.496+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.Register server=0.0.0.0:4647
Mar 13 18:24:33 server-0 nomad[526]:     2021-03-13T18:24:33.496+0100 [ERROR] client: error registering: error="rpc error: No cluster leader"
Mar 13 18:24:33 server-0 nomad[526]:     2021-03-13T18:24:33.704+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=0.0.0.0:4647
Mar 13 18:24:33 server-0 nomad[526]:     2021-03-13T18:24:33.704+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=0.0.0.0:4647
Mar 13 18:24:33 server-0 nomad[526]:     2021-03-13T18:24:33.704+0100 [ERROR] client: error updating allocations: error="rpc error: No cluster leader"
Mar 13 18:24:36 server-0 nomad[526]:     2021-03-13T18:24:36.762+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:36 server-0 nomad[526]:     2021-03-13T18:24:36.808+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:41 server-0 nomad[526]:     2021-03-13T18:24:41.943+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:41 server-0 nomad[526]:     2021-03-13T18:24:41.950+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:46 server-0 nomad[526]:     2021-03-13T18:24:46.590+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=10.0.1.1:4647
Mar 13 18:24:46 server-0 nomad[526]:     2021-03-13T18:24:46.592+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=10.0.1.1:4647
Mar 13 18:24:46 server-0 nomad[526]:     2021-03-13T18:24:46.592+0100 [ERROR] client: error updating allocations: error="rpc error: No cluster leader"
Mar 13 18:24:47 server-0 nomad[526]:     2021-03-13T18:24:47.279+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:47 server-0 nomad[526]:     2021-03-13T18:24:47.341+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.545+0100 [ERROR] client: yamux: Invalid protocol version: 72
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.546+0100 [ERROR] client: yamux: Failed to write header: write tcp 127.0.0.1:59638->127.0.0.1:4646: write: connection reset by peer
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.546+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: msgpack encode error: write tcp 127.0.0.1:59638->127.0.0.1:4646: write: connection reset by peer" rpc=Node.UpdateAlloc server=127.0.0.1:4646
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.546+0100 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: msgpack encode error: write tcp 127.0.0.1:59638->127.0.0.1:4646: write: connection reset by peer" rpc=Node.UpdateAlloc server=127.0.0.1:4646
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.546+0100 [ERROR] client: error updating allocations: error="rpc error: msgpack encode error: write tcp 127.0.0.1:59638->127.0.0.1:4646: write: connection reset by peer"
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.628+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:53 server-0 nomad[526]:     2021-03-13T18:24:53.788+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:24:59 server-0 nomad[526]:     2021-03-13T18:24:59.729+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.Register server=0.0.0.0:4647
Mar 13 18:24:59 server-0 nomad[526]:     2021-03-13T18:24:59.729+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.Register server=0.0.0.0:4647
Mar 13 18:24:59 server-0 nomad[526]:     2021-03-13T18:24:59.729+0100 [ERROR] client: error registering: error="rpc error: No cluster leader"
Mar 13 18:25:03 server-0 nomad[526]:     2021-03-13T18:25:03.828+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:25:04 server-0 nomad[526]:     2021-03-13T18:25:04.161+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:25:07 server-0 nomad[526]:     2021-03-13T18:25:07.060+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=10.0.1.1:4647
Mar 13 18:25:07 server-0 nomad[526]:     2021-03-13T18:25:07.060+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=10.0.1.1:4647
Mar 13 18:25:07 server-0 nomad[526]:     2021-03-13T18:25:07.060+0100 [ERROR] client: error updating allocations: error="rpc error: No cluster leader"
Mar 13 18:25:14 server-0 nomad[526]:     2021-03-13T18:25:14.510+0100 [ERROR] client: yamux: Invalid protocol version: 72
Mar 13 18:25:14 server-0 nomad[526]:     2021-03-13T18:25:14.510+0100 [ERROR] client: yamux: Failed to write body: write tcp 127.0.0.1:59646->127.0.0.1:4646: write: connection reset by peer
Mar 13 18:25:14 server-0 nomad[526]:     2021-03-13T18:25:14.510+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: msgpack encode error: session shutdown" rpc=Node.UpdateAlloc server=127.0.0.1:4646
Mar 13 18:25:14 server-0 nomad[526]:     2021-03-13T18:25:14.510+0100 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: msgpack encode error: session shutdown" rpc=Node.UpdateAlloc server=127.0.0.1:4646
Mar 13 18:25:14 server-0 nomad[526]:     2021-03-13T18:25:14.510+0100 [ERROR] client: error updating allocations: error="rpc error: msgpack encode error: session shutdown"
Mar 13 18:25:18 server-0 nomad[526]:     2021-03-13T18:25:18.508+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:25:18 server-0 nomad[526]:     2021-03-13T18:25:18.632+0100 [ERROR] worker: failed to dequeue evaluation: error="No cluster leader"
Mar 13 18:25:19 server-0 nomad[526]:     2021-03-13T18:25:19.556+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.Register server=0.0.0.0:4647
Mar 13 18:25:19 server-0 nomad[526]:     2021-03-13T18:25:19.556+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.Register server=0.0.0.0:4647
Mar 13 18:25:19 server-0 nomad[526]:     2021-03-13T18:25:19.556+0100 [ERROR] client: error registering: error="rpc error: No cluster leader"
Mar 13 18:25:27 server-0 nomad[526]:     2021-03-13T18:25:27.129+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=10.0.1.1:4647
Mar 13 18:25:27 server-0 nomad[526]:     2021-03-13T18:25:27.129+0100 [ERROR] client.rpc: error performing RPC to server, deadline exceeded, cannot retry: error="rpc error: No cluster leader" rpc=Node.UpdateAlloc server=10.0.1.1:4647
Mar 13 18:25:27 server-0 nomad[526]:     2021-03-13T18:25:27.129+0100 [ERROR] client: error updating allocations: error="rpc error: No cluster leader"

Logs after restarting via systemctl stop nomad && systemctl start nomad

Mar 13 18:28:55 server-0 nomad[1092]: ==> WARNING: Bootstrap mode enabled! Potentially unsafe operation.
Mar 13 18:28:55 server-0 nomad[1092]: ==> Loaded configuration from /etc/nomad.d/nomad.hcl
Mar 13 18:28:55 server-0 nomad[1092]: ==> Starting Nomad agent...
Mar 13 18:28:55 server-0 nomad[1092]: ==> Nomad agent configuration:
Mar 13 18:28:55 server-0 nomad[1092]:        Advertise Addrs: HTTP: 172.17.0.1:4646; RPC: 172.17.0.1:4647; Serf: 172.17.0.1:4648
Mar 13 18:28:55 server-0 nomad[1092]:             Bind Addrs: HTTP: 0.0.0.0:4646; RPC: 0.0.0.0:4647; Serf: 0.0.0.0:4648
Mar 13 18:28:55 server-0 nomad[1092]:                 Client: true
Mar 13 18:28:55 server-0 nomad[1092]:              Log Level: INFO
Mar 13 18:28:55 server-0 nomad[1092]:                 Region: global (DC: mode)
Mar 13 18:28:55 server-0 nomad[1092]:                 Server: true
Mar 13 18:28:55 server-0 nomad[1092]:                Version: 1.0.4
Mar 13 18:28:55 server-0 nomad[1092]: ==> Nomad agent started! Log data will stream in below:
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.194+0100 [WARN]  agent.plugin_loader: skipping external plugins since plugin_dir doesn't exist: plugin_dir=/opt/nomad/data/plugins
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.198+0100 [INFO]  agent: detected plugin: name=nvidia-gpu type=device plugin_version=0.1.0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.199+0100 [INFO]  agent: detected plugin: name=docker type=driver plugin_version=0.1.0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.199+0100 [INFO]  agent: detected plugin: name=raw_exec type=driver plugin_version=0.1.0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.199+0100 [INFO]  agent: detected plugin: name=exec type=driver plugin_version=0.1.0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.199+0100 [INFO]  agent: detected plugin: name=qemu type=driver plugin_version=0.1.0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.199+0100 [INFO]  agent: detected plugin: name=java type=driver plugin_version=0.1.0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.208+0100 [INFO]  nomad.raft: restored from snapshot: id=5-139292-1615545712266
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.226+0100 [INFO]  nomad.raft: initial configuration: index=1 servers="[{Suffrage:Voter ID:172.17.0.1:4647 Address:172.17.0.1:4647}]"
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.226+0100 [INFO]  nomad.raft: entering follower state: follower="Node at 172.17.0.1:4647 [Follower]" leader=
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.228+0100 [INFO]  nomad: serf: EventMemberJoin: server-0.global 172.17.0.1
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.229+0100 [INFO]  nomad: starting scheduling worker(s): num_workers=2 schedulers=[service, batch, system, _core]
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.229+0100 [INFO]  client: using state directory: state_dir=/opt/nomad/data/client
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.229+0100 [INFO]  client: using alloc directory: alloc_dir=/opt/nomad/data/alloc
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.229+0100 [INFO]  nomad: adding server: server="server-0.global (Addr: 172.17.0.1:4647) (DC: mode)"
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.229+0100 [WARN]  nomad: serf: Failed to re-join any previously known node
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.231+0100 [INFO]  client.fingerprint_mgr.cgroup: cgroups are available
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.235+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens10
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.237+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=lo
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.238+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=eth0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.239+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=ens10
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.240+0100 [WARN]  client.fingerprint_mgr.network: unable to parse speed: path=/usr/sbin/ethtool device=docker0
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.247+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=csi
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.248+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=driver
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.248+0100 [INFO]  client.plugin: starting plugin manager: plugin-type=device
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.302+0100 [ERROR] client.rpc: error performing RPC to server: error="rpc error: failed to get conn: dial tcp 127.0.0.1:4646: connect: connection refused" rpc=Node.Register server=127.0.0.1:4646
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.302+0100 [ERROR] client.rpc: error performing RPC to server which is not safe to automatically retry: error="rpc error: failed to get conn: dial tcp 127.0.0.1:4646: connect: connection refused" rpc=Node.Register server=127.0.0.1:4646
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.302+0100 [ERROR] client: error registering: error="rpc error: failed to get conn: dial tcp 127.0.0.1:4646: connect: connection refused"
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.306+0100 [INFO]  client: started client: node_id=20843003-7c06-2b05-8e29-090a1a5e118f
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.310+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=e6c2f195-fed5-0758-3d5c-4d4dc4e31c54
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.311+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=14cd9970-f200-8893-d712-11d6741db11b
Mar 13 18:28:55 server-0 nomad[1092]:     2021-03-13T18:28:55.312+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=167462ef-95cd-82fd-3c18-ce18be9f2c9d
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.222+0100 [WARN]  nomad.raft: heartbeat timeout reached, starting election: last-leader=
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.222+0100 [INFO]  nomad.raft: entering candidate state: node="Node at 172.17.0.1:4647 [Candidate]" term=11
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.225+0100 [INFO]  nomad.raft: election won: tally=1
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.225+0100 [INFO]  nomad.raft: entering leader state: leader="Node at 172.17.0.1:4647 [Leader]"
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.226+0100 [INFO]  nomad: cluster leadership acquired
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.325+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4 task=arangodb @module=logmon path=/opt/nomad/data/alloc/4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4/alloc/logs/.arangodb.stdout.fifo timestamp=2021-03-13T18:28:57.323+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.325+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4 task=arangodb @module=logmon path=/opt/nomad/data/alloc/4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4/alloc/logs/.arangodb.stderr.fifo timestamp=2021-03-13T18:28:57.323+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021/03/13 18:28:57.363967 [INFO] (runner) creating new runner (dry: false, once: false)
Mar 13 18:28:57 server-0 nomad[1092]:     2021/03/13 18:28:57.364819 [INFO] (runner) creating watcher
Mar 13 18:28:57 server-0 nomad[1092]:     2021/03/13 18:28:57.365174 [INFO] (runner) starting
Mar 13 18:28:57 server-0 nomad[1092]:     2021/03/13 18:28:57.367497 [INFO] (runner) rendered "/root/env/arangodb.env" => "/opt/nomad/data/alloc/4b64d23a-2fcf-0b28-75a5-26c8d4d01ef4/arangodb/arangodb.env"
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.367+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=3e4cd59e-8021-deea-7bef-145c08b916d0
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.368+0100 [INFO]  client.gc: marking allocation for GC: alloc_id=fb6c61e9-bea7-5371-ddde-9737856c6939
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.391+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=9ef77798-f8de-f31f-d786-89378fb82b45 task=rabbit @module=logmon path=/opt/nomad/data/alloc/9ef77798-f8de-f31f-d786-89378fb82b45/alloc/logs/.rabbit.stdout.fifo timestamp=2021-03-13T18:28:57.391+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.392+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=9ef77798-f8de-f31f-d786-89378fb82b45 task=rabbit path=/opt/nomad/data/alloc/9ef77798-f8de-f31f-d786-89378fb82b45/alloc/logs/.rabbit.stderr.fifo @module=logmon timestamp=2021-03-13T18:28:57.392+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.413+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=2264563b-b2a6-86fc-0d2c-8f6c193c05f3 task=mode-api @module=logmon path=/opt/nomad/data/alloc/2264563b-b2a6-86fc-0d2c-8f6c193c05f3/alloc/logs/.mode-api.stdout.fifo timestamp=2021-03-13T18:28:57.412+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.415+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=2264563b-b2a6-86fc-0d2c-8f6c193c05f3 task=mode-api path=/opt/nomad/data/alloc/2264563b-b2a6-86fc-0d2c-8f6c193c05f3/alloc/logs/.mode-api.stderr.fifo @module=logmon timestamp=2021-03-13T18:28:57.414+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.437+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=1dd29296-2043-7845-7fd9-ad9c6401338d task=mode-api @module=logmon path=/opt/nomad/data/alloc/1dd29296-2043-7845-7fd9-ad9c6401338d/alloc/logs/.mode-api.stdout.fifo timestamp=2021-03-13T18:28:57.437+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.438+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=1dd29296-2043-7845-7fd9-ad9c6401338d task=mode-api @module=logmon path=/opt/nomad/data/alloc/1dd29296-2043-7845-7fd9-ad9c6401338d/alloc/logs/.mode-api.stderr.fifo timestamp=2021-03-13T18:28:57.437+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.442+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=0e0513cc-1ea3-2b0a-dbaf-14730684a0d4 task=mode-api path=/opt/nomad/data/alloc/0e0513cc-1ea3-2b0a-dbaf-14730684a0d4/alloc/logs/.mode-api.stdout.fifo @module=logmon timestamp=2021-03-13T18:28:57.442+0100
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.444+0100 [INFO]  client.driver_mgr.docker: created container: driver=docker container_id=8000bb3712ed59bd61322800f98fff2c2ae7c3fc928a3dd56817c875dbf04e39
Mar 13 18:28:57 server-0 nomad[1092]:     2021-03-13T18:28:57.459+0100 [INFO]  client.alloc_runner.task_runner.task_hook.logmon.nomad: opening fifo: alloc_id=0e0513cc-1ea3-2b0a-dbaf-14730684a0d4 task=mode-api path=/opt/nomad/data/alloc/0e0513cc-1ea3-2b0a-dbaf-14730684a0d4/alloc/logs/.mode-api.stderr.fifo @module=logmon timestamp=2021-03-13T18:28:57.458+0100
notnoop commented 3 years ago

Hi @sebastiankr! Thanks for including the logs. I'm seeing a few odd things: it looks as if the IP address changed between Nomad restarts. The first log shows the server advertising 10.0.1.1, but the initial Raft configuration is for 172.17.0.1. Nomad server identity is tied to IP addresses, and such changes are disruptive to Nomad operations. If the server has multiple interfaces, I'd suggest explicitly setting the server advertise addresses to a stable IP and trying again. If the workloads are ephemeral, you can also wipe the Nomad data dir on restart.

For context, Nomad defaults to using the IP address as the stable identifier for servers. If a server restarts with a new IP address, it will be considered a new server. Raft protocol 3 addresses this by using a stable ID as the identity, so servers keep their identity across IP address changes.
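A minimal sketch of what pinning the advertise addresses could look like in /etc/nomad.d/nomad.hcl for this setup, assuming 10.0.1.1 on ens10 is the stable address (the advertise stanza is standard Nomad agent configuration; the addresses here are only illustrative):

# Pin the addresses Nomad advertises to peers and clients,
# instead of letting it auto-detect an interface at startup.
advertise {
  http = "10.0.1.1"
  rpc  = "10.0.1.1"
  serf = "10.0.1.1"
}

With the advertise addresses pinned, the server's identity no longer depends on which interface Nomad happens to pick at boot.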

sebastiankr commented 3 years ago

Thanks @notnoop! This got me on the right track. 172.17.0.1 is the docker0 device. 10.0.1.1 is the subnet Nomad should be using, and was using before, even after restarting the service.

It seems the logic for selecting the private network has changed, and the upgrade to 1.0.4 switched the identity to the Docker IP. I nuked /opt/nomad/data/, explicitly set the advertise IPs, and it is now running as expected.

Maybe advertise should not be optional? Or the network selection logic should be stable and always prefer class A networks over B over C? Not sure if this is something Nomad should address.

notnoop commented 3 years ago

Great! I'm glad that helped. I'm surprised that Nomad picked the docker0 interface, and that it picked different interfaces between runs. I'll examine that code flow and see how we can improve the situation.

Just so I don't miss an edge case, can I have the full output of the following commands: ip addr, ip link, and ip route? I wonder if a Nomad heuristic went wrong in your case.

sebastiankr commented 3 years ago
root@server-0:~# ip ad
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
    link/ether 96:X:X:X:X:X brd ff:ff:ff:ff:ff:ff
    inet 116.X.X.X/32 scope global eth0
       valid_lft forever preferred_lft forever
    inet 116.X.X.X/32 scope global dynamic eth0
       valid_lft 60772sec preferred_lft 60772sec
    inet6 2a01:X:X:Xc::1/64 scope global
       valid_lft forever preferred_lft forever
    inet6 fe80::X:X:X:/64 scope link
       valid_lft forever preferred_lft forever
3: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP group default qlen 1000
    link/ether 86:00:00:84:1c:37 brd ff:ff:ff:ff:ff:ff
    inet 10.0.1.1/32 brd 10.0.1.1 scope global dynamic ens10
       valid_lft 60775sec preferred_lft 60775sec
    inet6 fe80::8400:ff:fe84:1c37/64 scope link
       valid_lft forever preferred_lft forever
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:c9:8c:2f:05 brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever

root@server-0:~# ip link
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN mode DEFAULT group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 96:X:X:X:X36 brd ff:ff:ff:ff:ff:ff
3: ens10: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
    link/ether 86:00:00:84:1c:37 brd ff:ff:ff:ff:ff:ff
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN mode DEFAULT group default
    link/ether 02:42:c9:8c:2f:05 brd ff:ff:ff:ff:ff:ff

root@server-0:~# ip route
default via 172.X.1.1 dev eth0 proto dhcp src 116.X.X.X metric 100
10.0.0.0/16 via 10.0.0.1 dev ens10
10.0.0.1 dev ens10 scope link
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
172.X.1.1 dev eth0 proto dhcp scope link src 116.X.X.X metric 100