hashicorp / nomad

Nomad is an easy-to-use, flexible, and performant workload orchestrator that can deploy a mix of microservice, batch, containerized, and non-containerized applications. Nomad is easy to operate and scale and has native Consul and Vault integrations.
https://www.nomadproject.io/

Nomad unable to parse network speed on AWS instances C5 & M5 #4605

Open scalp42 opened 6 years ago

scalp42 commented 6 years ago

Nomad version

root@nomad-compute-i-0955e66b39caebfab [us-west-2-infra1] /etc/nomad # nomad version
Nomad v0.8.4 (dbee1d7d051619e90a809c23cf7e55750900742a)

Operating system and Environment details

root@nomad-compute-i-0955e66b39caebfab [us-west-2-infra1] /etc/nomad # uname -a
Linux nomad-compute-i-0955e66b39caebfab 4.4.0-1065-aws #75-Ubuntu SMP Fri Aug 10 11:14:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Issue

Starting Nomad client, seeing the following in the log:

Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]: ==> Nomad agent started! Log data will stream in below:
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.709449 [INFO] client: using state directory /var/lib/nomad/client
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.709520 [INFO] client: using alloc directory /var/lib/nomad/alloc
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.711136 [INFO] fingerprint.cgroups: cgroups are available
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.714712 [INFO] fingerprint.consul: consul agent is available
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.722969 [WARN] fingerprint.network: Unable to parse Speed in output of '/sbin/ethtool ens5'
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.732560 [WARN] driver.raw_exec: raw exec is enabled. Only enable if needed
Aug 20 23:27:45 nomad-compute-i-0955e66b39caebfab nomad[14176]:     2018/08/20 23:27:45.733143 [INFO] client: Node ID "7c1d6e5b-37db-b635-0499-50d34453dc18"

This is related to the new ens5 interface name on M5 instances, versus the old eth0:

root@nomad-compute-i-0955e66b39caebfab [us-west-2-infra1] /etc/nomad # ifconfig -a
docker0   Link encap:Ethernet  HWaddr 02:42:dd:9d:2d:3d
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ens5      Link encap:Ethernet  HWaddr 06:ba:cf:fb:27:98
          inet addr:10.42.8.13  Bcast:10.42.15.255  Mask:255.255.240.0
          inet6 addr: fe80::4ba:cfff:fefb:2798/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:443022 errors:0 dropped:0 overruns:0 frame:0
          TX packets:95764 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:611634844 (611.6 MB)  TX bytes:14116891 (14.1 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:8188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8188 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1646030 (1.6 MB)  TX bytes:1646030 (1.6 MB)

scalp42 commented 6 years ago

On a T2 instance:

root@bastion-i-06db6b092231198b4 [us-west-2-infra1] ~ # ohai | gron | egrep 'default_interface|instance_type'
json.ec2.instance_type = "t2.medium";
json.network.default_interface = "eth0";

root@bastion-i-06db6b092231198b4 [us-west-2-infra1] ~ # ifconfig -a
eth0      Link encap:Ethernet  HWaddr 06:b1:0b:48:ee:ec
          inet addr:10.42.204.68  Bcast:10.42.207.255  Mask:255.255.248.0
          inet6 addr: fe80::4b1:bff:fe48:eeec/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:819273 errors:0 dropped:0 overruns:0 frame:0
          TX packets:610175 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:600295730 (600.2 MB)  TX bytes:108481326 (108.4 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:11223 errors:0 dropped:0 overruns:0 frame:0
          TX packets:11223 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1290605 (1.2 MB)  TX bytes:1290605 (1.2 MB)

On a C5 instance:

root@nomad-master-i-0ca25392646a49622 [us-west-2-infra1] ~ # ohai | gron | egrep 'default_interface|instance_type'
json.ec2.instance_type = "c5.large";
json.network.default_interface = "ens5";

root@nomad-master-i-0ca25392646a49622 [us-west-2-infra1] ~ # ifconfig -a
docker0   Link encap:Ethernet  HWaddr 02:42:2f:80:c9:24
          inet addr:172.17.0.1  Bcast:172.17.255.255  Mask:255.255.0.0
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 B)  TX bytes:0 (0.0 B)

ens5      Link encap:Ethernet  HWaddr 02:f2:c4:5e:e6:ae
          inet addr:10.42.213.188  Bcast:10.42.215.255  Mask:255.255.248.0
          inet6 addr: fe80::f2:c4ff:fe5e:e6ae/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:9001  Metric:1
          RX packets:476129 errors:0 dropped:0 overruns:0 frame:0
          TX packets:115875 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:618475303 (618.4 MB)  TX bytes:14260106 (14.2 MB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:8068 errors:0 dropped:0 overruns:0 frame:0
          TX packets:8068 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1
          RX bytes:1097824 (1.0 MB)  TX bytes:1097824 (1.0 MB)

tantra35 commented 6 years ago

I don't think this is a bug. Because of limitations of the AWS virtualisation platform, ethtool <interface> doesn't print as much information as it does on real hardware.
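To illustrate why a missing Speed line produces the warning in the log, here is a minimal Python sketch of parsing the speed out of ethtool output. It is not Nomad's actual fingerprinting code. The function name `parse_link_speed_mbits` and both sample outputs are hypothetical and were not captured from the instances in this issue.

```python
import re
from typing import Optional

# Matches lines such as "Speed: 10000Mb/s" in ethtool output.
SPEED_RE = re.compile(r"^\s*Speed:\s*(\d+)\s*Mb/s\s*$", re.MULTILINE)


def parse_link_speed_mbits(ethtool_output: str) -> Optional[int]:
    """Return the link speed in Mbit/s, or None if no numeric Speed line exists."""
    match = SPEED_RE.search(ethtool_output)
    return int(match.group(1)) if match else None


# Hypothetical sample: a NIC whose driver reports a link speed.
PHYSICAL_NIC = """Settings for eth0:
\tSupported ports: [ TP ]
\tSpeed: 1000Mb/s
\tDuplex: Full
"""

# Hypothetical sample: a virtual NIC whose driver reports no speed at all.
VIRTUAL_NIC = """Settings for ens5:
\tLink detected: yes
"""

if __name__ == "__main__":
    print(parse_link_speed_mbits(PHYSICAL_NIC))
    print(parse_link_speed_mbits(VIRTUAL_NIC))
```

A caller that gets `None` back has nothing to fingerprint and must either warn and fall back to a default or rely on an operator-supplied value.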

As a workaround, you can use https://www.nomadproject.io/docs/agent/configuration/client.html#network_speed. Also note that the real network speed in AWS depends on the instance type and isn't published in the official documentation. For some empirical numbers, see this Stack Overflow post: https://stackoverflow.com/questions/18507405/ec2-instance-typess-exact-network-performance
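For reference, the workaround might look like the following client configuration fragment. This is a sketch, not a tested configuration: the interface name is taken from the ifconfig output earlier in this thread, and the speed value is an assumption that should be replaced with a figure appropriate for your instance type.

```hcl
client {
  enabled = true

  # Interface name seen on the C5/M5 instances in this thread.
  network_interface = "ens5"

  # Override the fingerprinted link speed, in Mbit/s.
  # 10000 is an assumed value, not a measured one.
  network_speed = 10000
}
```

With `network_speed` set, the client uses the configured value instead of relying on what ethtool reports.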

preetapan commented 6 years ago

#4550 is another related issue with fingerprinting on AWS C5 instances.

flyinprogrammer commented 4 years ago

Ran into this issue today: m5 instances are reported as 1 Gbit, but the actual speed is 10 Gbit.