kubernetes-retired / heapster

[EOL] Compute Resource Usage Analysis and Monitoring of Container Clusters
Apache License 2.0

heapster-amd64 v1.3.0+ container image ignores the nameserver from /etc/resolv.conf #1658

Closed: andrey01 closed this issue 7 years ago

andrey01 commented 7 years ago

Hello,

In our environment we set sysctl net.ipv6.conf.all.disable_ipv6=1.
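For reference, this is roughly how we apply it persistently on the nodes (a minimal sketch; the drop-in file name is just an example):

# hypothetical drop-in; any file under /etc/sysctl.d works
echo 'net.ipv6.conf.all.disable_ipv6 = 1' > /etc/sysctl.d/90-disable-ipv6.conf
# reload all sysctl settings
sysctl --system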

With this setting, we observed that gcr.io/google_containers/heapster-amd64:v1.3.0 constantly tries to use IPv6 to reach the Cluster DNS 10.96.0.10 on 53/udp:

controller1# kubectl -n kube-system logs pods/$(kubectl -n kube-system get pods -l k8s-app=heapster --output=jsonpath='{.items[0].metadata.name}')  --tail=10 -f
E0523 13:18:04.551747       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:49013->[::1]:53: read: connection refused
E0523 13:18:04.554143       1 reflector.go:190] k8s.io/heapster/metrics/heapster.go:322: Failed to list *v1.Pod: Get https://kubernetes.default/api/v1/pods?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:55476->[::1]:53: read: connection refused
E0523 13:18:04.556036       1 reflector.go:190] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:84: Failed to list *v1.Namespace: Get https://kubernetes.default/api/v1/namespaces?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:48489->[::1]:53: read: connection refused
E0523 13:18:04.557188       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:42894->[::1]:53: read: connection refused
E0523 13:18:04.619686       1 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:33541->[::1]:53: read: connection refused
E0523 13:18:05.000272       1 kubelet.go:273] No nodes received from APIserver.
E0523 13:18:05.001552       1 influxdb.go:150] Failed to create influxdb: failed to ping InfluxDB server at "monitoring-influxdb.kube-system.svc:8086" - Get http://monitoring-influxdb.kube-system.svc:8086/ping: dial tcp: lookup monitoring-influxdb.kube-system.svc on [::1]:53: read udp [::1]:42335->[::1]:53: read: connection refused

However, with gcr.io/google_containers/heapster-amd64:v1.3.0-beta.1 (an image two months older than v1.3.0) we do not observe this behavior.

Please let me know if you have any clues or ideas about what could be done.

Docker v1.13.1 Kubernetes v1.6.4 (via kubeadm)

DirectXMan12 commented 7 years ago

This is probably not a Heapster issue. Are other pods working properly on your cluster? Heapster is using the standard DNS code from the Kubernetes libraries here. Are you actually injecting cluster DNS?
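For example (a sketch; the pod label and service name below assume the kubeadm defaults):

# the pod should normally use ClusterFirst DNS ...
kubectl -n kube-system get pod -l k8s-app=heapster -o jsonpath='{.items[0].spec.dnsPolicy}'
# ... and the nameserver in its resolv.conf should match the kube-dns service IP
kubectl -n kube-system get svc kube-dns -o jsonpath='{.spec.clusterIP}'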

andrey01 commented 7 years ago

@DirectXMan12 the other pods are working properly on our cluster; we are running many different ones. I have just tried heapster v1.4.0 with K8s v1.7.5 and it has the same issue.

I am nearly out of ideas; I will probably have to dig deeper.

Meanwhile, if you have any ideas about what I could try, please let me know.

Update 1

This is the way we run heapster; we kubectl apply the following manifests (see the commands after the list):

1. https://github.com/kubernetes/heapster/blob/1308dd71f0ba343895456b46d1bbf3238800b6f3/deploy/kube-config/rbac/heapster-rbac.yaml
2. https://github.com/kubernetes/heapster/blob/480a1ca21fe65882dd41f28cd9296a825fbbb602/deploy/kube-config/influxdb/heapster.yaml
3. https://github.com/kubernetes/heapster/blob/480a1ca21fe65882dd41f28cd9296a825fbbb602/deploy/kube-config/influxdb/influxdb.yaml
4. https://github.com/kubernetes/heapster/blob/69a4067c236032770c9145bba6634795454b88b1/deploy/kube-config/influxdb/grafana.yaml
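In practice that amounts to something like the following (a sketch; it assumes the four manifests above have been downloaded into the current directory):

kubectl apply -f heapster-rbac.yaml
kubectl apply -f heapster.yaml
kubectl apply -f influxdb.yaml
kubectl apply -f grafana.yaml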

Update 2

I have just double-checked this on K8s 1.7.5, CentOS 7.3, Docker 1.13.1 for all the available heapster docker images; the results are as follows:

v1.3.0-alpha.1, v1.3.0-beta.1 are working:

# kubectl -n kube-system logs --tail=1 -f heapster-1029728424-jnd84
I0915 13:44:59.804446       5 heapster.go:105] Starting heapster on port 8082
I0915 13:45:05.117265       5 influxdb.go:215] Created database "k8s" on influxDB server at "monitoring-influxdb.kube-system.svc:8086"

v1.3.0, v1.4.0-beta.0, v1.4.0, v1.4.1, v1.4.2, v1.5.0-beta.0 are not.

md5sum binary_path version commit
26a76e1037c987766c480ae07a713a07  /mnt/docker/overlay/e6068f9abe9d3f6935d7555be29c75d6417f44131254bcd877c8d126ad747e26/root/heapster v1.3.0-alpha.1 da3aab91
ca883b67c48fcedb7d69165e78edf9f0  /mnt/docker/overlay/3f7ce0183025b48aa926caee676eebb5de6993e99c1064d9cf70139875e66d43/root/heapster v1.3.0-beta.1 4bf87de3
--- the versions below are not working for me ---
01e20c88c736070d57392ddcf0efc1ae  /mnt/docker/overlay/044ba8b80c29fea0c4c8afb88a272c1fa735dd0a1c41ed51fa53a36ae799053e/root/heapster v1.3.0 abbfe6e0
3886e3f3b4f2b843e08067fd92a2bca1  /mnt/docker/overlay/a1b912c5ea9b6e63579e187f169fb336002bd9bcf9c0ffbc8b5e8fbf0a975afe/root/heapster v1.4.0-beta.0 a729275e
038f85fd510a45dab4bbe4cfe547b0b8  /mnt/docker/overlay/92b867dd8f82beb48bbefda7e20557ffc4d5f85c7a2b5c99e4cf13b4e7c75865/root/heapster v1.4.0 0f6d09f4
22a3cff26dfcfef5674c9d51336f9248  /mnt/docker/overlay/d52c27f15b0e053b8fbe9f069f66194fd1858be5d37dee6468e29951f1fbda4b/root/heapster v1.4.1 d7df1c0f
53b67e2f14099c20520bfae80a087977  /mnt/docker/overlay/04bf13900b2a372be91f3e5af6069f18c8e0ab15ac88d766e8e01bac770977f1/root/heapster v1.4.2 ae12a18a
191700a644713f0e33d1fc97f961a79c  /mnt/docker/overlay/37ea86525baca321fc1c118a3bf34d07be142b91ad5021e671ee3463fc262153/root/heapster v1.5.0-beta.0 859ccd72

So I guess the issue, or its trigger, lies somewhere within git log 4bf87de3...abbfe6e0 --stat -p.

Update 3

Would it make sense to update vendor/k8s.io/apimachinery? On July 28, 2017, IPv6 support for ChooseHostInterface was added (part 3 of 3) in https://github.com/kubernetes/apimachinery/commit/e97046a677c003086fbaac6222ae330b6cee12e8

andyxning commented 7 years ago

@andrey01

andrey01 commented 7 years ago

@andyxning

Adding a CentOS container

Useful for additional debugging, since this container will share the same namespaces as heapster.

Slightly modified https://github.com/kubernetes/heapster/blob/480a1ca21fe65882dd41f28cd9296a825fbbb602/deploy/kube-config/influxdb/heapster.yaml to include a centos container running in the same namespaces as heapster.

    spec:
      serviceAccountName: heapster
      containers:
      - name: heapster
        image: gcr.io/google_containers/heapster-amd64:v1.3.0
        imagePullPolicy: IfNotPresent
        command:
        - /heapster
        - --source=kubernetes:https://kubernetes.default
        - --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
      - name: centos
        image: centos:7
        imagePullPolicy: IfNotPresent
        command:
        - sleep
        - infinity

namespace is really shared

Just to make sure:

[root@myserver1:~] docker ps -a |grep -E "centos|heapster"
488d43ce3e83        centos@sha256:eba772bac22c86d7d6e72421b4700c3f894ab6e35475a34014ff8de74c10872e                                                                          "sleep infinity"         8 minutes ago       Up 8 minutes                                       k8s_centos_heapster-107544395-wzx51_kube-system_976900bf-9a21-11e7-bc9b-fa163e533806_0
4047d471bcc6        gcr.io/google_containers/heapster-amd64@sha256:3dff9b2425a196aa51df0cebde0f8b427388425ba84568721acf416fa003cd5c                                         "/heapster --sourc..."   8 minutes ago       Up 8 minutes                                       k8s_heapster_heapster-107544395-wzx51_kube-system_976900bf-9a21-11e7-bc9b-fa163e533806_0

[root@myserver1:~] docker inspect k8s_heapster_heapster-107544395-wzx51_kube-system_976900bf-9a21-11e7-bc9b-fa163e533806_0 --format '{{.State.Pid}} - {{.HostConfig.PidMode}}'
32341 - container:011aab52b74600416ac3cfba43e33c372b8c00b1e42a6f1ae892603cd5ce8f5c

[root@myserver1:~] docker inspect k8s_centos_heapster-107544395-wzx51_kube-system_976900bf-9a21-11e7-bc9b-fa163e533806_0 --format '{{.State.Pid}} - {{.HostConfig.PidMode}}'
32407 - container:011aab52b74600416ac3cfba43e33c372b8c00b1e42a6f1ae892603cd5ce8f5c

[root@myserver1:~] ps -ef|grep -Ew "32341|32407"
nfsnobo+ 32341 32324  0 16:24 ?        00:00:03 /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
root     32407 32383  0 16:24 ?        00:00:00 sleep infinity

Getting into a container

[root@myserver1:~] kubectl get pods --all-namespaces -o wide |grep -i heap
kube-system     heapster-107544395-wzx51                            2/2       Running   0          1m        10.32.0.3      myserver1
[root@myserver1:~] kubectl -n kube-system exec -ti -c centos heapster-107544395-wzx51 bash
[root@heapster-107544395-wzx51 /]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 14:24 ?        00:00:00 /pause
65534        5     0  0 14:24 ?        00:00:00 /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
root        13     0  0 14:24 ?        00:00:00 sleep infinity
root        20     0  0 14:26 ?        00:00:00 bash
root        24     0  0 14:26 ?        00:00:00 bash
root        38    24  0 14:26 ?        00:00:00 ps -ef

K8s API via IPv4 OK

[root@heapster-107544395-wzx51 /]# TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
[root@heapster-107544395-wzx51 /]# curl -sS -k -H "Authorization: Bearer ${TOKEN}"  https://kubernetes.default/api/v1/nodes?resourceVersion=0 |head
{
  "kind": "NodeList",
  "apiVersion": "v1",
  "metadata": {
    "selfLink": "/api/v1/nodes",
...

IPv6 disabled

As expected

[root@heapster-107544395-wzx51 /]# ping6 kubernetes.default
ping: kubernetes.default: Name or service not known

DNS OK

[root@heapster-107544395-wzx51 /]# yum install -y net-tools bind-utils
[root@heapster-107544395-wzx51 /]# nslookup kubernetes.default
Server:     10.96.0.10
Address:    10.96.0.10#53

Non-authoritative answer:
Name:   kubernetes.default.svc.cluster.local
Address: 10.96.0.1

resolv.conf

[root@heapster-107544395-wzx51 /]# cat /etc/resolv.conf 
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local novalocal int.na.intgdc.com na.intgdc.com
options ndots:5
andrey01 commented 7 years ago

To continue

tcpdump

Apparently heapster is constantly trying to use IPv6, despite the fact that it is not working. I think heapster should then fall back to IPv4, unless something is broken inside the heapster container.

@andyxning is there a way to see which /etc/resolv.conf the heapster container itself uses? I've noticed that the image is pretty slim and does not have sh / bash / cat inside. Edit: okay, I found it, see below in this comment. :-)

[root@heapster-107544395-wzx51 /]# yum install -y iproute tcpdump
[root@heapster-107544395-wzx51 /]# tcpdump -qenn -i any 
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
14:45:35.489651  In 00:00:00:00:00:00 127.0.0.1.51835 > 127.0.0.1.53: UDP, length 36
14:45:35.489672  In 00:00:00:00:00:00 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 udp port 53 unreachable, length 72
14:45:35.489788  In 00:00:00:00:00:00 ::1.49641 > ::1.53: UDP, length 36
14:45:35.489800  In 00:00:00:00:00:00 ::1 > ::1: ICMP6, destination unreachable, unreachable port, ::1 udp port 53, length 92
14:45:35.489902  In 00:00:00:00:00:00 127.0.0.1.43764 > 127.0.0.1.53: UDP, length 36
14:45:35.489908  In 00:00:00:00:00:00 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 udp port 53 unreachable, length 72
14:45:35.489944  In 00:00:00:00:00:00 127.0.0.1.38169 > 127.0.0.1.53: UDP, length 36
14:45:35.877095  In 00:00:00:00:00:00 127.0.0.1.46479 > 127.0.0.1.53: UDP, length 36
14:45:35.877095  In 00:00:00:00:00:00 127.0.0.1.38827 > 127.0.0.1.53: UDP, length 36
14:45:35.877110  In 00:00:00:00:00:00 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 udp port 53 unreachable, length 72
14:45:35.877110  In 00:00:00:00:00:00 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 udp port 53 unreachable, length 72

Update 1

Apparently IPv6 isn't fully disabled from the container's PoV:

[root@heapster-107544395-wzx51 /]# ping6 ::1
PING ::1(::1) 56 data bytes
64 bytes from ::1: icmp_seq=1 ttl=64 time=0.034 ms
64 bytes from ::1: icmp_seq=2 ttl=64 time=0.033 ms
^C
--- ::1 ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 999ms
rtt min/avg/max/mdev = 0.033/0.033/0.034/0.005 ms
[root@heapster-107544395-wzx51 /]# ping6 localhost
PING localhost(localhost (::1)) 56 data bytes
64 bytes from localhost (::1): icmp_seq=1 ttl=64 time=0.032 ms
64 bytes from localhost (::1): icmp_seq=2 ttl=64 time=0.459 ms
^C
--- localhost ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1000ms
rtt min/avg/max/mdev = 0.032/0.245/0.459/0.214 ms

Whilst on the host it is not. ping6 localhost fails because the host's /etc/hosts does not have the following line set: ::1 localhost ip6-localhost ip6-loopback

Edit: adding that line, restarting dockerd, and recreating the heapster container did not help though.

[root@myserver1:~] ping6 localhost
ping: localhost: Name or service not known

This is due to net.ipv6.conf.all.disable_ipv6=1:

[root@myserver1:~] ping6 ::1
PING ::1(::1) 56 data bytes
ping: sendmsg: Network is unreachable
ping: sendmsg: Network is unreachable
^C

Update 2

Okay, here is the /etc/resolv.conf from the heapster container:

[root@myserver1:~] docker inspect k8s_centos_heapster-107544395-wzx51_kube-system_976900bf-9a21-11e7-bc9b-fa163e533806_0 --format '{{.ResolvConfPath}}'
/mnt/docker/containers/011aab52b74600416ac3cfba43e33c372b8c00b1e42a6f1ae892603cd5ce8f5c/resolv.conf

[root@myserver1:~] cat /mnt/docker/containers/011aab52b74600416ac3cfba43e33c372b8c00b1e42a6f1ae892603cd5ce8f5c/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local novalocal int.na.ourdomainXYZ.com na.ourdomainXYZ.com
options ndots:5

And the /etc/hosts

[root@myserver1:~] cat /mnt/docker/containers/011aab52b74600416ac3cfba43e33c372b8c00b1e42a6f1ae892603cd5ce8f5c/hosts
127.0.0.1   localhost
::1 localhost ip6-localhost ip6-loopback
fe00::0 ip6-localnet
ff00::0 ip6-mcastprefix
ff02::1 ip6-allnodes
ff02::2 ip6-allrouters

Just to double-check that we are looking at the right path:

[root@myserver1:~] cat /mnt/docker/containers/011aab52b74600416ac3cfba43e33c372b8c00b1e42a6f1ae892603cd5ce8f5c/hostname
heapster-107544395-wzx51

Update 3

tcpdump'ing on the host

The failed IPv6 attempts ::1 > ::1: ICMP6, destination unreachable, unreachable port, ::1 udp port 53 seen inside the heapster pod (from the CentOS container) never reach the host. So they are stopped within the heapster pod without going anywhere outside.
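For reference, the host-side capture was along these lines (a sketch; the exact filter used may have differed):

# on the host: look for any DNS or ICMPv6 port-unreachable traffic coming from the pod
tcpdump -qenn -i any 'port 53 or icmp6'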

Update 4

Tried stracing heapster; according to what it shows, it appears to be asking a local DNS server over 53/UDP, first via IPv4 (AF_INET) 127.0.0.1, then via IPv6 (AF_INET6) ::1, every second: https://gist.github.com/andrey01/936a5577ecf9aaa8af4a66570cc8f5cc
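The invocation was roughly the following (a sketch; see the gist for the actual output):

# attach to the running heapster process and trace only its network-related syscalls
strace -f -e trace=network -p $(pidof heapster)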

And all attempts fail.

tcpdump with more detail (-AA) shows that it is asking for kubernetes.default:

[root@heapster-107544395-wzx51 /]# tcpdump -qenn -i any -AA
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on any, link-type LINUX_SLL (Linux cooked), capture size 262144 bytes
15:23:25.505865  In 00:00:00:00:00:00 127.0.0.1.51113 > 127.0.0.1.53: UDP, length 36
................E..@".@.@..............5.,.?M...........
kubernetes.default.....................
15:23:25.505877  In 00:00:00:00:00:00 127.0.0.1 > 127.0.0.1: ICMP 127.0.0.1 udp port 53 unreachable, length 72
................E..\>...@.=P...........\....E..@".@.@..............5.,.?M...........
kubernetes.default.....................
15:23:25.505964  In 00:00:00:00:00:00 ::1.59300 > ::1.53: UDP, length 36
................`....,.@...................................5.,.?............
kubernetes.default.....................
15:23:25.505975  In 00:00:00:00:00:00 ::1 > ::1: ICMP6, destination unreachable, unreachable port, ::1 udp port 53, length 92
................`....\:@...................................Y....`....,.@...................................5.,.?............
kubernetes.default.....................
15:23:25.506049  In 00:00:00:00:00:00 127.0.0.1.49516 > 127.0.0.1.53: UDP, length 36
................E..@".@.@............l.5.,.?m...........
kubernetes.default.....................
andrey01 commented 7 years ago

We are running dockerd with --iptables=false; I have also tried setting it to true and restarting dockerd, but that did not make any difference. Neither did adding ::1 localhost ip6-localhost ip6-loopback to /etc/hosts on the main host.
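(For reference, one can confirm how dockerd was started on a node with something like the following; a sketch:)

# show the full dockerd command line, including --iptables=false
ps -o args= -C dockerd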

andyxning commented 7 years ago

@andrey01 IIUC, when doing a DNS query the client should not query 127.0.0.1:53; it should consult the configured nameservers directly. It seems something is wrong.

andyxning commented 7 years ago

@andrey01 What is the result of nslookup monitoring-influxdb.kube-system.svc inside the heapster pod?

What is the result of nslookup monitoring-influxdb.kube-system.svc.cluster.local. inside the heapster pod, i.e. with the fully qualified domain name (note the trailing dot after local)?

It seems that it cannot do a DNS query for monitoring-influxdb.kube-system.svc.

andyxning commented 7 years ago

@andrey01

is there a way to see which /etc/resolv.conf the heapster container itself uses?

I am afraid you can only docker inspect heapster-POD_NON_POD_INFRA_CONTAINER and look for the ResolvConfPath mapping to get the location of resolv.conf.

andrey01 commented 7 years ago

@andyxning here you go:

nslookup

[root@heapster-487402730-frx1w /]# nslookup monitoring-influxdb.kube-system.svc
Server:     10.96.0.10
Address:    10.96.0.10#53

Name:   monitoring-influxdb.kube-system.svc.cluster.local
Address: 10.102.222.93

[root@heapster-487402730-frx1w /]# nslookup monitoring-influxdb.kube-system.svc.cluster.local.
Server:     10.96.0.10
Address:    10.96.0.10#53

Non-authoritative answer:
Name:   monitoring-influxdb.kube-system.svc.cluster.local
Address: 10.102.222.93

resolv.conf

heapster POD

[root@myserver1:~] docker inspect k8s_heapster_heapster-487402730-frx1w_kube-system_1ad7d84a-9bc3-11e7-b167-fa163e533806_0 |grep -i resolv
        "ResolvConfPath": "/mnt/docker/containers/b66a55cc326bc07635fb71dde2138f3eddc95d672bd0c037b6b74926d5804709/resolv.conf",

[root@myserver1:~] cat /mnt/docker/containers/b66a55cc326bc07635fb71dde2138f3eddc95d672bd0c037b6b74926d5804709/resolv.conf
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local novalocal int.na.mydomainXYZ.com na.mydomainXYZ.com
options ndots:5

From the centos container, which is part of the heapster pod

[root@myserver1:~] kubectl -n kube-system exec -ti -c centos heapster-487402730-frx1w bash
[root@heapster-487402730-frx1w /]# ps -ef
UID        PID  PPID  C STIME TTY          TIME CMD
root         1     0  0 16:13 ?        00:00:00 /pause
65534        7     0  0 16:13 ?        00:00:00 /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
root        15     0  0 16:13 ?        00:00:00 sleep infinity
root        21     0  0 16:14 ?        00:00:00 bash
root        37    21  0 16:14 ?        00:00:00 ps -ef
[root@heapster-487402730-frx1w /]# cat /etc/resolv.conf 
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local novalocal int.na.mydomainXYZ.com na.mydomainXYZ.com
options ndots:5
andrey01 commented 7 years ago

I think the problem is that heapster is trying to perform DNS queries against 127.0.0.1 or ::1 instead of reading the nameserver 10.96.0.10 from /etc/resolv.conf. tcpdump confirms this on both the container network and the VM host network.

Update 1

I can see there were many changes starting with heapster 1.3.0. Not sure whether this could be related, probably not; nevertheless, just to mention, ResolvConfDefault = "/etc/resolv.conf" was deleted in https://github.com/kubernetes/heapster/pull/1537/files#diff-c2a7e6ffb19bafecc85ea9765043d927L21

Update 2

@andyxning

IIUC, when doing a DNS query the client should not query 127.0.0.1:53; it should consult the configured nameservers directly. It seems something is wrong.

Yes, absolutely. So far we have experienced this issue only with heapster v1.3.0 and higher.

via /etc/resolv.conf

It normally queries the nameserver found in /etc/resolv.conf as expected - 10.96.0.10:53 (kube-dns).

[root@heapster-487402730-frx1w /]# dig A monitoring-influxdb.kube-system.svc.cluster.local. | grep -EA1 'ANSWER SECTION|SERVER'
;; ANSWER SECTION:
monitoring-influxdb.kube-system.svc.cluster.local. 3 IN A 10.102.222.93
--
;; SERVER: 10.96.0.10#53(10.96.0.10)
;; WHEN: Sun Sep 17 16:38:49 UTC 2017

via 127.0.0.1:53

Here it fails, as expected.

[root@heapster-487402730-frx1w /]# dig A monitoring-influxdb.kube-system.svc.cluster.local. @127.0.0.1 | grep -EA1 'ANSWER SECTION|SERVER'
[root@heapster-487402730-frx1w /]# echo $?
1

The question is why heapster uses 127.0.0.1 instead of the nameserver.

andrey01 commented 7 years ago

Okay, moving the heapster binary to a different base image such as CentOS 7 seems to have worked around the issue.

Dockerfile:

FROM centos:7

COPY heapster-1.3.0 /heapster
docker build -t local/heapster-amd64:v1.3.0-centos .
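Then the existing Deployment was pointed at the rebuilt image (a sketch; the manifest could equally be edited by hand):

kubectl -n kube-system set image deployment/heapster heapster=local/heapster-amd64:v1.3.0-centos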
[root@myserver1:~] kubectl -n kube-system logs -c heapster -f heapster-3709176945-v26zc   
I0917 16:52:03.751478       5 heapster.go:72] /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
I0917 16:52:03.751672       5 heapster.go:73] Heapster version v1.3.0
I0917 16:52:03.752099       5 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I0917 16:52:03.752136       5 configs.go:62] Using kubelet port 10255
I0917 16:52:03.810163       5 influxdb.go:252] created influxdb sink with options: host:monitoring-influxdb.kube-system.svc:8086 user:root db:k8s
I0917 16:52:03.810186       5 heapster.go:196] Starting with InfluxDB Sink
I0917 16:52:03.810190       5 heapster.go:196] Starting with Metric Sink
I0917 16:52:03.833103       5 heapster.go:106] Starting heapster on port 8082

@andyxning how is the heapster docker image built?

Update 1

Yet another workaround was to start heapster with UID 0 (root):

    spec:
      securityContext:
        runAsUser: 0

Then everything works. I am starting to think that heapster has issues accessing the /etc/resolv.conf file when running from the heapster 1.3.0+ docker images. I have checked the contents of the images; they look pretty simple. The major difference is that starting with heapster 1.3.0+ the directive USER 65534:65534 is used.
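That difference can be seen directly from the image metadata (a sketch; per the above, 65534:65534 is expected for v1.3.0 and an empty value for the older, working images):

docker inspect --format '{{.Config.User}}' gcr.io/google_containers/heapster-amd64:v1.3.0
docker inspect --format '{{.Config.User}}' gcr.io/google_containers/heapster-amd64:v1.3.0-beta.1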

Here are the permissions down to the /etc/resolv.conf:

[root@myserver1:~] ls -lad /mnt/docker/containers/0496e74cb50f6b508917641a0e6f88020fb5c5a7b974531b02ff39c4d3bf9c16/resolv.conf /mnt/docker/containers/0496e74cb50f6b508917641a0e6f88020fb5c5a7b974531b02ff39c4d3bf9c16 /mnt/docker/containers /mnt/docker /mnt /
dr-xr-xr-x. 18 root root  4096 Sep 15 14:50 /
drwxr-xr-x.  7 root root  4096 Sep 15 14:46 /mnt
drwx--x--x. 11 root root  4096 Sep 15 14:48 /mnt/docker
drwxr-x---+ 86 root root 24576 Sep 17 19:09 /mnt/docker/containers
drwxr-x---+  4 root root  4096 Sep 17 19:09 /mnt/docker/containers/0496e74cb50f6b508917641a0e6f88020fb5c5a7b974531b02ff39c4d3bf9c16
-rw-r--r--+  1 root root   149 Sep 17 19:09 /mnt/docker/containers/0496e74cb50f6b508917641a0e6f88020fb5c5a7b974531b02ff39c4d3bf9c16/resolv.conf

Update 2

So, when I set USER 65534 in the Dockerfile (to simulate the difference observed in the heapster 1.3.0+ docker images), /etc/resolv.conf becomes unreadable for that user.

[root@myserver1:~] docker run --rm -ti --entrypoint bash local/heapster-amd64:v1.3.0-centos
bash-4.2$ id
uid=65534 gid=0(root) groups=0(root)
bash-4.2$ ping google.com
ping: google.com: Name or service not known
bash-4.2$ cat /etc/resolv.conf
cat: /etc/resolv.conf: Permission denied
bash-4.2$ ls -lad / /etc /etc/resolv.conf 
drwxr-xr-x. 1 root root 4096 Sep 17 18:03 /
drwxr-xr-x. 1 root root 4096 Sep 17 18:03 /etc
-rw-r--r--+ 1 root root  179 Sep 17 18:03 /etc/resolv.conf

Interestingly, the same is not observed when setting the user to, say, 99. EDIT: the reason it was not observed with 99 is that UID 99 has a record in the image's /etc/passwd file; for more details please refer to https://github.com/kubernetes/heapster/issues/1658#issuecomment-331196813 or https://github.com/kubernetes/heapster/pull/1817

[root@myserver1:~] grep -Ew "12345|65534|99" /etc/passwd
nobody:x:99:99:Nobody:/:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin

[root@myserver1:~] docker run --rm -ti -u 65534 --entrypoint bash local/heapster-amd64:v1.3.0-centos -c "cat /etc/resolv.conf"
cat: /etc/resolv.conf: Permission denied

[root@myserver1:~] docker run --rm -ti -u 12345 --entrypoint bash local/heapster-amd64:v1.3.0-centos -c "cat /etc/resolv.conf"
cat: /etc/resolv.conf: Permission denied

[root@myserver1:~] docker run --rm -ti -u 99 --entrypoint bash local/heapster-amd64:v1.3.0-centos -c "cat /etc/resolv.conf"
; generated by /usr/sbin/dhclient-script
search [...redacted]
nameserver [...redacted]
andrey01 commented 7 years ago

Looks like the issue is local and not heapster related.

] docker run --rm -ti -u 65534 --entrypoint bash centos:7 -c "cat /etc/resolv.conf"
cat: /etc/resolv.conf: Permission denied

The thing is that heapster has been running under UID 65534 since v1.3.0, which is what triggered this issue.

I will try to figure it out and will close this issue.

andrey01 commented 7 years ago

Yes, this was a local issue on our side, due to over-complicated ACLs ...

For those who wonder: when setting the default ACLs (-d with setfacl), do not forget to explicitly include the root group, e.g.:

Incorrect

/usr/bin/setfacl -Rdm group:splunk:rx,other:r /mnt/docker/containers

Correct

/usr/bin/setfacl -Rdm group:root:r,group:splunk:rx,other:r /mnt/docker/containers

Edit 1

So we must ensure group:root:r is additionally set when setting the default ACLs (setfacl with the -d flag). This restores the default Docker behavior, where a container process running under a non-root user is able to read the /etc/resolv.conf file. Interestingly, a user with UID 99 (nobody, in CentOS 7.3) does not experience the same issue.
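The resulting default ACL can be verified with getfacl (a sketch):

# show only the default ACL entries that newly created files under this directory will inherit
getfacl --default /mnt/docker/containers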

andyxning commented 7 years ago

@andrey01 IIUC, you mean that the root cause of heapster performing DNS queries against 127.0.0.1 is that heapster cannot read the /etc/resolv.conf file?

andyxning commented 7 years ago

@andrey01 The Dockerfile used to build heapster is here.

FROM scratch

COPY heapster eventer /
COPY ca-certificates.crt /etc/ssl/certs/

#   nobody:nobody
USER 65534:65534
ENTRYPOINT ["/heapster"]

Actually, according to the comment above the USER directive, it is expected that 65534 is the UID of nobody. Maybe this is not true across all Linux distributions.

andrey01 commented 7 years ago

IIUC, you mean that the root cause of heapster performing DNS queries against 127.0.0.1 is that heapster cannot read the /etc/resolv.conf file?

... it is expected that 65534 is the UID of nobody. Maybe this is not true across all Linux distributions.

@andyxning you are right:

Ubuntu xenial

$ id nobody
uid=65534(nobody) gid=65534(nogroup) groups=65534(nogroup)

CentOS 7

# id nobody
uid=99(nobody) gid=99(nobody) groups=99(nobody)

# id nfsnobody
uid=65534(nfsnobody) gid=65534(nfsnobody) groups=65534(nfsnobody)

But the real issue lies somewhere else, wherever /etc/resolv.conf ends up unreadable to others. I have tried completely removing the /mnt/docker/containers directory, restarting dockerd, and applying our old ACL setting which has other:r, and I cannot reproduce this issue, since /etc/resolv.conf is readable with 12345:12345 (a non-existent user/group) and with 65534:65534 (nfsnobody:nfsnobody in CentOS 7).

Edit 1

I will try to put back our old deployment, in order to reproduce it one more time.

andrey01 commented 7 years ago

Ok, I am able to reproduce the issue:

setfacl --recursive --remove-all -- /mnt/docker/containers
setfacl -Rdm group:splunk:rx,other:r -- /mnt/docker/containers
setfacl -Rm group:splunk:rx -- /mnt/docker/containers

After that, no matter whether one restarts dockerd or reloads the overlay FS, the issue occurs again:

~] kubectl -n kube-system logs -f heapster-1428236806-xs2kj
I0918 07:54:11.813694       5 heapster.go:72] /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
I0918 07:54:11.813786       5 heapster.go:73] Heapster version v1.4.0
I0918 07:54:11.814064       5 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I0918 07:54:11.814079       5 configs.go:62] Using kubelet port 10255
E0918 07:54:11.816125       5 kubelet.go:334] Failed to load nodes: Get https://kubernetes.default/api/v1/nodes: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:51759->[::1]:53: read: connection refused
E0918 07:54:11.817474       5 influxdb.go:264] issues while creating an InfluxDB sink: failed to ping InfluxDB server at "monitoring-influxdb.kube-system.svc:8086" - Get http://monitoring-influxdb.kube-system.svc:8086/ping: dial tcp: lookup monitoring-influxdb.kube-system.svc on [::1]:53: read udp [::1]:49628->[::1]:53: read: connection refused, will retry on use
I0918 07:54:11.817486       5 influxdb.go:278] created influxdb sink with options: host:monitoring-influxdb.kube-system.svc:8086 user:root db:k8s
I0918 07:54:11.817540       5 heapster.go:196] Starting with InfluxDB Sink
I0918 07:54:11.817549       5 heapster.go:196] Starting with Metric Sink
E0918 07:54:11.819666       5 reflector.go:190] k8s.io/heapster/metrics/heapster.go:322: Failed to list *v1.Pod: Get https://kubernetes.default/api/v1/pods?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:54248->[::1]:53: read: connection refused
E0918 07:54:11.819755       5 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:54248->[::1]:53: read: connection refused
E0918 07:54:11.819839       5 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:54248->[::1]:53: read: connection refused
E0918 07:54:11.821335       5 reflector.go:190] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:84: Failed to list *v1.Namespace: Get https://kubernetes.default/api/v1/namespaces?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:48313->[::1]:53: read: connection refused
E0918 07:54:11.823103       5 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:51373->[::1]:53: read: connection refused
I0918 07:54:11.909308       5 heapster.go:106] Starting heapster on port 8082
E0918 07:54:12.820936       5 reflector.go:190] k8s.io/heapster/metrics/heapster.go:322: Failed to list *v1.Pod: Get https://kubernetes.default/api/v1/pods?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:56828->[::1]:53: read: connection refused
E0918 07:54:12.824619       5 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:57378->[::1]:53: read: connection refused
E0918 07:54:12.824669       5 reflector.go:190] k8s.io/heapster/metrics/util/util.go:51: Failed to list *v1.Node: Get https://kubernetes.default/api/v1/nodes?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:57378->[::1]:53: read: connection refused
E0918 07:54:12.904243       5 reflector.go:190] k8s.io/heapster/metrics/processors/namespace_based_enricher.go:84: Failed to list *v1.Namespace: Get https://kubernetes.default/api/v1/namespaces?resourceVersion=0: dial tcp: lookup kubernetes.default on [::1]:53: read udp [::1]:42402->[::1]:53: read: connection refused

As a remedy, one needs to explicitly set group:root:r in the second setfacl command:

setfacl --recursive --remove-all -- /mnt/docker/containers
setfacl -Rdm group:root:r,group:splunk:rx,other:r -- /mnt/docker/containers
setfacl -Rm group:splunk:rx -- /mnt/docker/containers

Strangely, this affects only the heapster 1.3.0+ containers, even though I removed the image right before removing the heapster deployment, restarting dockerd, and reloading the overlay FS.

Running some other docker image, e.g. centos:7 or alpine:3.6, did not show this problem at all:

centos7] docker run --rm -ti -u 65534:65534 --entrypoint bash centos:7 -c "id; ls -la /etc/resolv.conf /etc/shells /etc/libaudit.conf; wc -l /etc/resolv.conf /etc/shells /etc/libaudit.conf;"
uid=65534 gid=65534 groups=65534
-rw-r-----. 1 root root 191 Apr 19 12:56 /etc/libaudit.conf
-rw-r--r--+ 1 root root 179 Sep 18 07:57 /etc/resolv.conf
-rw-r--r--. 1 root root  76 Jun  7  2013 /etc/shells
  6 /etc/resolv.conf     <<<<< OK, readable as expected
  6 /etc/shells
wc: /etc/libaudit.conf: Permission denied   <<<<<< OK as expected
 12 total

Update 1

I have checked the differences between the working and non-working configurations in the permissions of the contents under the overlay GraphDriver directories (LowerDir, MergedDir, UpperDir, WorkDir) and under the ResolvConfPath (its upper-level contents), and found a single difference, which was under the latter (the top level of ResolvConfPath):

(not working) without group:root:r in the aforementioned ACLs:

4196597    8 drwx------   4 root     root         4096 Sep 18 10:16 /mnt/docker/containers/9494f5755bc1045b4f98c69dc05ae16d6a1c162299d1e40e4c246079b3ab9e1b/

(working) with group:root:r:

4196505    8 drwxr-x---   4 root     root         4096 Sep 18 10:11 /mnt/docker/containers/d8f29a5c7efdea458ac988d1e99e0976b97e760cc473927f6e1d23c44755ab0c/
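For reference, listings like the two above can be produced with find's -ls output (a sketch; the actual comparison also covered the overlay GraphDriver directories):

find /mnt/docker/containers -maxdepth 1 -type d -ls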

I still cannot understand why the issue occurs only with the heapster 1.3.0+ containers, but not with centos:7 or alpine:3.6 when running with -u 65534:65534 (or any other random UID/GID). I have even tried building a new image from a simple Dockerfile with USER 65534:65534 set, though I have not tried a FROM scratch image yet.

andrey01 commented 7 years ago

@andyxning okay, I think I have gotten really close to the actual issue.

It appears that the GID is not actually 65534 (nobody), but 0 (root).

This is exactly why our default ACL "fix" works: what it does is explicitly allow group:root:r.

So I have finally managed to access the heapster container and see what is going on from the inside.

accessing the scratch images

The images built FROM scratch can be a pain to debug, so here are the steps I prepared to ease the pain:

# download a statically linked busybox and make it executable
wget https://busybox.net/downloads/binaries/1.27.1-i686/busybox
chmod +x busybox
# copy it into the container's root filesystem via the heapster process's cwd (/) under /proc
cp busybox /proc/$(pidof heapster)/cwd/busybox
# create the directories that "busybox --install" expects
mkdir -p /proc/$(pidof heapster)/cwd/{bin,sbin,usr/bin,usr/sbin}
# enter the container's mount namespace and install the busybox applets
nsenter --mount=/proc/$(pidof heapster)/task/$(pidof heapster)/ns/mnt -- /busybox --install
] kubectl -n kube-system exec -ti heapster-1428236806-lt5wd sh
/ $ id
uid=65534 gid=0
/ $ ps -ef
PID   USER     TIME   COMMAND
    1 0          0:00 /pause
    5 65534      0:09 /heapster --source=kubernetes:https://kubernetes.default --sink=influxdb:http://monitoring-influxdb.kube-system.svc:8086
   29 65534      0:00 /busybox sh
   47 65534      0:00 sh
   52 65534      0:00 ps -ef
/ $ stat / /etc /etc/resolv.conf |grep -E 'File:|^Access: \('
  File: /
Access: (0755/drwxr-xr-x)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
  File: /etc
Access: (0755/drwxr-xr-x)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
  File: /etc/resolv.conf
Access: (0654/-rw-r-xr--)  Uid: (    0/ UNKNOWN)   Gid: (    0/ UNKNOWN)
/ $ id
uid=65534 gid=0
/ $ ls -lad / /etc /etc/resolv.conf
drwxr-xr-x    1 0        0             4096 Sep 18 09:20 /
drwxr-xr-x    1 0        0             4096 Sep 18 09:19 /etc
-rw-r--r--    1 0        0              149 Sep 18 09:19 /etc/resolv.conf
/ $ cat /etc/resolv.conf
cat: can't open '/etc/resolv.conf': Permission denied

So you can see that, according to the file modes, the root group and others should be able to read /, /etc and /etc/resolv.conf, yet resolv.conf cannot actually be read.

So our default ACLs (without the group:root:r "fix") do not permit the root group to read resolv.conf.

before

[root@myserver1:~] kubectl -n kube-system exec -ti heapster-1428236806-p59ff -- sh -c "cat /etc/resolv.conf"
cat: can't open '/etc/resolv.conf': Permission denied

[root@myserver1:~] getfacl /mnt/docker/containers/0273110ec9b94f1d7ece40be481b67170d9739e21aa9be7f0e4ccdb9d8ff4928/resolv.conf
getfacl: Removing leading '/' from absolute path names
# file: mnt/docker/containers/0273110ec9b94f1d7ece40be481b67170d9739e21aa9be7f0e4ccdb9d8ff4928/resolv.conf
# owner: root
# group: root
user::rw-
group::---
group:splunk:r-x
mask::r-x
other::r--

after

[root@myserver1:~] setfacl -m group:root:r,group:splunk:rx -- /mnt/docker/containers/0273110ec9b94f1d7ece40be481b67170d9739e21aa9be7f0e4ccdb9d8ff4928/resolv.conf

[root@myserver1:~] kubectl -n kube-system exec -ti heapster-1428236806-p59ff -- sh -c "cat /etc/resolv.conf"
nameserver 10.96.0.10
search kube-system.svc.cluster.local svc.cluster.local cluster.local novalocal int.na.mydomainXYZ.com na.mydomainXYZ.com
options ndots:5

[root@myserver1:~] getfacl /mnt/docker/containers/0273110ec9b94f1d7ece40be481b67170d9739e21aa9be7f0e4ccdb9d8ff4928/resolv.conf
getfacl: Removing leading '/' from absolute path names
# file: mnt/docker/containers/0273110ec9b94f1d7ece40be481b67170d9739e21aa9be7f0e4ccdb9d8ff4928/resolv.conf
# owner: root
# group: root
user::rw-
group::---
group:root:r--
group:splunk:r-x
mask::r-x
other::r--

The real question now is why we are getting gid=0, whilst it should be 65534 as per the Dockerfile. Maybe this is Kubernetes-specific? https://github.com/kubernetes/kubernetes/issues/22179

andyxning commented 7 years ago

@andrey01 According to the USER Dockerfile documentation:

Warning: When the user doesn't have a primary group then the image (or the next instructions) will be run with the root group.

When there is no primary group, the heapster command will be run with the root group. You need to check the primary group of the nobody user.

arno01 commented 7 years ago

@andyxning that is interesting. I have created PR https://github.com/kubernetes/heapster/pull/1817, which intends to restore the default Docker behavior so that the relevant GID stays at 65534 (nobody), since by default Kubernetes does not respect the image's USER uid:gid set in the Dockerfile. Not sure whether there is an open issue/PR for Kubernetes to address this.


Prepare a Dockerfile with the user 65534:65534 (nobody:nobody)

FROM scratch
# wget https://busybox.net/downloads/binaries/1.27.1-i686/busybox
# chmod +x busybox
COPY busybox /busybox
USER 65534:65534
ENTRYPOINT ["/busybox"]
CMD ["id"]

Build the image

docker build -t scratch .

Running the image shows uid:gid as expected - 65534:65534.

docker run --rm -ti scratch
uid=65534 gid=65534

When the uid is explicitly set to 65534, the gid does not get picked up, because the scratch image does not have an /etc/passwd file.

docker run --rm -u 65534 -ti scratch
uid=65534 gid=0
docker run --rm -u nobody  -ti scratch
docker: Error response from daemon: linux spec user: unable to find user nobody: no matching entries in passwd file.
docker run --rm -ti scratch cat /etc/passwd
cat: can't open '/etc/passwd': No such file or directory

As a remedy, let's add an /etc/passwd file to the scratch image:

echo 'nobody:x:65534:65534:nobody:/home:/bin/false' > passwd

Dockerfile:

FROM scratch
# wget https://busybox.net/downloads/binaries/1.27.1-i686/busybox
# chmod +x busybox
COPY busybox /busybox
COPY passwd /etc/passwd
USER 65534:65534
ENTRYPOINT ["/busybox"]
CMD ["id"]

The gid gets picked up now:

docker run --rm -ti -u nobody scratch 
uid=65534(nobody) gid=65534
docker run --rm -ti -u 65534 scratch 
uid=65534(nobody) gid=65534

Running in Kubernetes

building and pushing the images

docker build -t andrey01/scratch:passwd .
docker build -t andrey01/scratch:nopasswd .
docker push andrey01/scratch:passwd
docker push andrey01/scratch:nopasswd

Preparing the jobs

k8s-job-scratch-passwd.yml:

kind: Job
apiVersion: batch/v1
metadata:
  name: scratch-passwd
spec:
  template:
    metadata:
      labels:
        app: scratch-passwd
    spec:
      restartPolicy: OnFailure
      containers:
      - name: scratch-passwd
        image: andrey01/scratch:passwd
        imagePullPolicy: Always

k8s-job-scratch-nopasswd.yml:

kind: Job
apiVersion: batch/v1
metadata:
  name: scratch-nopasswd
spec:
  template:
    metadata:
      labels:
        app: scratch-nopasswd
    spec:
      restartPolicy: OnFailure
      containers:
      - name: scratch-nopasswd
        image: andrey01/scratch:nopasswd
        imagePullPolicy: Always

Running the jobs

kubectl create -f k8s-job-scratch-nopasswd.yml 
kubectl create -f k8s-job-scratch-passwd.yml 

Checking the output

$ kubectl get jobs
NAME               DESIRED   SUCCESSFUL   AGE
scratch-nopasswd   1         1            19s
scratch-passwd     1         1            16s
$ kubectl logs $(kubectl get pods --show-all -l app=scratch-passwd --output jsonpath='{.items[0].metadata.name}')
uid=65534(nobody) gid=65534
$ kubectl logs $(kubectl get pods --show-all -l app=scratch-nopasswd --output jsonpath='{.items[0].metadata.name}')
uid=65534 gid=0

As you can see, gid is 0 for the scratch container that does not have an /etc/passwd file containing the relevant uid:gid pair.

Cleanup

kubectl delete job scratch-nopasswd scratch-passwd