@rade slight correction.
It's actually Kubernetes v1.5.4 with kops.
root@ip-10-116-75-183:/home/admin# cat /boot/config-* | grep BPF
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
# CONFIG_TEST_BPF is not set
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_HAVE_BPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_TEST_BPF=m
I'd check the mounts. On my laptop, I have:
$ mount|grep tracefs
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime,seclabel)
$ mount|grep debugfs
debugfs on /sys/kernel/debug type debugfs (rw,relatime,seclabel)
I don't know kops, but there was a similar issue on Amazon Linux due to a kernel bug. Related discussion: https://forums.aws.amazon.com/thread.jspa?messageID=753257
The workaround for the kernel bug was to make sure debugfs is mounted before the Docker daemon starts, in this order:
$ sudo mount -t debugfs debugfs /sys/kernel/debug
$ sudo /etc/init.d/docker start
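If the host uses systemd, a drop-in can make that ordering stick across reboots. A minimal sketch, assuming the service is named docker.service and that systemd's standard sys-kernel-debug.mount unit provides the debugfs mount (the drop-in path and unit names are assumptions and may differ per distribution):
$ sudo mkdir -p /etc/systemd/system/docker.service.d
$ cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/10-debugfs.conf
[Unit]
# make sure debugfs is mounted before dockerd starts
Requires=sys-kernel-debug.mount
After=sys-kernel-debug.mount
EOF
$ sudo systemctl daemon-reload
$ sudo systemctl restart docker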
Another thing to check would be whether the kernel has kprobes. On my laptop I have:
$ cat /boot/config-`uname -r`|grep KPROBE
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_KPROBE_EVENTS=y
# CONFIG_KPROBES_SANITY_TEST is not set
@ajohnstone could you run some of the checks alban suggested above?
Hi, in the Kubernetes manifests the mount point /sys/kernel/debug is empty:
root@ip-10-116-71-191:/home/admin# mount | grep -E 'tracefs|debugfs'
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
tracefs on /sys/kernel/debug/tracing type tracefs (rw,relatime)
root@ip-10-116-71-191:/home/admin# cat /boot/config-`uname -r`|grep KPROBE
CONFIG_KPROBES=y
CONFIG_KPROBES_ON_FTRACE=y
CONFIG_HAVE_KPROBES=y
CONFIG_HAVE_KPROBES_ON_FTRACE=y
CONFIG_KPROBE_EVENT=y
# CONFIG_KPROBES_SANITY_TEST is not set
root@ip-10-116-71-191:/home/admin# docker run -v /sys/kernel/debug:/sys/kernel/debug -it ubuntu ls -alh /sys/kernel/debug
total 0
drwx------ 21 root root 0 Jun 14 11:17 .
drwxr-xr-x 10 root root 0 Aug 2 13:23 ..
drwxr-xr-x 2 root root 0 Jun 14 11:17 acpi
drwxr-xr-x 11 root root 0 Jun 14 11:18 bdi
drwxr-xr-x 2 root root 0 Jun 14 11:18 btrfs
drwxr-xr-x 2 root root 0 Jun 14 11:17 clk
drwxr-xr-x 2 root root 0 Jun 14 11:17 dma_buf
drwxr-xr-x 4 root root 0 Jun 14 11:18 dri
drwxr-xr-x 2 root root 0 Jun 14 11:17 dynamic_debug
drwxr-xr-x 2 root root 0 Jun 14 11:17 extfrag
-rw-r--r-- 1 root root 0 Jun 14 11:17 fault_around_bytes
drwxr-xr-x 2 root root 0 Jun 14 11:17 frontswap
-r--r--r-- 1 root root 0 Jun 14 11:17 gpio
drwxr-xr-x 2 root root 0 Jul 18 21:36 ieee80211
drwxr-xr-x 2 root root 0 Jun 14 11:17 kprobes
drwxr-xr-x 2 root root 0 Jun 14 11:17 mce
drwxr-xr-x 2 root root 0 Jun 14 11:17 pinctrl
drwxr-xr-x 2 root root 0 Jun 14 11:17 pm_qos
drwxr-xr-x 2 root root 0 Jun 14 11:17 ras
drwxr-xr-x 2 root root 0 Jun 14 11:17 regmap
-rw-r--r-- 1 root root 0 Jun 14 11:17 sched_features
-r--r--r-- 1 root root 0 Jun 14 11:17 sleep_time
drwxr-xr-x 5 root root 0 Jun 14 11:18 sunrpc
-r--r--r-- 1 root root 0 Jun 14 11:17 suspend_stats
drwx------ 6 root root 0 Jun 14 11:17 tracing
-r--r--r-- 1 root root 0 Jun 14 11:17 wakeup_sources
drwxr-xr-x 2 root root 0 Jun 14 11:17 x86
drwxr-xr-x 2 root root 0 Jun 14 11:17 zswap
root@ip-10-116-71-191:/home/admin# docker run -it ubuntu ls -alh /sys/kernel/debug/
total 0
dr-xr-xr-x 2 root root 0 Aug 2 13:30 .
drwxr-xr-x 10 root root 0 Aug 2 13:23 ..
@ajohnstone thanks for testing. It seems you have kprobes correctly enabled in the kernel.
Could you also check the content of the tracing directory?
on the host:
ls -l /sys/kernel/debug/tracing/
in a docker container, with the debugfs volume enabled:
docker run --rm -it -v /sys/kernel/debug:/sys/kernel/debug ubuntu ls -alh /sys/kernel/debug/tracing
in the mnt namespace of the docker daemon:
DOCKER_PID=$(pidof dockerd-current) # might also be called simply "dockerd" on kops?
ls -l /proc/$DOCKER_PID/root/sys/kernel/debug/tracing
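The file to look for in all three places is kprobe_events, which is what the probe's error message complains about. A minimal sketch that runs the three checks in one go (assuming the daemon binary is called dockerd; on some distributions it is dockerd-current):
$ ls /sys/kernel/debug/tracing/ | grep kprobe_events
$ docker run --rm -v /sys/kernel/debug:/sys/kernel/debug ubuntu ls /sys/kernel/debug/tracing | grep kprobe_events
$ DOCKER_PID=$(pidof dockerd); ls /proc/$DOCKER_PID/root/sys/kernel/debug/tracing | grep kprobe_events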
On my laptop, the tracing directory is populated (with kprobe_events etc.) in all 3 places. With the bug I've seen before on Amazon Linux, tracing would only be populated on the host.
Does it work if you restart the docker daemon and containerd (without restarting the machine)?
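For example (a minimal sketch, assuming a systemd-managed host; the unit names are assumptions, and containerd may be bundled with the Docker package rather than running as its own unit):
$ sudo systemctl restart containerd   # only if containerd runs as a separate unit
$ sudo systemctl restart docker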
I've run all of the above and added a couple more tests below. To be honest, I'm not sure what I should be looking for. Should /sys/kernel/debug/ be mounted from the host?
Host
root@ip-10-116-71-191:/home/admin# ls -l /sys/kernel/debug/tracing/
total 0
-r--r--r-- 1 root root 0 Jun 14 11:17 available_events
-r--r--r-- 1 root root 0 Jun 14 11:17 available_filter_functions
-r--r--r-- 1 root root 0 Jun 14 11:17 available_tracers
-rw-r--r-- 1 root root 0 Jun 14 11:17 buffer_size_kb
-r--r--r-- 1 root root 0 Jun 14 11:17 buffer_total_size_kb
-rw-r--r-- 1 root root 0 Jun 14 11:17 current_tracer
-r--r--r-- 1 root root 0 Jun 14 11:17 dyn_ftrace_total_info
-r--r--r-- 1 root root 0 Jun 14 11:17 enabled_functions
drwxr-xr-x 61 root root 0 Jun 14 11:17 events
--w------- 1 root root 0 Jun 14 11:17 free_buffer
drwxr-xr-x 2 root root 0 Jun 14 11:17 instances
-rw-r--r-- 1 root root 0 Jun 14 11:17 kprobe_events
-r--r--r-- 1 root root 0 Jun 14 11:17 kprobe_profile
-rw-r--r-- 1 root root 0 Jun 14 11:17 max_graph_depth
drwxr-xr-x 2 root root 0 Jun 14 11:17 options
drwxr-xr-x 17 root root 0 Jun 14 11:17 per_cpu
-r--r--r-- 1 root root 0 Jun 14 11:17 printk_formats
-r--r--r-- 1 root root 0 Jun 14 11:17 README
-r--r--r-- 1 root root 0 Jun 14 11:17 saved_cmdlines
-rw-r--r-- 1 root root 0 Jun 14 11:17 saved_cmdlines_size
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_event
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_event_pid
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_filter
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_notrace
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_pid
-r--r--r-- 1 root root 0 Jun 14 11:17 set_graph_function
-r--r--r-- 1 root root 0 Jun 14 11:17 set_graph_notrace
-rw-r--r-- 1 root root 0 Jun 14 11:17 snapshot
-rw-r--r-- 1 root root 0 Jun 14 11:17 stack_max_size
-r--r--r-- 1 root root 0 Jun 14 11:17 stack_trace
-r--r--r-- 1 root root 0 Jun 14 11:17 stack_trace_filter
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace_clock
--w--w---- 1 root root 0 Jun 14 11:17 trace_marker
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace_options
-r--r--r-- 1 root root 0 Jun 14 11:17 trace_pipe
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_cpumask
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_max_latency
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_on
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_thresh
-rw-r--r-- 1 root root 0 Jun 14 11:17 uprobe_events
-r--r--r-- 1 root root 0 Jun 14 11:17 uprobe_profile
Docker container with the debugfs volume
root@ip-10-116-71-191:/home/admin# docker run --rm -it -v /sys/kernel/debug:/sys/kernel/debug ubuntu ls -alh /sys/kernel/debug/tracing
total 0
drwx------ 6 root root 0 Jun 14 11:17 .
drwx------ 21 root root 0 Jun 14 11:17 ..
-r--r--r-- 1 root root 0 Jun 14 11:17 README
-r--r--r-- 1 root root 0 Jun 14 11:17 available_events
-r--r--r-- 1 root root 0 Jun 14 11:17 available_filter_functions
-r--r--r-- 1 root root 0 Jun 14 11:17 available_tracers
-rw-r--r-- 1 root root 0 Jun 14 11:17 buffer_size_kb
-r--r--r-- 1 root root 0 Jun 14 11:17 buffer_total_size_kb
-rw-r--r-- 1 root root 0 Jun 14 11:17 current_tracer
-r--r--r-- 1 root root 0 Jun 14 11:17 dyn_ftrace_total_info
-r--r--r-- 1 root root 0 Jun 14 11:17 enabled_functions
drwxr-xr-x 61 root root 0 Jun 14 11:17 events
--w------- 1 root root 0 Jun 14 11:17 free_buffer
drwxr-xr-x 2 root root 0 Jun 14 11:17 instances
-rw-r--r-- 1 root root 0 Jun 14 11:17 kprobe_events
-r--r--r-- 1 root root 0 Jun 14 11:17 kprobe_profile
-rw-r--r-- 1 root root 0 Jun 14 11:17 max_graph_depth
drwxr-xr-x 2 root root 0 Jun 14 11:17 options
drwxr-xr-x 17 root root 0 Jun 14 11:17 per_cpu
-r--r--r-- 1 root root 0 Jun 14 11:17 printk_formats
-r--r--r-- 1 root root 0 Jun 14 11:17 saved_cmdlines
-rw-r--r-- 1 root root 0 Jun 14 11:17 saved_cmdlines_size
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_event
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_event_pid
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_filter
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_notrace
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_pid
-r--r--r-- 1 root root 0 Jun 14 11:17 set_graph_function
-r--r--r-- 1 root root 0 Jun 14 11:17 set_graph_notrace
-rw-r--r-- 1 root root 0 Jun 14 11:17 snapshot
-rw-r--r-- 1 root root 0 Jun 14 11:17 stack_max_size
-r--r--r-- 1 root root 0 Jun 14 11:17 stack_trace
-r--r--r-- 1 root root 0 Jun 14 11:17 stack_trace_filter
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace_clock
--w--w---- 1 root root 0 Jun 14 11:17 trace_marker
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace_options
-r--r--r-- 1 root root 0 Jun 14 11:17 trace_pipe
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_cpumask
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_max_latency
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_on
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_thresh
-rw-r--r-- 1 root root 0 Jun 14 11:17 uprobe_events
-r--r--r-- 1 root root 0 Jun 14 11:17 uprobe_profile
Mnt namespace of the Docker daemon
root@ip-10-116-71-191:/home/admin# DOCKER_PID=$(pidof dockerd)
root@ip-10-116-71-191:/home/admin# ls -l /proc/$DOCKER_PID/root/sys/kernel/debug/tracing
total 0
-r--r--r-- 1 root root 0 Jun 14 11:17 available_events
-r--r--r-- 1 root root 0 Jun 14 11:17 available_filter_functions
-r--r--r-- 1 root root 0 Jun 14 11:17 available_tracers
-rw-r--r-- 1 root root 0 Jun 14 11:17 buffer_size_kb
-r--r--r-- 1 root root 0 Jun 14 11:17 buffer_total_size_kb
-rw-r--r-- 1 root root 0 Jun 14 11:17 current_tracer
-r--r--r-- 1 root root 0 Jun 14 11:17 dyn_ftrace_total_info
-r--r--r-- 1 root root 0 Jun 14 11:17 enabled_functions
drwxr-xr-x 61 root root 0 Jun 14 11:17 events
--w------- 1 root root 0 Jun 14 11:17 free_buffer
drwxr-xr-x 2 root root 0 Jun 14 11:17 instances
-rw-r--r-- 1 root root 0 Jun 14 11:17 kprobe_events
-r--r--r-- 1 root root 0 Jun 14 11:17 kprobe_profile
-rw-r--r-- 1 root root 0 Jun 14 11:17 max_graph_depth
drwxr-xr-x 2 root root 0 Jun 14 11:17 options
drwxr-xr-x 17 root root 0 Jun 14 11:17 per_cpu
-r--r--r-- 1 root root 0 Jun 14 11:17 printk_formats
-r--r--r-- 1 root root 0 Jun 14 11:17 README
-r--r--r-- 1 root root 0 Jun 14 11:17 saved_cmdlines
-rw-r--r-- 1 root root 0 Jun 14 11:17 saved_cmdlines_size
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_event
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_event_pid
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_filter
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_notrace
-rw-r--r-- 1 root root 0 Jun 14 11:17 set_ftrace_pid
-r--r--r-- 1 root root 0 Jun 14 11:17 set_graph_function
-r--r--r-- 1 root root 0 Jun 14 11:17 set_graph_notrace
-rw-r--r-- 1 root root 0 Jun 14 11:17 snapshot
-rw-r--r-- 1 root root 0 Jun 14 11:17 stack_max_size
-r--r--r-- 1 root root 0 Jun 14 11:17 stack_trace
-r--r--r-- 1 root root 0 Jun 14 11:17 stack_trace_filter
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace_clock
--w--w---- 1 root root 0 Jun 14 11:17 trace_marker
-rw-r--r-- 1 root root 0 Jun 14 11:17 trace_options
-r--r--r-- 1 root root 0 Jun 14 11:17 trace_pipe
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_cpumask
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_max_latency
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_on
-rw-r--r-- 1 root root 0 Jun 14 11:17 tracing_thresh
-rw-r--r-- 1 root root 0 Jun 14 11:17 uprobe_events
-r--r--r-- 1 root root 0 Jun 14 11:17 uprobe_profile
root@ip-10-116-71-191:/home/admin# docker run -it ubuntu find /sys/kernel/debug/
/sys/kernel/debug/
root@ip-10-116-71-191:/home/admin# /etc/init.d/docker restart
[ ok ] Restarting docker (via systemctl): docker.service.
root@ip-10-116-71-191:/home/admin# docker run -it ubuntu find /sys/kernel/debug/
/sys/kernel/debug/
POD=$(kubectl get pods --namespace=kube-system -l app=weave-scope -owide | grep 10-116-71-191 | awk '{print $1}')
kubectl --namespace=kube-system logs $POD
time="2017-08-03T22:25:51Z" level=info msg="publishing to: https://cloud.weave.works:443"
<probe> INFO: 2017/08/03 22:25:51.497560 command line args: --mode=probe --no-app=true --probe.docker=true --probe.docker.bridge=docker0 --probe.kubernetes=true --service-token=<elided>
<probe> INFO: 2017/08/03 22:25:51.497602 probe starting, version 1.6.2, ID 2e69ca37862da606
<probe> WARN: 2017/08/03 22:25:51.563259 Error setting up the eBPF tracker, falling back to proc scanning: cannot open kprobe_events: open /sys/kernel/debug/tracing/kprobe_events: no such file or directory
<probe> INFO: 2017/08/03 22:25:51.563507 kubernetes: targeting api server https://100.64.0.1:443
<probe> WARN: 2017/08/03 22:25:51.712225 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: getsockopt: connection refused
<probe> WARN: 2017/08/03 22:25:51.729564 Cannot resolve 'scope.weave.local.': dial tcp 172.17.0.1:53: getsockopt: connection refused
<probe> ERRO: 2017/08/03 22:25:51.757813 conntrack stderr:NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
<probe> WARN: 2017/08/03 22:25:51.775648 Error collecting weave ps, backing off 20s: exit status 1: "Link not found\n"
<probe> INFO: 2017/08/03 22:25:51.849473 Control connection to cloud.weave.works starting
<probe> INFO: 2017/08/03 22:25:51.899318 Publish loop for cloud.weave.works starting
<probe> ERRO: 2017/08/03 22:25:51.923726 conntrack stderr:NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
<probe> WARN: 2017/08/03 22:26:01.712757 Error collecting weave status, backing off 20s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: getsockopt: connection refused
<probe> WARN: 2017/08/03 22:26:11.814133 Error collecting weave ps, backing off 40s: exit status 1: "Link not found\n"
<probe> WARN: 2017/08/03 22:26:21.713240 Error collecting weave status, backing off 40s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Hah! I know what the problem is. We forgot to update the k8s YAMLs for Scope to include the /sys/kernel/debug mount point.
E.g.
$ curl -Ls "https://cloud.weave.works/k8s.yaml?t=$TOKEN&k8s-version=$(kubectl version | base64 | tr -d '\n')" | sed '/serviceAccountName:/d' > weave.yaml
# Patch with
$ diff weave.yaml.old weave.yaml
--- weave.yaml.old 2017-08-03 23:33:53.904032111 +0100
+++ weave.yaml 2017-08-03 23:35:26.096941060 +0100
@@ -116,6 +116,8 @@
mountPath: /var/run/docker.sock
- name: scope-plugins
mountPath: /var/run/scope/plugins
+ - name: sys-tracing
+ mountPath: /sys/kernel/debug
hostNetwork: true
hostPID: true
volumes:
@@ -125,6 +127,9 @@
- name: scope-plugins
hostPath:
path: /var/run/scope/plugins
+ - name: sys-tracing
+ hostPath:
+ path: /sys/kernel/debug
- apiVersion: v1
kind: ServiceAccount
metadata:
$ kubectl apply -n kube-system -f weave.yaml
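After re-applying, one way to verify that the probe container now sees the host's tracing directory (a hedged sketch; it reuses the app=weave-scope label from the commands above and assumes ls is available in the probe image):
$ POD=$(kubectl get pods --namespace=kube-system -l app=weave-scope -o jsonpath='{.items[0].metadata.name}')
$ kubectl --namespace=kube-system exec $POD -- ls /sys/kernel/debug/tracing/kprobe_events
The probe logs should also no longer show the "Error setting up the eBPF tracker" warning.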
Can confirm that it works with /sys/kernel/debug mounted...
deploy → kubectl --namespace=kube-system logs weave-scope-agent-6n1sl
time="2017-08-03T23:30:17Z" level=info msg="publishing to: https://cloud.weave.works:443"
<probe> INFO: 2017/08/03 23:30:17.263396 command line args: --mode=probe --no-app=true --probe.docker=true --probe.docker.bridge=docker0 --probe.kubernetes=true --service-token=<elided>
<probe> INFO: 2017/08/03 23:30:17.263439 probe starting, version 1.6.2, ID 7cd8c5e9449294a7
<probe> INFO: 2017/08/03 23:30:17.635207 kubernetes: targeting api server https://100.64.0.1:443
<probe> INFO: 2017/08/03 23:30:17.641013 Control connection to cloud.weave.works starting
<probe> WARN: 2017/08/03 23:30:17.762766 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: getsockopt: connection refused
<probe> WARN: 2017/08/03 23:30:17.773494 Cannot resolve 'scope.weave.local.': dial tcp 172.17.0.1:53: getsockopt: connection refused
<probe> WARN: 2017/08/03 23:30:17.814222 Error collecting weave ps, backing off 20s: exit status 1: "Link not found\n"
<probe> INFO: 2017/08/03 23:30:17.817249 Publish loop for cloud.weave.works starting
<probe> ERRO: 2017/08/03 23:30:17.863976 conntrack stderr:NOTICE: Netlink socket buffer size has been set to 8388608 bytes.
<probe> WARN: 2017/08/03 23:30:27.770931 Error collecting weave status, backing off 20s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: getsockopt: connection refused
A user's Scope fails to enable eBPF tracing with:
They are running 4.4.65-k8s, apparently as shipped by kops 1.5.3.
The Scope eBPF blog post says that:
So is the above error an indication that the kernel has not been compiled with BPF? Or is something else not right?
And what does "etc" stand for?
@alban @2opremio any idea?