pixie-io / pixie

Instant Kubernetes-Native Application Observability
https://px.dev
Apache License 2.0

Socket tracer unable to start on 6.10 and later kernels #2035

Closed: ddelnano closed this issue 1 month ago

ddelnano commented 1 month ago

Describe the bug

While working with a member of the community, I received a report that their openSUSE MicroOS instances fail to start the socket tracer. Their PEMs fail to compile several BPF programs (see the logs below). These instances run a 6.11 kernel, while the newest kernel headers we ship are 6.1.x.

I'm in the process of verifying whether newer kernel headers resolve their problems. If they do, our Linux kernel headers should be updated through 6.11.

Logs

pixie_logs_20241001120707.zip

The relevant logs from that PEM are the following:

E20241001 18:31:33.085537 91215 task_struct_resolver.cc:330] Internal : Unable to initialize BCC BPF program: Unable to initialize BPF program
I20241001 18:31:33.150326 91217 bcc_wrapper.cc:166] Initializing BPF program ...
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:33:27: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                        return BITS_PER_LONG - PAGE_SHIFT;
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:35:22: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                if (size < (1UL << PAGE_SHIFT))
                                   ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:38:30: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                return ilog2((size) - 1) - PAGE_SHIFT + 1;
                                           ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:42:11: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        size >>= PAGE_SHIFT;
                 ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/bpf_tools/bcc_bpf/task_struct_mem_read.c:24:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:14:
In file included from include/linux/pid.h:5:
In file included from include/linux/rculist.h:11:
In file included from include/linux/rcupdate.h:30:
arch/arm64/include/asm/processor.h:314:16: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        return addr < TASK_SIZE;
                      ^
arch/arm64/include/asm/processor.h:68:5: note: expanded from macro 'TASK_SIZE'
                                TASK_SIZE_32 : TASK_SIZE_64)
                                ^
arch/arm64/include/asm/processor.h:65:42: note: expanded from macro 'TASK_SIZE_32'
#define TASK_SIZE_32            (UL(0x100000000) - PAGE_SIZE)
                                                   ^
arch/arm64/include/asm/page-def.h:15:35: note: expanded from macro 'PAGE_SIZE'
#define PAGE_SIZE               (_AC(1, UL) << PAGE_SHIFT)
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/bpf_tools/bcc_bpf/task_struct_mem_read.c:24:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:32:
include/linux/mm_types_task.h:19:10: fatal error: 'asm/tlbbatch.h' file not found
#include <asm/tlbbatch.h>
         ^~~~~~~~~~~~~~~~
6 errors generated.
I20241001 18:31:34.551831 91217 scoped_timer.h:48] Timer(init_bpf_program) : 1.40 s
E20241001 18:31:34.551985 91217 task_struct_resolver.cc:330] Internal : Unable to initialize BCC BPF program: Unable to initialize BPF program
W20241001 18:31:34.552109 91084 bcc_wrapper.cc:149] Failed to obtain task_struct offsets, will not override the task_struct offsets, error: Internal : Resolution failed in subprocess. Check subprocess logs for the error.
I20241001 18:31:34.552258 91084 bcc_wrapper.cc:166] Initializing BPF program ...
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:33:27: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                        return BITS_PER_LONG - PAGE_SHIFT;
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:35:22: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                if (size < (1UL << PAGE_SHIFT))
                                   ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:38:30: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
                return ilog2((size) - 1) - PAGE_SHIFT + 1;
                                           ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from <built-in>:4:
In file included from /virtual/include/bcc/helpers.h:54:
In file included from arch/arm64/include/asm/page.h:52:
include/asm-generic/getorder.h:42:11: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        size >>= PAGE_SHIFT;
                 ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/source_connectors/proc_exit/bcc_bpf/proc_exit_trace.c:24:
In file included from ./src/stirling/bpf_tools/bcc_bpf/task_struct_utils.h:26:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:14:
In file included from include/linux/pid.h:5:
In file included from include/linux/rculist.h:11:
In file included from include/linux/rcupdate.h:30:
arch/arm64/include/asm/processor.h:314:16: error: use of undeclared identifier 'CONFIG_ARM64_PAGE_SHIFT'
        return addr < TASK_SIZE;
                      ^
arch/arm64/include/asm/processor.h:68:5: note: expanded from macro 'TASK_SIZE'
                                TASK_SIZE_32 : TASK_SIZE_64)
                                ^
arch/arm64/include/asm/processor.h:65:42: note: expanded from macro 'TASK_SIZE_32'
#define TASK_SIZE_32            (UL(0x100000000) - PAGE_SIZE)
                                                   ^
arch/arm64/include/asm/page-def.h:15:35: note: expanded from macro 'PAGE_SIZE'
#define PAGE_SIZE               (_AC(1, UL) << PAGE_SHIFT)
                                               ^
arch/arm64/include/asm/page-def.h:14:21: note: expanded from macro 'PAGE_SHIFT'
#define PAGE_SHIFT              CONFIG_ARM64_PAGE_SHIFT
                                ^
In file included from src/stirling/source_connectors/proc_exit/bcc_bpf/proc_exit_trace.c:24:
In file included from ./src/stirling/bpf_tools/bcc_bpf/task_struct_utils.h:26:
In file included from src/stirling/bpf_tools/bcc_bpf/system-headers/linux/sched.h:1:
In file included from include/linux/sched.h:32:
include/linux/mm_types_task.h:19:10: fatal error: 'asm/tlbbatch.h' file not found
#include <asm/tlbbatch.h>
         ^~~~~~~~~~~~~~~~
6 errors generated.


ddelnano commented 1 month ago

After supplying one-off kernel headers built from #2036, this community user's ARM64 and x86 PEMs still hit BPF compilation errors. I ran one of the trace BPF tests in QEMU with a 6.11.1 kernel and the new headers, and I was able to reproduce the same error they see.

observabilityvizier-pem-dv96npem.log
qemu_dns_trace_bpf_test.log

ddelnano commented 1 month ago

I tracked down the problem: upgrading BCC fixes the issue.

BCC injects certain "virtual" files into the programs it compiles. One of them, compat/linux/virtual_bpf.h, must be kept in sync with libbpf, and it uses the same header guard as the kernel's include/uapi/linux/bpf.h. As a result, even though our Linux headers were updated, our older BCC install was injecting an older copy of uapi/linux/bpf.h, one that doesn't contain the bpf_wq declaration. Because that copy defines the header guard first, the newer kernel header is skipped entirely.

I need to double-check that my rebase onto BCC's newer changes is correct and update our fork (pixie-io/bcc) first, but I should be able to open a PR for this soon.