iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

running disksnoop.py get "bpf: Failed to load program: Invalid argument" #1591

Open liubogithub opened 6 years ago

liubogithub commented 6 years ago

Hi,

While running disksnoop.py on my own build of 4.16-rc1+ (the base distro is Fedora 27), I got the error below. I tried adding debug=6, but the output doesn't seem to show where opcode 00 comes from. Any ideas? @yonghong-song

`
clang -cc1 -triple x86_64-unknown-linux-gnu -emit-llvm-bc -emit-llvm-uselists -disable-free -disable-llvm-verifier -discard-value-names -main-file-name main.c -mrelocation-model static -mthread-model posix -fmath-errno -masm-verbose -mconstructor-aliases -fuse-init-array -target-cpu x86-64 -momit-leaf-frame-pointer -dwarf-column-info -debugger-tuning=gdb -coverage-notes-file /home/hostshare/main.gcno -nostdsysteminc -nobuiltininc -resource-dir lib64/clang/5.0.1 -isystem /virtual/lib/clang/include -include ./include/linux/kconfig.h -include /virtual/include/bcc/bpf.h -include /virtual/include/bcc/helpers.h -isystem /virtual/include -I /home/ebpf/myebpf -I /lib/modules/4.16.0-rc1+/build/arch/x86/include -I /lib/modules/4.16.0-rc1+/build/arch/x86/include/generated/uapi -I /lib/modules/4.16.0-rc1+/build/arch/x86/include/generated -I /lib/modules/4.16.0-rc1+/build/include -I /lib/modules/4.16.0-rc1+/build/./arch/x86/include/uapi -I /lib/modules/4.16.0-rc1+/build/arch/x86/include/generated/uapi -I /lib/modules/4.16.0-rc1+/build/include/uapi -I /lib/modules/4.16.0-rc1+/build/include/generated -I /lib/modules/4.16.0-rc1+/build/include/generated/uapi -I ./arch/x86/include -I arch/x86/include/generated/uapi -I arch/x86/include/generated -I include -I ./arch/x86/include/uapi -I arch/x86/include/generated/uapi -I ./include/uapi -I include/generated/uapi -D __KERNEL__ -D __HAVE_BUILTIN_BSWAP16__ -D __HAVE_BUILTIN_BSWAP32__ -D __HAVE_BUILTIN_BSWAP64__ -O2 -Wno-deprecated-declarations -Wno-gnu-variable-sized-type-not-at-end -Wno-pragma-once-outside-header -Wno-address-of-packed-member -Wno-unknown-warning-option -Wno-unused-value -Wno-pointer-sign -fdebug-compilation-dir /home/hostshare -ferror-limit 19 -fmessage-length 0 -fobjc-runtime=gcc -fdiagnostics-show-option -vectorize-loops -vectorize-slp -o main.bc -x c /virtual/main.c`

#include <uapi/linux/ptrace.h>
#include <linux/blkdev.h>

BPF_HASH(start, struct request *);

__attribute__((section(".bpf.fn.trace_start")))
void trace_start(struct pt_regs *ctx)
{ struct request *req = ctx->di;
    u64 ts = bpf_ktime_get_ns();
    bpf_map_update_elem((void *)bpf_pseudo_fd(1, 4), &req, &ts, BPF_ANY);
}

__attribute__((section(".bpf.fn.trace_end")))
void trace_end(struct pt_regs *ctx)
{ struct request *req = ctx->di;
    u64 ts = bpf_ktime_get_ns();
    u64 *tsp;

    tsp = bpf_map_lookup_elem((void *)bpf_pseudo_fd(1, 4), &req);
    if (tsp != 0) {
        ({ char _fmt[] = "%d %x %d\n"; bpf_trace_printk_(_fmt, sizeof(_fmt), ({ typeof(unsigned int) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)&req->__data_len); _val; }), ({ typeof(unsigned int) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)&req->cmd_flags); _val; }), (ts - *tsp)); });
        bpf_map_delete_elem((void *)bpf_pseudo_fd(1, 4), &req);
    }
}
0: (79) r1 = *(u64 *)(r1 +112)
1: (7b) *(u64 *)(r10 -8) = r1
2: (85) call bpf_ktime_get_ns#5
3: (7b) *(u64 *)(r10 -16) = r0
4: (18) r1 = 0xffff88022dc75c00
6: (bf) r2 = r10
7: (07) r2 += -8
8: (bf) r3 = r10
9: (07) r3 += -16
10: (b7) r4 = 0
11: (85) call bpf_map_update_elem#2
12: (95) exit
processed 12 insns (limit 131072), stack depth 16

bpf: Failed to load program: Invalid argument
unknown opcode 00

Traceback (most recent call last):
  File "disksnoop.py", line 33, in <module>
    b.attach_kprobe(event = "blk_account_io_completion", fn_name = "trace_end")
  File "/usr/lib/python3.6/site-packages/bcc/__init__.py", line 519, in attach_kprobe
    fn = self.load_func(fn_name, BPF.KPROBE)
  File "/usr/lib/python3.6/site-packages/bcc/__init__.py", line 348, in load_func
    (func_name, errstr))
Exception: Failed to load BPF program trace_end: Invalid argument
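
The verifier log above jumps from instruction 4 to instruction 6 because opcode 0x18 (a 64-bit immediate load) occupies two 8-byte slots, and the second slot legitimately carries opcode 00. One way to check a bytecode dump for stray zero opcodes is sketched below. It assumes the standard 8-byte eBPF instruction layout (opcode, registers, offset, immediate). `insn` and `find_zero_opcodes` are hypothetical helper names, not part of bcc, and the scan is only an illustration, not a reimplementation of the kernel verifier.

```python
import struct
from typing import List

BPF_LD_IMM64 = 0x18  # 64-bit immediate load; spans two 8-byte instruction slots


def insn(opcode: int, regs: int = 0, off: int = 0, imm: int = 0) -> bytes:
    """Encode one 8-byte eBPF slot: u8 opcode, u8 dst/src regs, s16 off, 32-bit imm."""
    return struct.pack("<BBhI", opcode, regs, off, imm & 0xFFFFFFFF)


def find_zero_opcodes(bytecode: bytes) -> List[int]:
    """Return slot indexes whose opcode is 0x00, ignoring the second half of ld_imm64."""
    if len(bytecode) % 8:
        raise ValueError("eBPF bytecode length must be a multiple of 8")
    suspicious = []
    slot, nslots = 0, len(bytecode) // 8
    while slot < nslots:
        opcode = bytecode[slot * 8]
        if opcode == BPF_LD_IMM64:
            slot += 2  # skip the continuation slot, whose opcode byte is 0 by design
            continue
        if opcode == 0x00:
            suspicious.append(slot)
        slot += 1
    return suspicious


# Illustrative programs (hand-written, not taken from a real dump).
good = (insn(0x79, 0x11, 112)               # r1 = *(u64 *)(r1 + 112)
        + insn(0x18, 0x01, 0, 0x2DC75C00)   # r1 = imm64, low 32 bits
        + insn(0x00, 0x00, 0, 0xFFFF8802)   # imm64, high 32 bits (continuation slot)
        + insn(0x95))                       # exit
bad = insn(0xB7) + bytes(8) + insn(0x95)    # a zeroed slot between two valid instructions

print(find_zero_opcodes(good))  # → []
print(find_zero_opcodes(bad))   # → [1]
```

A non-empty result would point at the slot to inspect in the dump; an empty one suggests the zero opcode is introduced after compilation.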

liubogithub commented 6 years ago

BTW, scripts without the global BPF_HASH() work fine, e.g. hello_world.py.

yonghong-song commented 6 years ago

I am using an FC26-based system booted with the latest net-next (also 4.16-rc1, plus some networking-specific code), and I did not see any issues. I probably need to try on an FC27-based system.

Could you use debug=8 (or debug=12) if you use llvm 6.0 or later? That dumps the byte code before bcc calls the bpf syscall to load it, so it will show whether the byte code itself has any issue.
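
For readers unsure what these numbers mean: the `debug` argument is a bitmask. The sketch below decodes a value into flag names. The constants are assumed to mirror the `DEBUG_*` values in bcc's Python module, so check them against the installed version; `decode_debug` is a hypothetical helper, not a bcc function.

```python
# Assumed to match the DEBUG_* constants in bcc's Python module; verify locally.
DEBUG_FLAGS = {
    0x1: "DEBUG_LLVM_IR",        # print the LLVM IR
    0x2: "DEBUG_BPF",            # print the verifier log and BPF instructions
    0x4: "DEBUG_PREPROCESSOR",   # print the rewritten (preprocessed) C source
    0x8: "DEBUG_SOURCE",         # print byte code interleaved with source
}


def decode_debug(value: int) -> list:
    """Return the names of the flags set in a bcc debug bitmask."""
    return [name for bit, name in sorted(DEBUG_FLAGS.items()) if value & bit]


print(decode_debug(6))   # → ['DEBUG_BPF', 'DEBUG_PREPROCESSOR']
print(decode_debug(8))   # → ['DEBUG_SOURCE']
print(decode_debug(12))  # → ['DEBUG_PREPROCESSOR', 'DEBUG_SOURCE']
```

Under that assumption, debug=6 accounts for the rewritten C source and verifier log shown in the report, while debug=8 or debug=12 would add the byte code dump.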

yonghong-song commented 6 years ago

Tried on FC27 with latest net-next, still did not observe the issue. Maybe you can try different compilers?

liubogithub commented 6 years ago

I currently have llvm-libs-5.0.1-2.fc27.x86_64 installed.

I managed to get it working by building the kernel with f27's config-4.15.2-300.fc27.x86_64. What I haven't figured out is why, since my original config already contains all the items bpf requires, as suggested in INSTALL.md:

$ grep -i bpf config.bak 
# CONFIG_CGROUP_BPF is not set
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_NET_CLS_BPF is not set
# CONFIG_NET_ACT_BPF is not set
CONFIG_BPF_JIT=y
# CONFIG_BPF_STREAM_PARSER is not set
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y

And this is the f27's config:

$ grep -i bpf config-4.15.2-300.fc27.x86_64
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
# CONFIG_TEST_BPF is not set

The differences I noticed are CONFIG_CGROUP_BPF and CONFIG_BPF_STREAM_PARSER, but from reading their descriptions I thought they were optional. Something odd is going on here.
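
Comparing two grep outputs by eye is easy to get wrong, so a small script can list every BPF-related option that differs. The sketch below uses the two listings from this comment as inline data; `parse_config` and `config_diff` are hypothetical helper names, and the output says nothing about which option, if any, causes the failure.

```python
def parse_config(text: str) -> dict:
    """Map option name -> value; '# CONFIG_X is not set' is recorded as 'n'."""
    suffix = " is not set"
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(suffix):
            options[line[2:-len(suffix)]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def config_diff(a: dict, b: dict) -> dict:
    """Return {name: (value_in_a, value_in_b)} for differing options; None = absent."""
    return {name: (a.get(name), b.get(name))
            for name in sorted(set(a) | set(b))
            if a.get(name) != b.get(name)}


nonwork = parse_config("""
# CONFIG_CGROUP_BPF is not set
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
# CONFIG_NETFILTER_XT_MATCH_BPF is not set
# CONFIG_NET_CLS_BPF is not set
# CONFIG_NET_ACT_BPF is not set
CONFIG_BPF_JIT=y
# CONFIG_BPF_STREAM_PARSER is not set
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
""")

work = parse_config("""
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
# CONFIG_TEST_BPF is not set
""")

diff = config_diff(nonwork, work)
for name, (old, new) in diff.items():
    print(f"{name}: non-working={old} working={new}")
```

On these two listings the script reports eight differing options, including CONFIG_BPF_KPROBE_OVERRIDE (present only in the original config) and CONFIG_LWTUNNEL_BPF (present only in f27's).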

yonghong-song commented 6 years ago

I just tried removing CONFIG_CGROUP_BPF and CONFIG_BPF_STREAM_PARSER, and disksnoop.py still works fine. Agreed that these two options should not affect disksnoop.py or cause invalid instructions.

liubogithub commented 6 years ago

I'm running out of ideas. I now have a 'workable' config and a 'non-workable' config, but diffing them I found almost nothing that could cause the 'invalid argument' failure. Perhaps you can figure it out, since you're more experienced in this area?

config.bpf-work config.bpf-nonwork

The diff is listed here: config.diff

-CONFIG_IRQ_TIME_ACCOUNTING=y
-CONFIG_PRINTK_SAFE_LOG_BUF_SHIFT=12
-CONFIG_CGROUP_PIDS=y
-CONFIG_SLAB_FREELIST_RANDOM=y
-CONFIG_SLAB_FREELIST_HARDENED=y
-CONFIG_BLK_DEV_ZONED=y
-CONFIG_BLK_DEV_THROTTLING_LOW=y
-CONFIG_BLK_SED_OPAL=y
-CONFIG_INTEL_RDT=y
-CONFIG_X86_NUMACHIP=y
-CONFIG_XEN_PVH=y
-CONFIG_PREEMPT_COUNT=y
-CONFIG_X86_MCELOG_LEGACY=y
-CONFIG_AMD_MEM_ENCRYPT=y
-CONFIG_ARCH_USE_MEMREMAP_PROT=y
-CONFIG_NODES_SHIFT=10
-CONFIG_HAVE_BOOTMEM_INFO_NODE=y
-CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=y
-CONFIG_MEMORY_HOTREMOVE=y
-CONFIG_CMA=y
-CONFIG_CMA_AREAS=7
-CONFIG_Z3FOLD=y
-CONFIG_ZONE_DEVICE=y
-CONFIG_ARCH_HAS_HMM=y
-CONFIG_MIGRATE_VMA_HELPER=y
-CONFIG_HMM=y
-CONFIG_HMM_MIRROR=y
-CONFIG_DEVICE_PRIVATE=y
-CONFIG_DEVICE_PUBLIC=y
-CONFIG_X86_INTEL_MPX=y
-CONFIG_RANDOMIZE_BASE=y
-CONFIG_X86_NEED_RELOCS=y
-CONFIG_RANDOMIZE_MEMORY=y
-CONFIG_RANDOMIZE_MEMORY_PHYSICAL_PADDING=0xa
-CONFIG_PM_TEST_SUSPEND=y
-CONFIG_XPOWER_PMIC_OPREGION=y
-CONFIG_CPU_FREQ_STAT=y
-CONFIG_CPU_FREQ_GOV_SCHEDUTIL=y
-CONFIG_HOTPLUG_PCI_PCIE=y
-CONFIG_PCIE_DPC=y
-CONFIG_PCIE_PTM=y
-CONFIG_HOTPLUG_PCI=y
-CONFIG_HOTPLUG_PCI_ACPI=y
-CONFIG_IPV6_SEG6_LWTUNNEL=y
-CONFIG_IPV6_SEG6_HMAC=y
-CONFIG_MPLS=y
-CONFIG_NET_L3_MASTER_DEV=y
-CONFIG_NET_NCSI=y
-CONFIG_NET_DROP_MONITOR=y
-CONFIG_NET_9P_DEBUG=y
-CONFIG_LWTUNNEL=y
-CONFIG_LWTUNNEL_BPF=y
-CONFIG_DMA_SHARED_BUFFER=y
-CONFIG_ZRAM=y
-CONFIG_BLK_DEV_LOOP_MIN_COUNT=8
-CONFIG_VIRTIO_BLK_SCSI=y
-CONFIG_BLK_DEV_MD=y
-CONFIG_MD_AUTODETECT=y
-CONFIG_BLK_DEV_DM=y
-CONFIG_DM_BUFIO=y
-CONFIG_DM_DEBUG_BLOCK_MANAGER_LOCKING=y
-CONFIG_DM_SNAPSHOT=y
-CONFIG_DM_MIRROR=y
-CONFIG_DM_ZERO=y
-CONFIG_SERIAL_8250_RUNTIME_UARTS=32
-CONFIG_SERIAL_8250_RT288X=y
-CONFIG_SERIAL_DEV_BUS=y
-CONFIG_SERIAL_DEV_CTRL_TTYPORT=y
-CONFIG_I2C_DESIGNWARE_SLAVE=y
-CONFIG_I2C_DESIGNWARE_BAYTRAIL=y
-CONFIG_SPI=y
-CONFIG_SPI_MASTER=y
-CONFIG_PINCTRL_CHERRYVIEW=y
-CONFIG_WATCHDOG_SYSFS=y
-CONFIG_REGULATOR=y
-CONFIG_EDAC_GHES=y
-CONFIG_ASYNC_TX_DMA=y
-CONFIG_SYNC_FILE=y
-CONFIG_AMD_IOMMU=y
-CONFIG_INTEL_IOMMU_SVM=y
-CONFIG_RAS_CEC=y
-CONFIG_NVDIMM_PFN=y
-CONFIG_NVDIMM_DAX=y
-CONFIG_APPLE_PROPERTIES=y
-CONFIG_EFI_DEV_PATH_PARSER=y
-CONFIG_EXT4_ENCRYPTION=y
-CONFIG_EXT4_FS_ENCRYPTION=y
-CONFIG_FS_DAX_PMD=y
-CONFIG_FS_ENCRYPTION=y
-CONFIG_PROC_CHILDREN=y
-CONFIG_PAGE_POISONING=y
-CONFIG_PAGE_POISONING_NO_SANITY=y
-CONFIG_SLUB_DEBUG_ON=y
-CONFIG_DEBUG_ATOMIC_SLEEP=y
-CONFIG_FAIL_FUTEX=y
-CONFIG_HWLAT_TRACER=y
-CONFIG_BRANCH_PROFILE_NONE=y
-CONFIG_MMIOTRACE=y
-CONFIG_BUG_ON_DATA_CORRUPTION=y
-CONFIG_IO_STRICT_DEVMEM=y
-CONFIG_EARLY_PRINTK_USB_XDBC=y
-CONFIG_DEBUG_WX=y
-CONFIG_ENCRYPTED_KEYS=y
-CONFIG_KEY_DH_OPERATIONS=y
-CONFIG_HARDENED_USERCOPY=y
-CONFIG_HARDENED_USERCOPY_FALLBACK=y
-CONFIG_FORTIFY_SOURCE=y
-CONFIG_CRYPTO_KPP=y
-CONFIG_CRYPTO_DH=y
-CONFIG_CRYPTO_CTS=y
-CONFIG_CRYPTO_USER_API_AEAD=y
-CONFIG_SECONDARY_TRUSTED_KEYRING=y
-CONFIG_SYSTEM_BLACKLIST_KEYRING=y
-CONFIG_SYSTEM_BLACKLIST_HASH_LIST=""
-CONFIG_IRQ_POLL=y