iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

Running BCC tools in CoreOS toolbox #1532

Open facetoe opened 6 years ago

facetoe commented 6 years ago

I'm trying to run the tools in CoreOS toolbox, but keep getting the following error:

[root@ip-10-224-55-200 tools]# ./execsnoop
/virtual/main.c:21:1: error: could not open bpf map: Operation not permitted
is maps/perf_output map type enabled in your kernel?
BPF_PERF_OUTPUT(events);
^
/virtual/include/bcc/helpers.h:88:4: note: expanded from macro 'BPF_PERF_OUTPUT'
}; \
   ^
/virtual/main.c:26:5: error: bpf_table events failed to open
    events.perf_submit(ctx, data, sizeof(struct data_t));
    ^
/virtual/main.c:74:5: error: bpf_table events failed to open
    events.perf_submit(ctx, &data, sizeof(data));
    ^
3 errors generated.
Traceback (most recent call last):
  File "./execsnoop", line 132, in <module>
    b = BPF(text=bpf_text.replace("MAXARG", args.max_args))
  File "/usr/lib/python2.7/site-packages/bcc/__init__.py", line 301, in __init__
    raise Exception("Failed to compile BPF module %s" % src_file)
Exception: Failed to compile BPF module

I followed the instructions to install from source here https://github.com/iovisor/bcc/blob/master/INSTALL.md#fedora---source. That all worked fine, and the tools compile and install. First I had this error:

chdir(/lib/modules/4.13.9-coreos/build): No such file or directory

So I symlinked the required directory:

ln -s /media/root/lib/modules/ /lib/modules

When I strace the process, I can see that the bpf call is returning EPERM:

[root@ip-10-224-55-200 tools]# strace ./execsnoop  2>&1 | grep -i permi
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=15, map_flags=0, inner_map_fd=0, ...}, 72) = -1 EPERM (Operation not permitted)
bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_PERF_EVENT_ARRAY, key_size=4, value_size=4, max_entries=15, map_flags=0, inner_map_fd=0, ...}, 72) = -1 EPERM (Operation not permitted)

I think the BPF stuff is enabled in the kernel:

 [root@ip-10-224-55-200 tools]# zgrep -i bpf /proc/config.gz
# CONFIG_CGROUP_BPF is not set
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
# CONFIG_NET_ACT_BPF is not set
CONFIG_BPF_JIT=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
# CONFIG_TEST_BPF is not set

The man page for bpf says the following:

       EPERM  The call was made without sufficient privilege (without the
              CAP_SYS_ADMIN capability).

but it looks like toolbox includes this:

sudo systemd-nspawn \
    --directory="${machinepath}" \
    --capability=all \
    --share-system \
        ${TOOLBOX_BIND} \
        ${TOOLBOX_ENV} \
    --user="${TOOLBOX_USER}" "$@"

I'm running toolbox as root.
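
One way to double-check the capability side from inside the container is to read the effective capability mask directly. This is a minimal Python sketch, assuming the standard `CapEff:` line in `/proc/self/status` and that CAP_SYS_ADMIN is bit 21 as defined in `<linux/capability.h>`; the function names are made up for illustration.

```python
CAP_SYS_ADMIN = 21  # bit number from <linux/capability.h>


def effective_caps(status_text):
    """Extract the effective capability bitmask from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("CapEff:"):
            # The value is a hexadecimal bitmask, one bit per capability.
            return int(line.split()[1], 16)
    raise ValueError("no CapEff line found")


def has_cap(mask, cap):
    """Return True if capability number `cap` is set in `mask`."""
    return bool((mask >> cap) & 1)


if __name__ == "__main__":
    with open("/proc/self/status") as f:
        mask = effective_caps(f.read())
    print("CAP_SYS_ADMIN effective:", has_cap(mask, CAP_SYS_ADMIN))
```

If this prints True and bpf() still returns EPERM, the capability itself is probably not what is missing.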

System details:

~ $ uname -a
Linux ip-10-224-55-200 4.13.9-coreos #1 SMP Thu Oct 26 03:21:00 UTC 2017 x86_64 Intel(R) Xeon(R) CPU E5-2676 v3 @ 2.40GHz GenuineIntel GNU/Linux

Any idea why this is not working? Should it even work?

yonghong-song commented 6 years ago

Typically, a map creation failure with permission denied is caused by either a missing capability (e.g., you need root) or the locked-memory limit. Could you run ulimit -l unlimited and see whether that resolves your issue?
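
A rough Python sketch for checking both of these conditions from the process that will load the BPF program. It only reports the effective UID and the RLIMIT_MEMLOCK values; the function names are hypothetical and nothing here is part of bcc.

```python
import os
import resource


def describe_map_create_preconditions():
    """Return (is_root, memlock_soft, memlock_hard) for the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    return os.geteuid() == 0, soft, hard


def format_limit(value):
    """Render an rlimit value, showing RLIM_INFINITY as 'unlimited'."""
    return "unlimited" if value == resource.RLIM_INFINITY else str(value)


if __name__ == "__main__":
    is_root, soft, hard = describe_map_create_preconditions()
    print("running as root:", is_root)
    print("RLIMIT_MEMLOCK soft/hard:", format_limit(soft), format_limit(hard))
```

Both limits showing "unlimited" would correspond to what ulimit -l unlimited sets for the shell.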

facetoe commented 6 years ago

Unfortunately, I still get the same error after running ulimit -l unlimited.

facetoe commented 6 years ago

I appear to have the CAP_SYS_ADMIN capability in the container:

[root@ip-10-224-55-200 ~]# capsh --print
Current: = cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read+ep
Bounding set =cap_chown,cap_dac_override,cap_dac_read_search,cap_fowner,cap_fsetid,cap_kill,cap_setgid,cap_setuid,cap_setpcap,cap_linux_immutable,cap_net_bind_service,cap_net_broadcast,cap_net_admin,cap_net_raw,cap_ipc_lock,cap_ipc_owner,cap_sys_module,cap_sys_rawio,cap_sys_chroot,cap_sys_ptrace,cap_sys_pacct,cap_sys_admin,cap_sys_boot,cap_sys_nice,cap_sys_resource,cap_sys_time,cap_sys_tty_config,cap_mknod,cap_lease,cap_audit_write,cap_audit_control,cap_setfcap,cap_mac_override,cap_mac_admin,cap_syslog,cap_wake_alarm,cap_block_suspend,cap_audit_read
Securebits: 00/0x0/1'b0
 secure-noroot: no (unlocked)
 secure-no-suid-fixup: no (unlocked)
 secure-keep-caps: no (unlocked)
uid=0(root)
gid=0(root)
groups=

yonghong-song commented 6 years ago

Maybe some security enforcement inside the container is blocking bpf syscalls? You could try the following:

echo 'p:trace_SyS_bpf SyS_bpf' > /sys/kernel/debug/tracing/kprobe_events 
echo 1 > /sys/kernel/debug/tracing/events/kprobes/trace_SyS_bpf/enable
cat /sys/kernel/debug/tracing/trace_pipe

In a different window, run your command, e.g. "execsnoop.py" or "biosnoop.py". Any output from trace_pipe indicates that the bpf syscall is at least being reached.

facetoe commented 6 years ago

Aha, I think you might be on to something. When I execute the above (outside the container), and then execute "execsnoop.py" (inside the container), I see no output from the trace_pipe.

To verify that tracing was working, I executed (outside the container):

cd /sys/kernel/debug/tracing/
echo 'p:myopen1 do_sys_open' >> kprobe_events
echo 1 > events/kprobes/myopen1/enable
cat trace_pipe

Jumped in the container and opened some files. Sure enough, there is lots of output.

SELinux exists on the system and is set to permissive:

SELinux status:                 enabled
SELinuxfs mount:                /sys/fs/selinux
SELinux root directory:         /etc/selinux
Loaded policy name:             mcs
Current mode:                   permissive
Mode from config file:          permissive
Policy MLS status:              enabled
Policy deny_unknown status:     allowed
Max kernel policy version:      31

Perhaps this is what is preventing BPF execution?

I have managed to get the tools working in the following Docker container https://github.com/zlim/bcc-docker, so there must be something that Docker is doing to enable execution that systemd-nspawn does not.

4ast commented 6 years ago

In recent kernels, the BPF syscall got its own LSM hook. The default behavior of SELinux is to disallow all LSM hooks that are not explicitly listed in the policy. Together, these two mean that bpf is disabled when SELinux is on.

facetoe commented 6 years ago

Thanks @4ast. I don't know much about SELinux, but from what I was reading, permissive mode only logs actions and doesn't reject them: "In Permissive mode, SELinux is enabled but will not enforce the security policy, only warn and log actions. Permissive mode is useful for troubleshooting SELinux issues."

I followed the steps at https://coreos.com/os/docs/latest/selinux.html to enable SELinux logging and tried running "execsnoop.py" in the container, but no output was generated.

I thought I could just disable SELinux completely and see if that worked, however I don't seem to have the option to disable it:

setenforce
usage:  setenforce [ Enforcing | Permissive | 1 | 0 ]

I'm a bit stuck as to where to go from here, any suggestions?

yonghong-song commented 6 years ago

setenforce 0 is the way to disable it.

facetoe commented 6 years ago

Yep, tried that. No change unfortunately.

guilhermebr commented 6 years ago

The same is happening to me.

/virtual/main.c:22:1: error: could not open bpf map: Operation not permitted
is maps/perf_output map type enabled in your kernel?
BPF_PERF_OUTPUT(events);
^
/virtual/include/bcc/helpers.h:89:4: note: expanded from macro 'BPF_PERF_OUTPUT'
}; \
   ^
/virtual/main.c:27:5: error: bpf_table events failed to open
    events.perf_submit(ctx, data, sizeof(struct data_t));
    ^
/virtual/main.c:93:5: error: bpf_table events failed to open
    events.perf_submit(ctx, &data, sizeof(data));
    ^
3 errors generated.
Traceback (most recent call last):
  File "./execsnoop", line 166, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python2.7/dist-packages/bcc/__init__.py", line 318, in __init__
    raise Exception("Failed to compile BPF text")
Exception: Failed to compile BPF text
$ uname -a
Linux localhost.localdomain 4.17.14-202.fc28.x86_64 #1 SMP Wed Aug 15 12:29:25 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ cat /boot/config-4.17.14-202.fc28.x86_64 |grep BPF
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
# CONFIG_TEST_BPF is not set

asmca commented 6 years ago

I also hit this error on both Fedora 27 and 28, with either the RPM or a source build.

[root@suse tools]# ./tcptop
/virtual/main.c:13:1: error: could not open bpf map: Operation not permitted
is maps/hash map type enabled in your kernel?
BPF_HASH(ipv4_send_bytes, struct ipv4_key_t);
^
/virtual/include/bcc/helpers.h:132:48: note: expanded from macro 'BPF_HASH'
  BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
                                               ^
/virtual/main.c:14:1: error: could not open bpf map: Operation not permitted
is maps/hash map type enabled in your kernel?
BPF_HASH(ipv4_recv_bytes, struct ipv4_key_t);
^
/virtual/include/bcc/helpers.h:132:48: note: expanded from macro 'BPF_HASH'
  BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
                                               ^
/virtual/main.c:23:1: error: could not open bpf map: Operation not permitted
is maps/hash map type enabled in your kernel?
BPF_HASH(ipv6_send_bytes, struct ipv6_key_t);
^
/virtual/include/bcc/helpers.h:132:48: note: expanded from macro 'BPF_HASH'
  BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
                                               ^
/virtual/main.c:24:1: error: could not open bpf map: Operation not permitted
is maps/hash map type enabled in your kernel?
BPF_HASH(ipv6_recv_bytes, struct ipv6_key_t);
^
/virtual/include/bcc/helpers.h:132:48: note: expanded from macro 'BPF_HASH'
  BPF_HASHX(__VA_ARGS__, BPF_HASH4, BPF_HASH3, BPF_HASH2, BPF_HASH1)(__VA_ARGS__)
                                               ^
/virtual/main.c:40:9: error: bpf_table ipv4_send_bytes failed to open
        ipv4_send_bytes.increment(ipv4_key, size);
        ^
/virtual/main.c:81:9: error: bpf_table ipv4_recv_bytes failed to open
        ipv4_recv_bytes.increment(ipv4_key, copied);
        ^
6 errors generated.
Traceback (most recent call last):
  File "./tcptop", line 207, in <module>
    b = BPF(text=bpf_text)
  File "/usr/lib/python2.7/site-packages/bcc/__init__.py", line 318, in __init__
    raise Exception("Failed to compile BPF text")

cat /boot/config-4.18.5-200.fc28.x86_64 |grep BPF
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
# CONFIG_BPFILTER is not set
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
# CONFIG_BPF_KPROBE_OVERRIDE is not set
# CONFIG_TEST_BPF is not set

mmolchan commented 6 years ago

Same on Fedora 28 with the default 4.18.7-200.fc28.x86_64 kernel. selinux is disabled, running as root. strace shows all bpf() calls are failing with EPERM. Not clear why.

30429 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="sizes", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30429 prlimit64(0, RLIMIT_MEMLOCK, NULL, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}) = 0
30429 prlimit64(0, RLIMIT_MEMLOCK, {rlim_cur=RLIM64_INFINITY, rlim_max=RLIM64_INFINITY}, NULL) = 0
30429 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="sizes", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30429 write(2, "/virtual/main.c", 15)   = 15
30429 write(2, ":", 1)                  = 1
30429 write(2, "15", 2)                 = 2
30429 write(2, ":", 1)                  = 1
30429 write(2, "1", 1)                  = 1
30429 write(2, ":", 1)                  = 1
30429 write(2, " ", 1)                  = 1
30429 write(2, "error", 5)              = 5
30429 write(2, ": ", 2)                 = 2
30429 write(2, "could", 5)              = 5
30429 write(2, " ", 1)                  = 1
30429 write(2, "not", 3)                = 3
30429 write(2, " ", 1)                  = 1
30429 write(2, "open", 4)               = 4
30429 write(2, " ", 1)                  = 1
30429 write(2, "bpf", 3)                = 3
30429 write(2, " ", 1)                  = 1
30429 write(2, "map:", 4)               = 4
30429 write(2, " ", 1)                  = 1
30429 write(2, "Operation", 9)          = 9
30429 write(2, " ", 1)                  = 1
30429 write(2, "not", 3)                = 3
30429 write(2, " ", 1)                  = 1
30429 write(2, "permitted", 9)          = 9
30429 write(2, "\nis maps/hash map type enabled in your kernel?", 46) = 46
30429 write(2, "\n", 1)                 = 1
30429 write(2, "BPF_HASH(sizes, u64);", 21) = 21
30429 write(2, "\n", 1)                 = 1
30429 write(2, "^", 1)                  = 1
30429 write(2, "\n", 1)                 = 1
30429 write(2, "/virtual/include/bcc/helpers.h", 30) = 30

strace -ebpf for just bpf() calls:

30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="sizes", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="sizes", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=24, max_entries=1000000, map_flags=0, inner_map_fd=0, map_name="allocs", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=24, max_entries=1000000, map_flags=0, inner_map_fd=0, map_name="allocs", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="memptrs", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=8, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="memptrs", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_STACK_TRACE, key_size=4, value_size=1016, max_entries=16384, map_flags=0, inner_map_fd=0, map_name="stack_traces", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_STACK_TRACE, key_size=4, value_size=1016, max_entries=16384, map_flags=0, inner_map_fd=0, map_name="stack_traces", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=16, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="combined_allocs", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30481 bpf(BPF_MAP_CREATE, {map_type=BPF_MAP_TYPE_HASH, key_size=8, value_size=16, max_entries=10240, map_flags=0, inner_map_fd=0, map_name="combined_allocs", map_ifindex=0}, 72) = -1 EPERM (Operation not permitted)
30482 +++ exited with 0 +++
30481 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_EXITED, si_pid=30482, si_uid=0, si_status=0, si_utime=0, si_stime=0} ---
30481 +++ exited with 1 +++

yonghong-song commented 6 years ago

Could you try using a debugfs kretprobe to trace the kernel function security_bpf? If the return value is not 0, the failure is still related to SELinux.
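
A minimal sketch of what such a kretprobe could look like, written in Python against the same debugfs tracing directory as the kprobe commands earlier in the thread. It uses the documented `r:EVENT SYMBOL` kprobe_events syntax with the `$retval` fetch argument. The event name trace_security_bpf and the function names are made up for illustration, and this assumes the running kernel actually exposes a security_bpf symbol.

```python
import os

# debugfs tracing directory, as used by the kprobe commands in this thread.
TRACING_DIR = "/sys/kernel/debug/tracing"


def kretprobe_definition(event, symbol):
    """Build a kprobe_events line that records the probed function's return value."""
    return "r:%s %s ret=$retval" % (event, symbol)


def install_kretprobe(event, symbol, tracing_dir=TRACING_DIR):
    """Register the return probe and enable it (needs root on the host)."""
    # Open in append mode so that existing probes are preserved.
    with open(os.path.join(tracing_dir, "kprobe_events"), "a") as f:
        f.write(kretprobe_definition(event, symbol) + "\n")
    enable = os.path.join(tracing_dir, "events", "kprobes", event, "enable")
    with open(enable, "w") as f:
        f.write("1\n")


if __name__ == "__main__":
    install_kretprobe("trace_security_bpf", "security_bpf")
    # Stream events; run the failing bcc tool in another window.
    with open(os.path.join(TRACING_DIR, "trace_pipe")) as pipe:
        for line in pipe:
            print(line, end="")
```

Each traced call should show up in trace_pipe with a ret= field holding the return value to compare against 0.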

ti-mo commented 5 years ago

@facetoe Did you manage to figure this out in the end? I ran into a very similar issue trying to run an eBPF program with a minimal capability set, avoiding sudo or running as root. This thread is one of the only search results on the matter.

I came across this gem: https://stackoverflow.com/questions/40837181/how-to-raise-ulimit-hard-limit-for-real-time-priority-programmatically-with-setu. Apparently, having nosuid set on the partition the file is stored on (i.e., most home dirs, including the LUKS partition I work from) also prevents assuming capabilities, not just setuid.

Can't immediately find the script you linked, but I assume it creates a bind mount from a partition with nosuid set into the nspawn container, thus preventing certain eBPF features from working (e.g., locking memory for creating maps, opening the trace pipe, etc.).

To the other posters in this thread: consider user namespace remapping as a factor. UID 0 in the container != UID 0 in the host namespace, so those containers will have to rely on assuming capabilities, even with --capability=all. I'm not familiar with nspawn; maybe this was shipped in an update?
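
Both of these can be checked with a short script. The following is a rough Python sketch, assuming the usual `/proc/mounts` and `/proc/self/uid_map` formats; the helper names are hypothetical, and mount points containing escaped characters (such as `\040` for spaces) are not handled.

```python
import os
import sys


def mount_options_for(path, mounts_text):
    """Return the option list of the longest mount point that contains `path`.

    `path` should already be absolute and symlink-resolved.
    """
    best, best_opts = "", []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue
        mount_point, opts = fields[1], fields[3].split(",")
        prefix = mount_point.rstrip("/") + "/"
        # ">=" lets a later mount stacked on the same mount point win.
        if (path + "/").startswith(prefix) and len(mount_point) >= len(best):
            best, best_opts = mount_point, opts
    return best_opts


def is_identity_uid_map(uid_map_text):
    """True when UID 0 inside maps to UID 0 outside over the full UID range."""
    rows = [line.split() for line in uid_map_text.splitlines() if line.strip()]
    return rows == [["0", "0", "4294967295"]]


if __name__ == "__main__":
    target = os.path.realpath(sys.argv[1] if len(sys.argv) > 1 else sys.argv[0])
    with open("/proc/mounts") as f:
        print("nosuid on", target, ":", "nosuid" in mount_options_for(target, f.read()))
    with open("/proc/self/uid_map") as f:
        print("identity UID mapping:", is_identity_uid_map(f.read()))
```

Passing the path of the binary that is supposed to carry the capabilities as the first argument reports whether its filesystem is mounted nosuid.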

mahrud commented 5 years ago

Could you try using a debugfs kretprobe to trace the kernel function security_bpf? If the return value is not 0, the failure is still related to SELinux.

I'm running into the same issue, selinux is definitely turned off, but I'm not sure how to use debugfs.

jgehrcke commented 5 years ago

@facetoe I'd also love to understand whether or not you found a resolution to this. Thanks for reporting in detail about the debugging work you did, already super helpful.

facetoe commented 5 years ago

Nope, never found a solution unfortunately. I ended up pulling Docker containers for the BPF stuff.

deg00 commented 5 years ago

For Fedora 30, the problem is not selinux but kernel-lockdown. If you leave selinux in enforcing mode but disable kernel lockdown, you can then use bcc tools as root. To disable kernel lockdown:

echo 1 > /proc/sys/kernel/sysrq
echo x > /proc/sysrq-trigger

( just verified on a fresh FC30 upgrade )
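
To see whether lockdown is active before changing anything, a small check like the following may help. It is a Python sketch that assumes the mainline securityfs interface `/sys/kernel/security/lockdown` (present since Linux 5.4), where the active mode is shown in square brackets; distro kernels carrying older lockdown patches may not provide this file, and the function names are made up for illustration.

```python
# Mainline lockdown status file; an assumption for distro-patched kernels.
LOCKDOWN_PATH = "/sys/kernel/security/lockdown"


def active_lockdown_mode(text):
    """Return the bracketed entry, e.g. 'none' from '[none] integrity confidentiality'."""
    for word in text.split():
        if word.startswith("[") and word.endswith("]"):
            return word[1:-1]
    return None


def read_lockdown_mode(path=LOCKDOWN_PATH):
    """Return the active lockdown mode, or None if the file cannot be read."""
    try:
        with open(path) as f:
            return active_lockdown_mode(f.read())
    except OSError:
        # File absent or unreadable: no lockdown LSM, securityfs not mounted,
        # or a kernel that exposes lockdown differently.
        return None


if __name__ == "__main__":
    print("kernel lockdown mode:", read_lockdown_mode())
```

A result other than "none" would suggest lockdown is worth ruling out; None only means this particular file was not available.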

palmtenor commented 5 years ago

@deg00 Great! Do you mind adding it to the FAQ?

jgehrcke commented 5 years ago

Thank you so much @deg00 -- this resolved the issue described in https://github.com/iovisor/bcc/issues/2525.

For the issue described here it's not entirely clear if kernel lockdown is the problem. This was reported with kernel version 4.13 which probably does not have kernel lockdown code, right?

I created a quick FAQ patch for this a minute ago: https://github.com/iovisor/bcc/pull/2532

psanford commented 5 years ago

Note that various versions of the lockdown patches have been backported by distros into older kernel versions. See https://github.com/iovisor/bpftrace/issues/853 for the ubuntu variant I ran into.

phylake commented 5 years ago

@deg00 what option does x map to in echo x > /proc/sysrq-trigger?

I get the following in dmesg (i.e. there's no x option)

[17030.875113] sysrq: SysRq : HELP : loglevel(0-9) reboot(b) crash(c) terminate-all-tasks(e) memory-full-oom-kill(f) kill-all-tasks(i) thaw-filesystems(j) sak(k) show-backtrace-all-active-cpus(l) show-memory-usage(m) nice-all-RT-tasks(n) poweroff(o) show-registers(p) show-all-timers(q) unraw(r) sync(s) show-task-states(t) unmount(u) show-blocked-tasks(w) dump-ftrace-buffer(z)

deg00 commented 5 years ago

x was mapped to disabling kernel_lockdown on x86 (https://lore.kernel.org/patchwork/patch/1046899/). I haven't had time to see whether the recent merge of kernel lockdown into Linus's kernel has changed the behavior, but that might be it. Or it might be that your kernel does not have the kernel lockdown patch (and so does not have the sysrq trigger). Not helpful, I know, but it's the best I can do at the moment.

phylake commented 5 years ago

Actually it was super helpful. Thank you!

Connor1996 commented 3 years ago

I encountered a similar issue in a Docker environment, and I found an "SELinux: mount invalid" log entry in dmesg. Disabling SELinux enforcement then worked for me.

echo 0 > /sys/fs/selinux/enforce

simar7 commented 3 years ago

In recent kernels, the BPF syscall got its own LSM hook. The default behavior of SELinux is to disallow all LSM hooks that are not explicitly listed in the policy. Together, these two mean that bpf is disabled when SELinux is on.

Thanks @4ast for this info - I stumbled upon your comment and it helped. Would you know where this was added to the kernel?

I'm guessing it was this? https://github.com/torvalds/linux/commit/ec27c3568a34c7fe5fcf4ac0a354eda77687f7eb

yonghong-song commented 3 years ago

@simar7 Yes, the above patch is one in the original patch series.