grafana / beyla

eBPF-based autoinstrumentation of web applications and network metrics
https://grafana.com/oss/beyla-ebpf/
Apache License 2.0

Investigate the need for `CAP_SYS_ADMIN` in a few environments #1070

Open rafaelroquetto opened 1 month ago

rafaelroquetto commented 1 month ago

Some environments (e.g. a virtual machine running Kind) also require the CAP_SYS_ADMIN capability. We currently do not test for this capability, so in these cases Beyla may fail after the capabilities test with a less tailored error message. We need to investigate when CAP_SYS_ADMIN is used and, since it is a sort of "umbrella" capability, check whether more granular capabilities could be used in these instances instead.

marevers commented 1 week ago

In my environment (AKS v1.30.3) it seems I cannot make do without CAP_SYS_ADMIN. I've tried combinations of other capabilities, like this one:

        capabilities:
          add:
            - BPF
            - SYS_PTRACE
            - NET_RAW
            - CHECKPOINT_RESTORE
            - DAC_READ_SEARCH
            - PERFMON
            - SYS_RESOURCE
            - NET_ADMIN
          drop:
            - ALL

Without adding SYS_ADMIN as well, the following error is shown in the Beyla logs:

time=2024-09-12T09:37:49.867Z level=ERROR msg="Unable to load eBPF watcher for process events" component=discover.ProcessWatcher interval=5s error="instrumenting function \"sys_bind\": setting kprobe: creating perf_kprobe PMU (arch-specific fallback for \"sys_bind\"): token sys_bind: opening perf event: permission denied"
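When trying capability combinations like the one above, it can help to confirm which capabilities the process inside the container actually ends up with. The following is a minimal diagnostic sketch, not part of Beyla: it decodes the `CapEff` bitmask that Linux exposes in `/proc/<pid>/status`. The bit numbers come from `<linux/capability.h>`; the function names `held_caps` and `read_cap_eff` are made up for this illustration, and the choice of Python is an assumption.

```python
# Bit numbers for the capabilities discussed in this thread,
# as defined in <linux/capability.h>.
CAP_BITS = {
    "DAC_READ_SEARCH": 2,
    "NET_ADMIN": 12,
    "NET_RAW": 13,
    "SYS_PTRACE": 19,
    "SYS_ADMIN": 21,
    "SYS_RESOURCE": 24,
    "PERFMON": 38,
    "BPF": 39,
    "CHECKPOINT_RESTORE": 40,
}


def held_caps(cap_eff_hex):
    """Return the names from CAP_BITS whose bit is set in a CapEff hex mask."""
    mask = int(cap_eff_hex, 16)
    return {name for name, bit in CAP_BITS.items() if mask & (1 << bit)}


def read_cap_eff(status_path="/proc/self/status"):
    """Read the effective capability mask (hex string) of a process."""
    with open(status_path) as status:
        for line in status:
            if line.startswith("CapEff:"):
                return line.split()[1]
    raise RuntimeError("CapEff line not found in " + status_path)
```

Run inside the container, `held_caps(read_cap_eff())` shows the effective set of the calling process; pointing `status_path` at `/proc/<pid>/status` of the Beyla process inspects that process instead. A capability listed under `add` that is missing from the result would suggest the runtime did not grant it.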

I've consulted the Linux man pages for more information on what CAP_SYS_ADMIN covers, but based on the error message I cannot determine which of the listed actions triggers the error.

Other environments I've tested before (EKS, versions between 1.27 and 1.29) work fine without SYS_ADMIN.
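Since the reported failure is a `permission denied` when opening a perf event, one setting that may be worth comparing between the working and failing nodes is the `kernel.perf_event_paranoid` sysctl. The sketch below is only a diagnostic aid under that assumption; nothing in this thread confirms it is the cause. The level descriptions paraphrase the upstream kernel sysctl documentation, values above 2 are not documented upstream and are treated here as distribution-specific, and the helper names are hypothetical.

```python
def read_perf_event_paranoid(path="/proc/sys/kernel/perf_event_paranoid"):
    """Return the sysctl value as an int, or None if it cannot be read."""
    try:
        with open(path) as sysctl:
            return int(sysctl.read().strip())
    except (OSError, ValueError):
        return None


def describe_paranoid(level):
    """Summarise what a perf_event_paranoid level restricts for
    processes that lack CAP_PERFMON (per upstream kernel docs)."""
    if level is None:
        return "unknown: sysctl not readable"
    if level < 0:
        return "almost all events allowed for all users"
    if level == 0:
        return "ftrace function tracepoints and raw tracepoints disallowed"
    if level == 1:
        return "level 0 restrictions plus CPU event access disallowed"
    if level == 2:
        return "level 1 restrictions plus kernel profiling disallowed"
    return "above upstream-documented levels: distribution-specific restrictions"
```

Calling `describe_paranoid(read_perf_event_paranoid())` on a node that works without SYS_ADMIN and on one that does not would show whether the two differ in this setting.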

grcevski commented 1 week ago

The finer grained permissions work only if:

So it depends on the Linux kernel version as well as the container runtime version. If the conditions above are not met, CAP_SYS_ADMIN is always required.

marevers commented 1 week ago

@grcevski interestingly enough, the AKS v1.30.3 environment has the following versions:

So based on your parameters it should work without CAP_SYS_ADMIN, but it still complains. There might be other requirements as well, or perhaps this Azure-specific kernel version works a bit differently than the default one.