libbpf / libbpf-rs

Minimal and opinionated eBPF tooling for the Rust ecosystem
Other
790 stars 138 forks source link

Generated `SkelBuilder::open` fails with non-vendored libbpf older than 1.4 #927

Closed ueno closed 3 months ago

ueno commented 3 months ago

When libbpf installed on the system is older than 1.4 (e.g., 1.2), and the vendoring is disabled with default-features = false, I get the following error when calling generated SkelBuilder::open:

libbpf: bpf_object_open_opts has non-zero extra bytes

To reproduce, modify Cargo.toml in examples/capable as:

diff --git a/examples/capable/Cargo.toml b/examples/capable/Cargo.toml
index 17826b8..00b922a 100644
--- a/examples/capable/Cargo.toml
+++ b/examples/capable/Cargo.toml
@@ -6,12 +6,12 @@ authors = ["Devasia Thomas <https://www.linkedin.com/in/devasiathomas>"]
 license = "LGPL-2.1-only OR BSD-2-Clause"

 [build-dependencies]
-libbpf-cargo = { path = "../../libbpf-cargo" }
+libbpf-cargo = { path = "../../libbpf-cargo", default-features = false }
 vmlinux = { path = "../../vmlinux" }

 [dependencies]
 anyhow = "1.0.4"
-libbpf-rs = { path = "../../libbpf-rs" }
+libbpf-rs = { path = "../../libbpf-rs", default-features = false }
 libc = "0.2"
 phf = { version = "0.11", features = ["macros"] }
 plain = "0.2"

and run:

$ cargo build
$ sudo ../target/debug/capable
libbpf: bpf_object_open_opts has non-zero extra bytes
libbpf: failed to initialize skeleton BPF object 'capable_bpf': -22
Error: Invalid argument (os error 22)
$ rpm -qa libbpf
libbpf-1.2.0-3.fc40.x86_64
danielocfb commented 3 months ago

I bisected and this hasn't worked since 4d6ae7369592fc0e61d8f83db916ae6a54e46b02 and doesn't seem specific to 0.24.

danielocfb commented 3 months ago

Here is a memory dump of open_opts right before the bpf_object__open_skeleton call:

(gdb) x/90bx &open_opts
0x7fffffffd260: 0x58    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd268: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd270: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd278: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd280: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd288: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd290: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd298: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd2a0: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd2a8: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd2b0: 0x00    0x00    0x00    0x00    0x00    0x00    0x00    0x00
0x7fffffffd2b8: 0x6a    0x01

There are no non-zero bytes in the first 88 bytes. 0x6a is at byte 89 (and doesn't belong to open_opts).

danielocfb commented 3 months ago

With the libbpf-sys version bump a new member was added to bpf_object_open_opts:

$ git diff 75042c653ee956b8c262e41ca4bcfcb0e2c461a1..1.4.0+v1.4.0 src/bindings.rs
@@ -5975,6 +6113,7 @@ pub struct bpf_object_open_opts {
     pub kernel_log_size: size_t,
     pub kernel_log_level: __u32,
     pub __bindgen_padding_2: [u8; 4usize],
+    pub bpf_token_path: *const ::std::os::raw::c_char,
 }
 impl Default for bpf_object_open_opts {
     fn default() -> Self {
danielocfb commented 3 months ago

Sounds like a libbpf bug to me.

danielocfb commented 3 months ago

I checked libbpf 1.1.0, which is what I have installed. It contains:

int bpf_object__open_skeleton(struct bpf_object_skeleton *s,
                  const struct bpf_object_open_opts *opts)
{
    DECLARE_LIBBPF_OPTS(bpf_object_open_opts, skel_opts,
        .object_name = s->name,
    );
    struct bpf_object *obj;
    int err;

    /* Attempt to preserve opts->object_name, unless overriden by user
     * explicitly. Overwriting object name for skeletons is discouraged,
     * as it breaks global data maps, because they contain object name
     * prefix as their own map name prefix. When skeleton is generated,
     * bpftool is making an assumption that this name will stay the same.
     */
    if (opts) {
        memcpy(&skel_opts, opts, sizeof(*opts));
        if (!opts->object_name)
            skel_opts.object_name = s->name;
    }

This logic seems broken to me. We copy sizeof(*opts) bytes, when really we should be copying as many bytes as the object specifies its size to be, from what I understand. Of course it can't do that without heap allocating, unless it wants to risk buffer overflow.

cc @anakryiko

danielocfb commented 3 months ago

I think the best way to fix this issue may be to move the opts validity check out of bpf_object_open and into the respective callers. This way, it can be done before the memcpy.

danielocfb commented 3 months ago

I spoke with Andrii and he acknowledged the bug and is working on a fix and will cut new releases. It will take time until distributions adopt them, though. Meanwhile, we can work around the issue by passing in a NULL pointer instead of an opts object in the common case.

d-e-s-o commented 3 months ago

Here is libbpf upstream fix: https://lore.kernel.org/bpf/20240827203721.1145494-1-andrii@kernel.org/T/#u