yfractal / blog

eBPF USDT in Rust #15

Open yfractal opened 3 months ago

Introduction

Many open-source projects, such as Erlang, Node.js, and MySQL, support USDT (User Statically-Defined Tracing) because it offers several benefits:

  1. It has near-zero overhead when no program is attached to the probe, because an inactive probe is only a NOP instruction.
  2. Compared with uprobes, USDT probes are more stable, since they are an explicit interface rather than a hook on an internal function, and they can expose more information.
  3. It decouples emitting observability events from the tools that consume them.

This article introduces USDT and how to use it in Rust.

A Simple Example in C

Let’s start with a simple C example:

#include <sys/sdt.h>
#include <stdio.h>
#include <unistd.h>

int main() {
    char myStr[] = "My string";
    while (1) {
        printf("looping\n");  // newline so line-buffered stdout flushes each iteration
        DTRACE_PROBE2(example, second_probe, myStr, 123);
        sleep(1);
    }
    return 0;
}

After compiling it with gcc example.c -o example, we can use readelf to inspect the probe definition stored in the binary's ELF notes.

readelf -n example

Displaying notes found in: .note.stapsdt
  Owner                Data size    Description
  stapsdt              0x00000037   NT_STAPSDT (SystemTap probe descriptors)
    Provider: example
    Name: a_probe
    Location: 0x0000000000401165, Base: 0x0000000000402018, Semaphore: 0x0000000000000000
    Arguments: 8@%rax -4@$123
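
The Arguments field in the note above lists one size@location entry per probe argument: the size is in bytes, a negative size marks a signed value, and the location is an assembler operand (a register such as %rax or an immediate such as $123). A minimal sketch of decoding that field follows. The names UsdtArg and parse_usdt_args are made up for illustration, and the parser assumes operands contain no spaces, which holds for the x86-64 output shown here but not for every architecture.

```rust
/// One decoded entry of the "Arguments:" field in a stapsdt note.
#[derive(Debug, PartialEq)]
struct UsdtArg {
    size: u32,        // width in bytes
    signed: bool,     // a negative size in the note marks a signed value
    location: String, // assembler operand, e.g. "%rax" or "$123"
}

/// Parse a descriptor such as "8@%rax -4@$123".
/// Hypothetical helper; assumes whitespace-free operands.
fn parse_usdt_args(field: &str) -> Option<Vec<UsdtArg>> {
    field
        .split_whitespace()
        .map(|item| {
            let (size, location) = item.split_once('@')?;
            let size: i32 = size.parse().ok()?;
            Some(UsdtArg {
                size: size.unsigned_abs(),
                signed: size < 0,
                location: location.to_string(),
            })
        })
        .collect()
}

fn main() {
    let args = parse_usdt_args("8@%rax -4@$123").expect("well-formed descriptor");
    for arg in args {
        println!("{arg:?}");
    }
}
```

Read this way, the first argument is an 8-byte unsigned value held in %rax (the pointer to myStr) and the second is a signed 4-byte constant, 123.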

Then we can use bpftrace to probe this program. bpftrace is a handy tool that makes eBPF easy to use. For example, bpftrace -e 'tracepoint:raw_syscalls:sys_enter { @[comm] = count(); }' prints the syscall count per program.

Attaching 1 probe...
^C
@[bpftrace]: 82
@[cat]: 124
@[ls]: 242

For this program, we run sudo bpftrace -e 'usdt:/tmp/example:second_probe { printf("probe fired arg0=%s, arg1=%d!\n", str(arg0), arg1); }'

The result is shown below:

(Screenshot of the bpftrace output, 2024-05-22 at 08:37:40)

A Rust Example

Basic

For Rust, we can use probe-rs (published as the probe crate) to insert USDT instrumentation, which is easier than writing C code.

# Cargo.toml
[dependencies]
probe = "0.5"

// src/main.rs
use probe::probe;

fn main() {
    probe!(usdt, demo, "some-string".as_ptr(), 12345);
    println!("Hello, world!");
}

Then sudo bpftrace -e 'usdt:/tmp/usdt_demo/target/release/usdt_demo:demo { printf("probe fired arg0=%s, arg1=%d! \n", str(arg0), arg1); }' gives us:

Attaching 1 probe...
probe fired arg0=some-stringHello, world!
, arg1=12345!

We expected some-string, but bpftrace printed 'some-stringHello, world! ...'. The reason is that Rust strings are not null-terminated, so str() in the eBPF program cannot tell where the string ends and keeps reading into the adjacent memory.

To solve this, we can convert the Rust string to a C string.

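use std::ffi::CString;
use probe::probe;

fn main() {
    // Keep the CString in a variable so the pointer stays valid while the probe fires.
    let s = CString::new("some-string").unwrap();
    probe!(usdt, demo, s.as_ptr(), 12345);
    println!("Hello, world!");
}

Now bpftrace prints the expected string:

Attaching 1 probe...
probe fired arg0=some-string, arg1=12345!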
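
The difference between the two string types can be checked without any tracing tools. The sketch below uses only the standard library; c_visible_len is a made-up helper that reports where a C-style reader would stop, that is, the index of the first NUL byte.

```rust
use std::ffi::{CStr, CString};

/// Number of bytes a C-style reader would see: everything before the first NUL.
/// Returns None when the buffer contains no NUL at all.
fn c_visible_len(bytes: &[u8]) -> Option<usize> {
    bytes.iter().position(|&b| b == 0)
}

fn main() {
    let rust_str = "some-string";
    // A &str is a (pointer, length) pair; its bytes carry no terminator.
    println!("{:?}", c_visible_len(rust_str.as_bytes())); // None

    // CString copies the bytes and appends exactly one trailing NUL.
    let c_string = CString::new(rust_str).expect("no interior NUL");
    println!("{:?}", c_visible_len(c_string.as_bytes_with_nul())); // Some(11)

    // Reading back through the raw pointer, as a C-style consumer would.
    let round_trip = unsafe { CStr::from_ptr(c_string.as_ptr()) };
    println!("{}", round_trip.to_str().unwrap());
}
```

With no terminator inside the buffer, a reader that scans for NUL runs past the end of the &str, which is why the earlier output continued into "Hello, world!".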

Read Struct through Python

Sometimes, we need to expose several variables through USDT. Instead of exposing them one by one, we can use a struct. We pass the struct to the eBPF program, where we define the same struct and read the data from it.

For example, the struct is defined in Rust as:

use std::os::raw::c_char;

#[repr(C)] // use C layout so the struct matches event_t on the eBPF side
pub struct Event {
    pub trace_id: [c_char; 32],
}

In the eBPF program, we define the same struct in C:

struct event_t {
    char trace_id[32];
};

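The trace_id is a fixed-length array because a C program cannot easily read a Rust string directly. Unlike a CString, which stores its bytes behind a separate heap pointer, the array lives inside the struct, so the eBPF program can copy the whole struct with a single read.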

The construction method of Event is:

impl Event {
    pub fn new(trace_id: &str) -> Self {
        Self {
            trace_id: Self::str_to_fixed(trace_id),
        }
    }

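    pub fn as_ptr(&self) -> *const Self {
        self as *const Self
    }

    // Copy at most N bytes of `s` into a zero-filled array; longer input is truncated.
    fn str_to_fixed<const N: usize>(s: &str) -> [c_char; N] {
        // c_char is i8 on some targets and u8 on others, so avoid hard-coding i8.
        let mut array = [0 as c_char; N];
        let bytes = s.as_bytes();

        let len = bytes.len().min(N);
        for i in 0..len {
            array[i] = bytes[i] as c_char;
        }

        array
    }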
}
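
To see why the fixed-size layout is convenient for the reader, the following self-contained sketch imitates the consumer in plain Rust: it copies size_of::<Event>() bytes from the pointer and decodes them up to the first NUL. The functions fill and read_like_probe are hypothetical stand-ins for illustration and are not part of the probe crate or bcc; the struct is repeated here, with #[repr(C)] assumed, only so that the example compiles on its own.

```rust
use std::mem::size_of;
use std::os::raw::c_char;

#[repr(C)] // assumption: C layout, so it matches `struct event_t`
pub struct Event {
    pub trace_id: [c_char; 32],
}

/// Copy `s` into a zero-filled buffer, truncating to 32 bytes.
fn fill(s: &str) -> Event {
    let mut trace_id = [0 as c_char; 32];
    for (dst, src) in trace_id.iter_mut().zip(s.bytes()) {
        *dst = src as c_char;
    }
    Event { trace_id }
}

/// Stand-in for the reader: copy the struct's bytes from the pointer,
/// then decode everything before the first NUL (or the whole buffer).
fn read_like_probe(ptr: *const Event) -> String {
    let mut raw = [0u8; size_of::<Event>()];
    // SAFETY: the caller must pass a pointer to a live Event.
    unsafe {
        std::ptr::copy_nonoverlapping(ptr as *const u8, raw.as_mut_ptr(), raw.len());
    }
    let end = raw.iter().position(|&b| b == 0).unwrap_or(raw.len());
    String::from_utf8_lossy(&raw[..end]).into_owned()
}

fn main() {
    let event = fill("some-trace-id");
    println!("{} bytes", size_of::<Event>());
    println!("{}", read_like_probe(&event as *const Event));
}
```

Because the size is known at compile time on both sides, the reader needs one bounded copy and no string scanning in the traced process's memory. An input of 32 bytes or more fills the array completely and leaves no NUL terminator, so the reader must rely on the array length, as above.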

Then, in the main function:

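fn main() {
    // Keep the event in a variable so the pointer stays valid while the probe fires.
    let event = Event::new("some-trace-id");
    probe!(usdt, demo, event.as_ptr());
    println!("Hello, world!");
}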

For reading eBPF events in Python, we need bcc. It handles header files quite well and provides a user-friendly syntax, making it a good choice for writing eBPF programs.

For reading USDT, the process involves:

  1. Writing eBPF code to read data from other programs.
  2. Registering it in the kernel.
  3. Reading the buffer data and converting it into Python.

The eBPF program is as follows:

// bcc compiles this for us, so we do not need to worry about the header files
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct event_t {
    char trace_id[32];
};

BPF_PERF_OUTPUT(events);

int probe(struct pt_regs *ctx) {
    struct event_t event = {};

    // get the first argument address
    u64 event_addr = 0;
    bpf_usdt_readarg(1, ctx, &event_addr);

    // read the memory into event
    bpf_probe_read_user(&event, sizeof(event), (void *)event_addr);

    // submit to some buffer
    events.perf_submit(ctx, &event, sizeof(event));

    return 0;
}

For registering:

import sys
from bcc import BPF, USDT

bpf_program = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

struct event_t {
...
...
"""

# Path to the instrumented binary, passed on the command line
binary_path = sys.argv[1]

usdt = USDT(path=binary_path)
usdt.enable_probe(probe="demo", fn_name="probe")

# Load and attach BPF program
b = BPF(text=bpf_program, usdt_contexts=[usdt])

For polling the data:

import ctypes as ct

# Mirrors struct event_t in the eBPF program
class Data(ct.Structure):
    _fields_ = [
        ("trace_id", ct.c_char * 32),
    ]

# Callback to handle events
def print_event(cpu, data, size):
    event = ct.cast(data, ct.POINTER(Data)).contents
    print(f"trace_id: {event.trace_id}\n")

# Open perf buffer
b["events"].open_perf_buffer(print_event)

# Poll for events
while True:
    try:
        b.perf_buffer_poll()
    except KeyboardInterrupt:
        exit()

You can find all the code in this gist.

After enabling the probe and running the program, we see the following output:

[ec2-user@ip-172-31-27-62 tmp]$ sudo python3 probe.py /tmp/usdt_demo/target/release/usdt_demo
trace_id: b'some-trace-id'

Summary

This article showed how to insert USDT probes in Rust and read them through bpftrace and bcc. The examples are simple, but they can be a starting point for an eBPF journey.

There is also an example of using USDT in a small project:

https://github.com/yfractal/ccache/pull/7/files