iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

Filename from File Descriptor Tracepoint on Sys_enter_write #2538

Open · miknoj opened this issue 4 years ago

miknoj commented 4 years ago

I'm attempting to write an eBPF program that gets the pathname from a file descriptor, but I'm having difficulty getting it to work. The program hooks onto the sys_enter_write tracepoint and reads the file descriptor from the passed-in args struct. Is there any guidance for this specific scenario?

By adding checks after every bpf_probe_read, I discovered that the code appears to fail when it attempts to read fdt->fd, because fdt is NULL instead of the expected struct fdtable pointer.

Below is the eBPF code that is failing (with all the checks removed for clarity).

#define randomized_struct_fields_start  struct {
#define randomized_struct_fields_end    };
#include <uapi/linux/bpf.h>
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fdtable.h>
#include <linux/fs.h> 
#include <linux/fs_struct.h>
#include <linux/path.h>
#include <linux/sched.h>
#include <linux/slab.h>

int prog (struct tracepoint__syscalls__sys_enter_write *args)
{
    unsigned int fd;
    struct task_struct* t;
    struct files_struct* f;
    struct fdtable* fdt;
    struct file** fdd;
    struct file* file;
    struct path path;
    struct dentry* dentry;
    struct qstr pathname;
    char filename[128];

    fd = args->fd;                                   // descriptor passed to write(2)
    t = (struct task_struct*)bpf_get_current_task();
    f = t->files;                                    // bcc rewrites this dereference into a probe read

    bpf_probe_read(&fdt, sizeof(fdt), (void*)&f->fdt);                // files_struct -> fdtable
    bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd); // Failing here.
    bpf_probe_read(&file, sizeof(file), (void*)&fdd[fd]);             // fd array -> struct file
    bpf_probe_read(&path, sizeof(path), (const void*)&file->f_path);  // file -> struct path

    dentry = path.dentry;
    bpf_probe_read(&pathname, sizeof(pathname), (const void*)&dentry->d_name);          // dentry -> qstr
    bpf_probe_read_str((void*)filename, sizeof(filename), (const void*)pathname.name);  // final path component only

    bpf_trace_printk("File: %s\n", filename);

    return 0;
}
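The debugging approach described above, checking the return value of every bpf_probe_read() and stopping at the first hop that fails, can be modelled outside the kernel. The sketch below is a plain-Python simulation, not BPF code: `probe_read` and `walk` are hypothetical names, kernel memory is a dict from address to the pointer stored there, an address missing from the dict stands in for a read that faults, and the offsets in the usage note are made up rather than real `struct files_struct`/`struct fdtable` offsets.

```python
import errno

# Hypothetical model: kernel memory is a dict {address: pointer stored there}.
# Reading an address that is not in the dict stands in for a faulting read.


def probe_read(memory: dict, addr: int):
    """Mimic bpf_probe_read(): return (0, value) or (-EFAULT, None)."""
    if addr in memory:
        return 0, memory[addr]
    return -errno.EFAULT, None


def walk(memory: dict, start: int, hops):
    """Follow a pointer chain, checking the result of every read.

    hops is a list of (label, offset): each hop reads the pointer stored at
    current + offset. Returns (None, final_value) on success, or
    (label_of_failing_hop, negative_errno) at the first failed read.
    """
    current = start
    for label, offset in hops:
        ret, value = probe_read(memory, current + offset)
        if ret:
            return label, ret
        current = value
    return None, current
```

For instance, `walk({0x1020: 0x2000}, 0x1000, [("f->fdt", 0x20), ("fdt->fd", 0x8)])` returns `("fdt->fd", -errno.EFAULT)`: the first hop succeeds and the second faults, which is the pattern reported in this thread.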

Related #237

palmtenor commented 4 years ago

What's the return value of your bpf_probe_read(&fdt, sizeof(fdt), (void*)&f->fdt)?

miknoj commented 4 years ago

The return value of that specific call is 0. The next call, bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd), however, returns -14 (EFAULT). This leads me to believe that I'm accessing the fd field incorrectly. Should I be using some sort of helper function?
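For reference, bpf_probe_read() returns 0 on success and a negative errno on failure, so -14 is -EFAULT ("Bad address"). When such values are copied out of the trace output, a small userspace helper can turn them back into names. This is a sketch: `describe_probe_ret` is a hypothetical name, and the errno table used is that of the machine running the script (which matches the kernel's numbering on Linux).

```python
import errno
import os


def describe_probe_ret(ret: int) -> str:
    """Decode a bpf_probe_read() return value (0 or a negative errno)."""
    if ret == 0:
        return "ok"
    code = -ret  # helpers report failures as -errno
    name = errno.errorcode.get(code, f"errno {code}")
    return f"{name} ({os.strerror(code)})"
```

On Linux, `describe_probe_ret(-14)` returns `"EFAULT (Bad address)"`.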

ethercflow commented 4 years ago

#!/usr/bin/python

from bcc import BPF

bpf_text="""
#define randomized_struct_fields_start  struct {
#define randomized_struct_fields_end    };
#include <uapi/linux/bpf.h>
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fdtable.h>
#include <linux/fs.h>
#include <linux/fs_struct.h>
#include <linux/path.h>
#include <linux/sched.h>
#include <linux/slab.h>

TRACEPOINT_PROBE(syscalls, sys_enter_write) {
    unsigned int fd;
    struct task_struct* t;
    struct files_struct* f;
    struct fdtable* fdt;
    struct file** fdd;
    struct file* file;
    struct path path;
    struct dentry* dentry;
    struct qstr pathname;
    char filename[128];

    fd = args->fd;
    t = (struct task_struct*)bpf_get_current_task();
    f = t->files;

    bpf_probe_read(&fdt, sizeof(fdt), (void*)&f->fdt);
    int ret = bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd); 
    if (ret) {
        bpf_trace_printk("bpf_probe_read failed: %d\\n", ret);
        return 0;
    }
    bpf_probe_read(&file, sizeof(file), (void*)&fdd[fd]);
    bpf_probe_read(&path, sizeof(path), (const void*)&file->f_path);

    dentry = path.dentry;
    bpf_probe_read(&pathname, sizeof(pathname), (const void*)&dentry->d_name);
    bpf_probe_read_str((void*)filename, sizeof(filename), (const void*)pathname.name);

    bpf_trace_printk("File: %s\\n", filename);

    return 0;
}
"""

b = BPF(text=bpf_text)
b.trace_print()  # stream bpf_trace_printk() output until interrupted

@miknoj I can't reproduce this issue with the above program. My environment:
OS/kernel: CentOS Linux release 7.7.1908 (Core) / 3.10.0-1062.1.1.el7.x86_64
bcc version: https://github.com/iovisor/bcc/commit/0fa419a64e71984d42f107c210d3d3f0cc82d59a

What am I missing?

miknoj commented 4 years ago

@ethercflow I don't think you missed anything. I realize now, however, that I'm running a much older version of bcc (v0.7.0). I'll upgrade, give it another shot, and report back.

josalem commented 4 years ago

Hmm, I just tried this in the following environment: Linux Ubuntu18 5.0.0-1022-azure #23~18.04.1-Ubuntu. None of the versions of the code above work with bcc v0.10.0; they always fail with -14 (EFAULT). I noticed that this code is what's being used in #2544, so it must be working for other people on other machines. Does this traversal only work on specific kernels, or with specific kernel flags enabled? Below are the BPF-related flags enabled in my environment:

CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_BPFILTER=y
CONFIG_BPFILTER_UMH=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
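When comparing a working and a failing machine like the ones in this thread, it can help to diff their BPF-related options. A minimal sketch, assuming the kernel config is available as text (on Ubuntu it is commonly at /boot/config-$(uname -r); kernels built with CONFIG_IKCONFIG_PROC also expose a gzipped copy at /proc/config.gz). `bpf_options` and `diff_options` are hypothetical helper names.

```python
import re

# A set option looks like "CONFIG_BPF_JIT=y"; unset options appear as
# "# CONFIG_FOO is not set" comments and are deliberately not matched.
_BPF_OPTION = re.compile(r"^(CONFIG_\w*BPF\w*)=(.*)$")


def bpf_options(config_text: str) -> dict:
    """Return {option: value} for every set CONFIG_*BPF* option."""
    options = {}
    for line in config_text.splitlines():
        match = _BPF_OPTION.match(line.strip())
        if match:
            options[match.group(1)] = match.group(2)
    return options


def diff_options(a: dict, b: dict) -> dict:
    """Map each option whose value differs between a and b to (a_value, b_value)."""
    return {
        key: (a.get(key), b.get(key))
        for key in sorted(set(a) | set(b))
        if a.get(key) != b.get(key)
    }
```

Because unset options are skipped, an option enabled on only one machine shows up in the diff with `None` on the other side.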

ethercflow commented 4 years ago

Thank you, @josalem. I tried this on Linux Ubuntu18 5.3.1-050301-generic with the latest bcc and found that it always fails with -14 (EFAULT). I'll try to find the reason.

yonghong-song commented 4 years ago

I tried this example on a local server as well, and I also see most of the failures in this call:

    int ret = bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd); 

I printed the addresses of &f->fdt and &fdt->fd. For example, &f->fdt is ffff889c60705ae0, and &fdt->fd is ffffc90036047da0.

Based on the x86-64 memory map in https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt, we have:

     Start addr     |   Offset    |     End addr     |  Size   | VM area description
 ffff888000000000 | -119.5  TB | ffffc87fffffffff |   64 TB | direct mapping of all physical memory (page_offset_base)
 ffffc90000000000 |  -55    TB | ffffe8ffffffffff |   32 TB | vmalloc/ioremap space (vmalloc_base)

The former is the direct mapping of all physical memory, so accessing it won't page-fault. The latter is in the vmalloc area, which may not be physically contiguous and may page-fault.
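The range check behind this reasoning is simple enough to sketch. The helper below hard-codes the two mm.txt rows quoted above; it assumes the default 4-level-paging layout and does not account for 5-level paging or for memory-layout randomization shifting page_offset_base/vmalloc_base. `classify` and `KERNEL_RANGES` are hypothetical names.

```python
# The two rows quoted above from Documentation/x86/x86_64/mm.txt
# (default bases, 4-level paging). Bounds are inclusive.
KERNEL_RANGES = [
    (0xffff888000000000, 0xffffc87fffffffff, "direct mapping of all physical memory"),
    (0xffffc90000000000, 0xffffe8ffffffffff, "vmalloc/ioremap space"),
]


def classify(addr: int) -> str:
    """Name the region from KERNEL_RANGES that contains addr, or 'other'."""
    for start, end, name in KERNEL_RANGES:
        if start <= addr <= end:
            return name
    return "other"


if __name__ == "__main__":
    # Addresses printed earlier in this thread for &f->fdt and &fdt->fd.
    for label, addr in (("&f->fdt", 0xffff889c60705ae0), ("&fdt->fd", 0xffffc90036047da0)):
        print(f"{label} {addr:#x}: {classify(addr)}")
```

Applied to the two printed addresses, &f->fdt lands in the direct mapping and &fdt->fd in vmalloc/ioremap space, which is the distinction drawn above.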

So bpf_probe_read() is doing the right thing here. Unfortunately, this seems to prevent us from getting the filename inside the BPF program.

To resolve this (especially for accesses to vmalloc areas), a BPF helper might be a more reliable way to do the work, unless BPF programs are someday allowed to take page faults.
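Until such a helper exists, one workaround not discussed in this thread is to emit only the pid and fd from the BPF program and resolve the path in user space through the /proc/<pid>/fd/<fd> symlink. This is a swapped-in technique, not the fd2path helper: it is Linux-specific and racy, because the process may have closed the descriptor or exited by the time user space handles the event. A sketch with a hypothetical `fd_to_path` helper:

```python
import os


def fd_to_path(pid: int, fd: int):
    """Resolve descriptor `fd` of process `pid` through /proc/<pid>/fd/<fd>.

    Returns None when the process or descriptor no longer exists or access
    is denied, which is common because this lookup happens after the event.
    """
    try:
        return os.readlink(f"/proc/{pid}/fd/{fd}")
    except OSError:
        return None
```

Unlike the dentry's d_name, which holds only the final path component, the /proc symlink yields the full path as seen by that process.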

miknoj commented 4 years ago

Hey @yonghong-song, I'd love to put some work into resolving this and would be glad to work on a BPF helper to do so. However, I'm not sure how to start. Would you be willing to advise?

yonghong-song commented 4 years ago

@miknoj, @ethercflow has submitted a patch that implements an fd2path helper. Feel free to take a look and comment on it here: https://lore.kernel.org/netdev/c6bf920a-845e-b7f5-ec47-a1e97b806427@fb.com/T/#t

miknoj commented 4 years ago

@miknoj @ethercflow This is fantastic. I will attempt to apply this patch and give it a shot.