Open miknoj opened 4 years ago
What's the return value of your bpf_probe_read(&fdt, sizeof(fdt), (void*)&f->fdt)?
The return value of that specific probe is 0.
The following probe, bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd), is, however, returning a -14 (EFAULT) status.
This is leading me to believe that I'm accessing the fd field incorrectly. Should I be using some sort of helper function?
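As a small aid for reading return values like the -14 above, here is a minimal sketch that maps a negative helper return value to its errno name on the userspace side. The function name describe_bpf_error is hypothetical and is not part of bcc.

```python
import errno
import os

def describe_bpf_error(ret):
    """Map a helper return value to (errno name, message).

    Hypothetical convenience function, not a bcc API. Assumes the
    value follows the usual kernel convention of negative errno.
    """
    if ret >= 0:
        return ("OK", "success")
    code = -ret
    name = errno.errorcode.get(code, "UNKNOWN")
    return (name, os.strerror(code))

# Example: decode the status reported in this thread.
print(describe_bpf_error(-14))
```

This only decodes the number; it says nothing about why the read faulted.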
#!/usr/bin/python
from bcc import BPF
bpf_text="""
#define randomized_struct_fields_start struct {
#define randomized_struct_fields_end };
#include <uapi/linux/bpf.h>
#include <linux/dcache.h>
#include <linux/err.h>
#include <linux/fdtable.h>
#include <linux/fs.h>
#include <linux/fs_struct.h>
#include <linux/path.h>
#include <linux/sched.h>
#include <linux/slab.h>
TRACEPOINT_PROBE(syscalls, sys_enter_write) {
    unsigned int fd;
    struct task_struct* t;
    struct files_struct* f;
    struct fdtable* fdt;
    struct file** fdd;
    struct file* file;
    struct path path;
    struct dentry* dentry;
    struct qstr pathname;
    char filename[128];

    fd = args->fd;
    t = (struct task_struct*)bpf_get_current_task();
    f = t->files;
    bpf_probe_read(&fdt, sizeof(fdt), (void*)&f->fdt);
    int ret = bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd);
    if (ret) {
        bpf_trace_printk("bpf_probe_read failed: %d\\n", ret);
        return 0;
    }
    bpf_probe_read(&file, sizeof(file), (void*)&fdd[fd]);
    bpf_probe_read(&path, sizeof(path), (const void*)&file->f_path);
    dentry = path.dentry;
    bpf_probe_read(&pathname, sizeof(pathname), (const void*)&dentry->d_name);
    bpf_probe_read_str((void*)filename, sizeof(filename), (const void*)pathname.name);
    bpf_trace_printk("File: %s\\n", filename);
    return 0;
}
"""
b = BPF(text=bpf_text).trace_print()
@miknoj I can't reproduce this issue with the above program. My env is: OS/kernel: CentOS Linux release 7.7.1908 (Core) / 3.10.0-1062.1.1.el7.x86_64; bcc version: https://github.com/iovisor/bcc/commit/0fa419a64e71984d42f107c210d3d3f0cc82d59a
What am I missing?
@ethercflow I don't think you missed anything. I now realize, however, that I am running a much older version of bcc, v0.7.0. I'll upgrade, give it another shot, and report back.
Hmm, I just tried this using the following env: Linux Ubuntu18 5.0.0-1022-azure #23~18.04.1-Ubuntu. None of the versions of the above code work with v0.10.0 of bcc; they always fail with a -14 (EFAULT). I noticed that this code is what's being used in #2544, so it must be working for other people on other machines. Does this traversal only work on specific kernels or with specific kernel flags turned on? Below are the BPF-related flags turned on for my env:
CONFIG_CGROUP_BPF=y
CONFIG_BPF=y
CONFIG_BPF_SYSCALL=y
CONFIG_BPF_JIT_ALWAYS_ON=y
CONFIG_IPV6_SEG6_BPF=y
CONFIG_NETFILTER_XT_MATCH_BPF=m
CONFIG_BPFILTER=y
CONFIG_BPFILTER_UMH=m
CONFIG_NET_CLS_BPF=m
CONFIG_NET_ACT_BPF=m
CONFIG_BPF_JIT=y
CONFIG_BPF_STREAM_PARSER=y
CONFIG_LWTUNNEL_BPF=y
CONFIG_HAVE_EBPF_JIT=y
CONFIG_BPF_EVENTS=y
CONFIG_BPF_KPROBE_OVERRIDE=y
Thank you @josalem. I tried this on Linux Ubuntu18 5.3.1-050301-generic with the latest bcc and found that it always fails with a -14 (EFAULT). I'll try to find the reason.
I tried this example on a local server as well, and I also see most of the failures at this call:
int ret = bpf_probe_read(&fdd, sizeof(fdd), (void*)&fdt->fd);
I printed the addresses of &f->fdt and &fdt->fd. For example, &f->fdt is ffff889c60705ae0 and &fdt->fd is ffffc90036047da0.
Based on the x64 address mapping (https://www.kernel.org/doc/Documentation/x86/x86_64/mm.txt), we have:
ffff888000000000 | -119.5 TB | ffffc87fffffffff | 64 TB | direct mapping of all physical memory (page_offset_base)
ffffc90000000000 | -55 TB | ffffe8ffffffffff | 32 TB | vmalloc/ioremap space (vmalloc_base)
The former is a direct mapping of all physical memory, so there won't be any page fault. The latter is the vmalloc area, which may not be physically contiguous and may page fault.
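To check which of the two regions an address falls into, a minimal sketch along these lines may help. It hard-codes only the two ranges quoted above; the names REGIONS and classify are hypothetical, and on kernels that randomize these bases the boundaries can differ.

```python
# Ranges copied from the two mm.txt rows quoted above.
# Assumption: the kernel under test uses this exact layout.
REGIONS = [
    (0xffff888000000000, 0xffffc87fffffffff, "direct mapping"),
    (0xffffc90000000000, 0xffffe8ffffffffff, "vmalloc/ioremap"),
]

def classify(addr):
    """Return the name of the quoted region containing addr, else 'other'."""
    for start, end, name in REGIONS:
        if start <= addr <= end:
            return name
    return "other"

# The two addresses printed in this comment.
print(classify(0xffff889c60705ae0))
print(classify(0xffffc90036047da0))
```

With the addresses from this comment, &f->fdt lands in the direct mapping and &fdt->fd lands in the vmalloc range, which matches the observation that only the second read fails.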
So bpf_probe_read() is doing the right thing here. Unfortunately, this seems to prevent us from getting the filename inside the bpf program.
To resolve this (especially with regard to accessing vmalloc areas), a bpf helper might be a more reliable way to do the work, unless someday bpf programs themselves are allowed to take faults.
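Until such a helper exists, one possible workaround is to have the bpf program report only the pid and fd, and to resolve the path in userspace through /proc. The sketch below shows just that userspace step; fd_to_path is a hypothetical name, not a bcc or kernel API, and the lookup is racy because the process may close or reuse the descriptor before it is resolved.

```python
import os

def fd_to_path(pid, fd):
    """Resolve (pid, fd) to a path via the /proc/<pid>/fd/<fd> symlink.

    Hypothetical helper for illustration. Returns None if the link
    cannot be read (process exited, fd closed, or no permission).
    """
    try:
        return os.readlink("/proc/%d/fd/%d" % (pid, fd))
    except OSError:
        return None

# Example: resolve a descriptor owned by the current process.
example_fd = os.open("/", os.O_RDONLY)
print(fd_to_path(os.getpid(), example_fd))
os.close(example_fd)
```

For descriptors that are not regular files, the link target is a label such as a socket or pipe identifier rather than a filesystem path.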
Hey @yonghong-song, I'd love to put some work into resolving this and would be more than glad to work on a bpf helper to do so. However, I'm not sure how to start. Would you be willing to advise?
@miknoj @ethercflow has submitted a patch to implement a fd2path helper. The patch is posted here: https://lore.kernel.org/netdev/c6bf920a-845e-b7f5-ec47-a1e97b806427@fb.com/T/#t
Feel free to take a look and comment.
@miknoj @ethercflow This is fantastic. I will attempt to apply this patch and give it a shot.
I'm attempting to write an eBPF program to get the pathname from a file descriptor, but I am finding it difficult to get it to work. The program hooks onto the sys_enter_write tracepoint and gets a file descriptor from the passed in args struct. Is there any guidance for this specific scenario?
Using checks after every bpf_probe_read, I was able to discover that the code appears to fail when it attempts to read the value fdt->fd, because fdt is null instead of the expected struct fdtable pointer.
Below is the eBPF code that is failing (with all the checks removed for clarity).
Related #237