iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0

FD to pathname #237

Open brendangregg opened 9 years ago

brendangregg commented 9 years ago

Would like a macro or function for mapping a file descriptor to a pathname. I was trying something like (this may be wrong):

#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fdtable.h>

int kprobe__vfs_fstat(struct pt_regs *ctx, unsigned int fd)
{
        struct file *file = (struct file *)&current->files->fdt[fd];
        bpf_trace_printk("fstat file %d %s\\n", fd, file->f_path.dentry->d_iname);
        return 0;
}

and got:

error: couldn't allocate output register for constraint 'r' at line 2148418561

4ast commented 9 years ago

Accessing 'current' like this is the problem. One really ugly hack would be to cat /proc/kallsyms | grep current_task and bpf_probe_read that binary address to get 'current', but it's too ugly... Maybe it's easier to remember the fd->name association in a bpf program attached to sys_open, so that fstat can read it from the map? I guess we need a helper for current anyway.
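
The bookkeeping behind the map-based idea can be modelled in plain Python before writing any BPF. This is only a sketch, not bcc code: the handler names (on_open_return, on_close, on_fstat) and the (pid, fd) key are assumptions made for illustration, and the dictionary stands in for a BPF hash map that a probe on the open return path would fill.

```python
# Plain-Python model of the "remember fd -> name at open time" idea.
# fd_names stands in for a BPF hash map; all names here are hypothetical.
fd_names = {}


def on_open_return(pid, fd, filename):
    """Record the name once open() has returned a valid descriptor."""
    if fd >= 0:  # negative return values are errors, nothing to remember
        fd_names[(pid, fd)] = filename


def on_close(pid, fd):
    """Forget the entry so a reused descriptor number is not misreported."""
    fd_names.pop((pid, fd), None)


def on_fstat(pid, fd):
    """Look up the remembered name; None if the open was never observed."""
    return fd_names.get((pid, fd))


on_open_return(15415, 3, "/usr/lib/python3/dist-packages")
print(on_fstat(15415, 3))   # /usr/lib/python3/dist-packages
on_close(15415, 3)
print(on_fstat(15415, 3))   # None
```

The key includes the pid because descriptor numbers are per-process. The model deliberately ignores descriptors that were opened before tracing started, inherited across fork, or created by dup, so a lookup can legitimately come back empty.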

brendangregg commented 9 years ago

David Smith has put some work into this problem for SystemTap: https://sourceware.org/bugzilla/show_bug.cgi?id=17920

4ast commented 9 years ago

It's pretty much doing current->files->fdt[fd]->f_path.dentry->d_iname with a few useless spin_locks and refcnts. A few helper functions for pieces of this sequence could have been used to avoid exposing current() as a single helper, but current may be necessary for other cases, so yes, bpf_get_current() is pretty high on my todo list. Just need to get the big-ticket items resolved first.

4ast commented 8 years ago

Though bpf_get_current_task() is now available, fd to pathname may be too complicated to do via probe_reads? Do we still need a separate helper for it?

brendangregg commented 8 years ago

Yes, I tried using bpf_get_current_task() but it gets pretty horrible. I got as far as this and it still wasn't working (no file output; not populated on entry to stat()?):

#!/usr/bin/env python

from bcc import BPF

# define BPF program
prog = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fdtable.h>

int kprobe__vfs_fstat(struct pt_regs *ctx, unsigned int fd)
{
    struct files_struct *files = NULL;
    struct fdtable *fdt = NULL;
    struct file *f = NULL;
    struct dentry *de = NULL;
    struct qstr dn = {};
    struct task_struct *curr = (struct task_struct *)bpf_get_current_task();
    bpf_probe_read(&files, sizeof(files), &curr->files);
    bpf_probe_read(&fdt, sizeof(fdt), &files->fdt);
    bpf_probe_read(&f, sizeof(f), &fdt[fd]);
    bpf_probe_read(&de, sizeof(de), &f->f_path.dentry);
    bpf_probe_read(&dn, sizeof(dn), &de->d_name);
    bpf_trace_printk("fstat fd=%d file=%s\\n", fd, dn.name);
    return 0;
}
"""

# load BPF program
b = BPF(text=prog)

# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))

# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))

If we can make it work, then at least we can see what the current state is...

4ast commented 8 years ago

managed to make it work:

bpf_probe_read(&f, sizeof(f), &fdt[fd]);

should be

struct file **_fd = NULL;
...
bpf_probe_read(&_fd, sizeof(_fd), &fdt->fd);
bpf_probe_read(&f, sizeof(f), &_fd[fd]);
...
bpf_trace_printk("fstat name1=%s\\n", de->d_iname);
bpf_trace_printk("fstat name2=%s\\n", dn.name);

Tried a few tests... looks like both the short and the full name are populated, so dn.name is probably always good enough.

brendangregg commented 8 years ago

Awesome, thanks! So I'm starting to believe that d_iname isn't reliable:

7538432.717158000  lsb_release      15415  fstat fd=3     dn.name=dist-packages
7538432.717160000  lsb_release      15415  fstat fd=3 de->d_iname=dist-packages
7538432.717340000  lsb_release      15415  fstat fd=3     dn.name=apport_python_hook.cpython-35.pyc
7538432.717342000  lsb_release      15415  fstat fd=3 de->d_iname=p"??????t
7538432.717354000  lsb_release      15415  fstat fd=3     dn.name=apport_python_hook.cpython-35.pyc
7538432.717356000  lsb_release      15415  fstat fd=3 de->d_iname=p"??????t

So I'm going to have to fix some of the existing *slower tools to use dn.name instead.

Maybe I'll write a tool that uses this code (statsnoop?), and put the bpf_probe_read()s in an fd2path() static function. I suppose we could eventually move it to somewhere like src/cc/export/helpers.h, and provide this functionality in bcc, at least to start with.

brendangregg commented 5 years ago

We've discussed this a number of times. Using the d_iname or d_name only shows the filename. We want a helper to show the full absolute path. e.g., for a FD to pathname helper, we want "/usr/local/bin/bash" and not "bash".

This can be done in at least one of two ways:

A) Adding a BPF kernel helper for this function. We've suggested/discussed this at Plumbers, etc.
B) Using the new bounded loops in 5.2, writing a BCC helper that uses loops to construct the path.

ethercflow commented 5 years ago

I have implemented this by A) and committed to http://patchwork.ozlabs.org/patch/1179287/ PTAL

yonghong-song commented 5 years ago

Thanks @ethercflow. Let us continue the discussion on the mailing list.

thedracle commented 5 years ago

I managed to do this by following the dentries structure with the following function:

static int read_dentry_strings(
    struct dentry *dtryp, char buf[DEFAULT_SUB_BUF_LEN][DEFAULT_SUB_BUF_SIZE]) {
    struct dentry dtry;
    struct dentry *lastdtryp = dtryp;
    int nread = 0;
    int i = 0;
    if (buf) {
        /* Copy the leaf dentry and store its name as component 0. */
        bpf_probe_read(&dtry, sizeof(struct dentry), dtryp);
        bpf_probe_read_str(buf[i], DEFAULT_SUB_BUF_SIZE, dtry.d_name.name);
        nread++;
        /* Follow d_parent, at most DEFAULT_SUB_BUF_LEN - 1 more times. */
        for (i = 1; i < DEFAULT_SUB_BUF_LEN; i++) {
            /* The root of a mount is its own parent, which ends the walk. */
            if (dtry.d_parent == lastdtryp)
                break;
            lastdtryp = dtry.d_parent;
            bpf_probe_read(&dtry, sizeof(struct dentry), dtry.d_parent);
            bpf_probe_read_str(buf[i], DEFAULT_SUB_BUF_SIZE, dtry.d_name.name);
            nread++;
        }
    }
    /* Number of components stored, leaf first. */
    return nread;
}

This will get the full path and place an entry per dentry into buf, following up to root on the respective mount.

You can use the same thing to get the mount path via dentries->filp.f_path.mnt, but the mount is hidden inside a wrapping structure around the vfsmount, so you have to use 'container_of_in' to get the wrapping structure; then you can use read_dentry_strings on rmount.mnt_mountpoint to reconstruct the full path.

I just pass the data up into userspace separated into this array structure, and then reconstruct it there.
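
To make the termination rule and the component order easier to see, here is a small plain-Python model of the same walk. It is a sketch only: the Dentry class is a hypothetical stand-in for the kernel's struct dentry, and max_components plays the role of DEFAULT_SUB_BUF_LEN.

```python
class Dentry:
    """Hypothetical stand-in for struct dentry: a name and a parent link."""

    def __init__(self, name, parent=None):
        self.name = name
        # As in the kernel, the root of a mount is its own parent.
        self.parent = parent if parent is not None else self


def read_dentry_strings(dentry, max_components=16):
    """Collect names leaf-first, stopping at the root or at the bound."""
    names = [dentry.name]
    last = dentry
    for _ in range(1, max_components):
        if dentry.parent is last:  # reached a dentry that is its own parent
            break
        last = dentry.parent
        dentry = dentry.parent
        names.append(dentry.name)
    return names


root = Dentry("/")
usr = Dentry("usr", root)
local = Dentry("local", usr)
bin_dir = Dentry("bin", local)
bash = Dentry("bash", bin_dir)

print(read_dentry_strings(bash))     # ['bash', 'bin', 'local', 'usr', '/']
print(read_dentry_strings(bash, 3))  # ['bash', 'bin', 'local']
```

The second call shows what the fixed bound costs: a path deeper than the bound is silently cut off at the top, so userspace cannot tell a truncated walk from a complete one unless it checks whether the last component is "/".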

yzgyyang commented 3 years ago

I have implemented this by A) and committed to http://patchwork.ozlabs.org/patch/1179287/ PTAL

@ethercflow @yonghong-song From https://lore.kernel.org/netdev/c27d3cc2-f846-8aa9-10fd-c2940e7605d1@iogearbox.net/#t, I'm curious if this is still stuck on review/waiting for anyone? Would love to see this merged. :)

yonghong-song commented 3 years ago

On Mon, Mar 8, 2021 at 11:11 AM Guangyuan Yang wrote:

I have implemented this by A) and committed to http://patchwork.ozlabs.org/patch/1179287/ PTAL

@ethercflow @yonghong-song From https://lore.kernel.org/netdev/c27d3cc2-f846-8aa9-10fd-c2940e7605d1@iogearbox.net/#t, I'm curious if this is still stuck on review/waiting for anyone? Would love to see this merged. :)

A similar helper has been merged:

long bpf_d_path(struct path *path, char *buf, u32 sz)

It takes a "path" (powered by BTF) instead of an fd, so it won't be available to kprobes, etc., but it is available to kfunc. In many cases you actually have a "struct file *", from which you can get the "path" and feed it into the helper.

Does this work for you?


yzgyyang commented 3 years ago

It takes a "path" (powered by btf) instead of fd, so it won't be available to kprobe, etc. but it is available to kfunc. The use case is in many cases, you actually have "struct file *" from which you can get "path" and feed it into the helper.

@yonghong-song Thanks for the reply! This is not particularly useful for our use case, since we want to use kprobes. I did, though, take @thedracle's idea above and develop the full-path functionality in a reverse-dentry-lookup way for now. Will open a PR shortly for this.

yonghong-song commented 3 years ago

Sounds good. Thanks!

Sherlock-Holo commented 2 years ago

On Mon, Mar 8, 2021 at 11:11 AM Guangyuan Yang wrote:

I have implemented this by A) and committed to http://patchwork.ozlabs.org/patch/1179287/ PTAL

@ethercflow @yonghong-song From https://lore.kernel.org/netdev/c27d3cc2-f846-8aa9-10fd-c2940e7605d1@iogearbox.net/#t, I'm curious if this is still stuck on review/waiting for anyone? Would love to see this merged. :)

A similar helper has been merged:

long bpf_d_path(struct path *path, char *buf, u32 sz)

Description: Return full path for given struct path object, which needs to be the kernel BTF path object. The path is returned in the provided buffer buf of size sz and is zero terminated.

Return: On success, the strictly positive length of the string, including the trailing NUL character. On error, a negative value.

It takes a "path" (powered by btf) instead of fd, so it won't be available to kprobe, etc. but it is available to kfunc. The use case is in many cases, you actually have "struct file *" from which you can get "path" and feed it into the helper.

Does this work for you?

Is there another way to get the pathname in a kprobe? We use kprobes and want to get the pathname too.

vimalk78 commented 1 year ago

Don't we need to use files_fdtable(current->files) from include/linux/fdtable.h to get the fdt? This macro uses rcu_read_lock/rcu_read_unlock for memory barrier requirements.

isubuz commented 1 year ago

@thedracle could you pls post the complete snippet of how you use the 2d array in the userspace code and what data structure you use to pass the info. I tried with a BPF_HASH struct containing the 2d array, but keep getting a seg fault error.

thedracle commented 1 year ago

It's been a long time since I've looked at this, but it was something like:

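        #define DEFAULT_SUB_BUF_SIZE 255  // Max filename length in Linux.
        #define DEFAULT_SUB_BUF_LEN 16    // Max number of path components.

        stringstream path;

        // Components are stored leaf-first, so walk the array backwards.
        for (int i = bpf_event->nread - 1; i >= 0; i--) {
          // The root dentry is named "/"; emit it without a trailing separator.
          if (strncmp(bpf_event->buffer[i], "/", DEFAULT_SUB_BUF_SIZE) == 0) {
            path << "/";
          } else {
            path << bpf_event->buffer[i];
            if (i != 0) {
              path << "/";
            }
          }
        }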

The one thing that was missing from above is the code to look up and append the mount point path too, which is very similar.

I.e., a file that isn't from "/" but from a different mount needs to have the path for the mount point constructed and prepended to the file's path.
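
A minimal Python sketch of that userspace step, assuming both walks deliver their components leaf-first and that the root dentry of each walk is reported as "/", as in the loop above. The function name build_path and its parameters are hypothetical, and only a single level of mounting is handled.

```python
def build_path(file_components, mountpoint_components=()):
    """Join leaf-first components into an absolute path.

    file_components: names from the file's dentry up to its mount root.
    mountpoint_components: names from the mount point's dentry up to "/"
    (empty when the file lives on the root filesystem).
    """
    # Reverse each walk so it reads root-first, mount point before file.
    ordered = list(reversed(mountpoint_components)) + list(reversed(file_components))
    # Every "/" entry is a root marker, not a real directory name.
    parts = [name for name in ordered if name != "/"]
    return "/" + "/".join(parts)


print(build_path(["bash", "bin", "local", "usr", "/"]))
# /usr/local/bin/bash
print(build_path(["notes.txt", "docs", "/"], ["data", "mnt", "/"]))
# /mnt/data/docs/notes.txt
```

For a mount nested inside another mount, the mount point walk would have to be repeated for each parent mount; that is left out here.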

I'm surprised this hasn't been generally solved after all of this time. I've gotten sucked away into other areas of programming, but if this is actually generally useful, I could put together a code snippet or demo that performs path reconstruction for those interested.

isubuz commented 1 year ago

@thedracle Many thanks for the quick response. Will try to adapt my code based on this.

This is still not generally solved; I have been looking for over a week for a solution that works. As mentioned earlier in the thread, bpf_d_path exists but is not usable for kprobes. So a full code snippet would definitely be very useful for folks who stumble across the same problem!

aktau commented 7 months ago

If one has access to a file struct, it can (nowadays) be done with a while (or unroll) loop:

// vfsreadsnoop.bt
//
// Prints paths in reverse order because of (current) bpftrace limitations. Can
// be postprocessed to yield normal-looking paths. E.g.:
//
//     awk '{printf("%s %s %s %s ", $1, $2, $3, $4) ; for (i = NF ; i > 4 ; i--) { printf("/%s", $i) } ; print("") }'

#include <linux/fs.h>

// Use kretfunc instead of a kprobe/kretprobe combo 
// because it allows us to read args and retval at the same time.
kretfunc:vfs_read /strcontains(comm, str($1))/ {
    if (retval >= 0) {
        printf("%-16s %10d want=%-6d read=%-6d", comm, pid, args->count, retval);
        $dentry = args->file->f_path.dentry;
        // Print at most 16 path elements; without a bound, the eBPF verifier
        // refuses to load this program. An alternative is unroll(16), but
        // that's harder to manage.
        //
        // The sentinel value for "no more parents" is not a NULL pointer, but
        // d->d_parent == d.
        $i = 0;
        while ($dentry->d_parent != $dentry && $i < 16) {
          printf(" %s", str($dentry->d_name.name));
          $dentry = $dentry->d_parent;
          $i++;
        }
        print("");
    }
}

Output:

$ bpftrace ./vfsreadsnoop.bt cat
Attaching 1 probe...
cat                  300869 want=832    read=832    libc.so.6 x86_64-linux-gnu lib usr
cat                  300869 want=784    read=784    libc.so.6 x86_64-linux-gnu lib usr
cat                  300869 want=784    read=784    libc.so.6 x86_64-linux-gnu lib usr
cat                  300870 want=832    read=832    libc.so.6 x86_64-linux-gnu lib usr
cat                  300870 want=784    read=784    libc.so.6 x86_64-linux-gnu lib usr
cat                  300870 want=784    read=784    libc.so.6 x86_64-linux-gnu lib usr
cat                  300870 want=131072 read=1191   vfsreadsnoop.bt bpftrace dotfiles aktau home
cat                  300870 want=131072 read=0      vfsreadsnoop.bt bpftrace dotfiles aktau home
cat                  300910 want=832    read=832    libc.so.6 x86_64-linux-gnu lib usr
cat                  300910 want=784    read=784    libc.so.6 x86_64-linux-gnu lib usr
cat                  300910 want=784    read=784    libc.so.6 x86_64-linux-gnu lib usr
cat                  300910 want=131072 read=1649   envsnoop.bt bpftrace dotfiles aktau home
cat                  300910 want=131072 read=0      envsnoop.bt bpftrace dotfiles aktau home
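
The awk one-liner from the script header can also be written in Python. This is a sketch of the same post-processing, assuming the output format shown above: four leading fields followed by the path elements in leaf-first order. The function name normalize_line is made up for this example.

```python
def normalize_line(line):
    """Turn 'comm pid want= read= leaf ... top' into 'comm pid want= read= /top/.../leaf'."""
    fields = line.split()
    head, components = fields[:4], fields[4:]
    # Path elements were printed leaf-first, so reverse them before joining.
    path = "".join("/" + name for name in reversed(components))
    return " ".join(head) + " " + path


sample = "cat                  300869 want=832    read=832    libc.so.6 x86_64-linux-gnu lib usr"
print(normalize_line(sample))
# cat 300869 want=832 read=832 /usr/lib/x86_64-linux-gnu/libc.so.6
```

Like the awk version, this splits on whitespace, so a file name or comm containing spaces would be split incorrectly. The header line "Attaching 1 probe..." should be skipped before feeding lines to it.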