Open brendangregg opened 9 years ago
accessing 'current' like this is the problem. One really ugly hack would be to cat /proc/kallsyms |grep current_task and bpf_probe_read that binary address to get 'current', but it's too ugly... may be easier to remember fd->name association in bpf program attached to sys_open, then fstat can read it from the map ? I guess we need a helper for current anyway.
David Smith has put some work into this problem for SystemTap: https://sourceware.org/bugzilla/show_bug.cgi?id=17920
it's pretty much doing current->files->fdt[fd]->f_path.dentry->d_iname with few useless spin_locks and refcnts. Few helper functions for pieces of this sequence could have been used to avoid exposing current() as a single helper, but current may be necessary for other cases, so yes, bpf_get_current() is pretty high on my todo list. Just need to get big ticket items resolved first.
though bpf_get_current_task() is now available, fd to pathname may be too complicated to do via probe_reads? Do we still need a separate helper for it ?
Yes, I tried using bpf_get_current_task() but it gets pretty horrible. I got as far as this and it still wasn't working (no file output; not populated on entry to stat()?):
#!/usr/bin/env python
from bcc import BPF
# define BPF program
prog = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>
#include <linux/fdtable.h>
int kprobe__vfs_fstat(struct pt_regs *ctx, unsigned int fd)
{
    struct files_struct *files = NULL;
    struct fdtable *fdt = NULL;
    struct file *f = NULL;
    struct dentry *de = NULL;
    struct qstr dn = {};
    struct task_struct *curr = (struct task_struct *)bpf_get_current_task();
    bpf_probe_read(&files, sizeof(files), &curr->files);
    bpf_probe_read(&fdt, sizeof(fdt), &files->fdt);
    bpf_probe_read(&f, sizeof(f), &fdt[fd]);
    bpf_probe_read(&de, sizeof(de), &f->f_path.dentry);
    bpf_probe_read(&dn, sizeof(dn), &de->d_name);
    bpf_trace_printk("fstat fd=%d file=%s\\n", fd, dn.name);
    return 0;
}
"""
# load BPF program
b = BPF(text=prog)
# header
print("%-18s %-16s %-6s %s" % ("TIME(s)", "COMM", "PID", "MESSAGE"))
# format output
while 1:
    try:
        (task, pid, cpu, flags, ts, msg) = b.trace_fields()
    except ValueError:
        continue
    print("%-18.9f %-16s %-6d %s" % (ts, task, pid, msg))
If we can make it work, then at least we can see what the current state is...
managed to make it work:
bpf_probe_read(&f, sizeof(f), &fdt[fd]);
should be
struct file **_fd = NULL;
...
bpf_probe_read(&_fd, sizeof(_fd), &fdt->fd);
bpf_probe_read(&f, sizeof(f), &_fd[fd]);
...
bpf_trace_printk("fstat name1=%s\\n", de->d_iname);
bpf_trace_printk("fstat name2=%s\\n", dn.name);
tried few tests... looks like both short and full name are populated, so dn.name is probably good enough always.
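One way to sanity-check the names the tracer prints is to resolve the same descriptor from user space through /proc. This is only a rough sketch for cross-checking, not a replacement for an in-kernel helper: it assumes a Linux /proc filesystem, the traced process must still be alive, and you need permission to read its fd directory. The function name fd_to_path is made up for this example.

```python
import os


def fd_to_path(pid, fd):
    """Resolve an open file descriptor of a live process to a path.

    Reads the /proc/<pid>/fd/<fd> symlink. Raises OSError if the process
    or the descriptor is gone, or if access is not permitted.
    fd_to_path is a hypothetical name used only in this sketch.
    """
    return os.readlink("/proc/%d/fd/%d" % (pid, fd))
```

For short-lived processes such as lsb_release the lookup will often race with process exit, which is part of why doing the fd-to-name walk inside the BPF program is attractive.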
Awesome, thanks! So I'm starting to believe that d_iname isn't reliable:
7538432.717158000 lsb_release 15415 fstat fd=3 dn.name=dist-packages
7538432.717160000 lsb_release 15415 fstat fd=3 de->d_iname=dist-packages
7538432.717340000 lsb_release 15415 fstat fd=3 dn.name=apport_python_hook.cpython-35.pyc
7538432.717342000 lsb_release 15415 fstat fd=3 de->d_iname=p"??????t
7538432.717354000 lsb_release 15415 fstat fd=3 dn.name=apport_python_hook.cpython-35.pyc
7538432.717356000 lsb_release 15415 fstat fd=3 de->d_iname=p"??????t
So I'm going to have to fix some of the existing *slower tools to go use dn.name instead.
Maybe I'll write a tool that uses this code (statsnoop?), and put the bpf_probe_read()s in an fd2path() static function. I suppose we could eventually move it to somewhere like src/cc/export/helpers.h, and provide this functionality in bcc, at least to start with.
We've discussed this a number of times. Using the d_iname or d_name only shows the filename. We want a helper to show the full absolute path. e.g., for a FD to pathname helper, we want "/usr/local/bin/bash" and not "bash".
This can be done in at least one of two ways:
A) adding a BPF kernel helper for this function. We've suggested/discussed this at Plumbers etc.
B) using the new bounded loops in 5.2, writing a BCC helper that uses loops to construct the path.
I have implemented this by A) and committed to http://patchwork.ozlabs.org/patch/1179287/ PTAL
Thanks @ethercflow Let us continue the discussion in the mailing list.
I managed to do this by following the dentries structure with the following function:
static int read_dentry_strings(
    struct dentry *dtryp, char buf[DEFAULT_SUB_BUF_LEN][DEFAULT_SUB_BUF_SIZE]) {
    struct dentry dtry;
    struct dentry *lastdtryp = dtryp;
    int nread = 0;
    int i = 0;
    if (buf) {
        bpf_probe_read(&dtry, sizeof(struct dentry), dtryp);
        bpf_probe_read_str(buf[i], DEFAULT_SUB_BUF_SIZE, dtry.d_name.name);
        nread++;
        for (i = 1; i < DEFAULT_SUB_BUF_LEN; i++) {
            if (dtry.d_parent != lastdtryp) {
                lastdtryp = dtry.d_parent;
                bpf_probe_read(&dtry, sizeof(struct dentry), dtry.d_parent);
                bpf_probe_read_str(buf[i], DEFAULT_SUB_BUF_SIZE, dtry.d_name.name);
                nread++;
            } else
                break;
        }
    }
    return nread;
}
This will get the full path and place an entry per dentry into buf, following up to root on the respective mount.
You can use the same thing to get the mount path via dentries->filp.f_path.mnt, but the mount is hidden inside a wrapping structure on the vfsmount, so you have to use 'container_of_in' to get the wrapping structure, then you can use read_dentries on rmount.mnt_mountpoint to reconstruct the full path.
I just pass the data up into userspace separated into this array structure, and then reconstruct it there.
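For readers doing the userspace half in Python (as a bcc script would), the reconstruction could look roughly like the sketch below. It assumes the event carries the names leaf-first, as read_dentry_strings fills them, with the mount root's own "/" entry last. The function name join_dentry_components is hypothetical, and the sketch only covers the path within a single mount.

```python
def join_dentry_components(components):
    """Join leaf-first dentry names into one path.

    `components` is a list such as ["bash", "bin", "local", "usr", "/"],
    i.e. the order in which a leaf-to-root dentry walk produces names.
    Entries may be bytes (as bcc hands them over) or str.
    """
    names = []
    for c in reversed(components):
        if isinstance(c, bytes):
            # Names coming out of the kernel are raw bytes.
            c = c.decode("utf-8", "replace")
        # Skip the root entry and any unused (empty) slots.
        if c not in ("", "/"):
            names.append(c)
    return "/" + "/".join(names)
```

Called as join_dentry_components(event_names[:nread]), where event_names and nread stand for whatever fields your event struct actually uses.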
@ethercflow @yonghong-song From https://lore.kernel.org/netdev/c27d3cc2-f846-8aa9-10fd-c2940e7605d1@iogearbox.net/#t, I'm curious if this is still stuck on review/waiting for anyone? Would love to see this merged. :)
A similar helper has been merged:
long bpf_d_path(struct path *path, char *buf, u32 sz)
Description: Return full path for given struct path object, which needs to be the kernel BTF path object. The path is returned in the provided buffer buf of size sz and is zero terminated.
Return: On success, the strictly positive length of the string, including the trailing NUL character. On error, a negative value.
It takes a "path" (powered by BTF) instead of an fd, so it won't be available to kprobes etc., but it is available to kfunc. In many cases you actually have a "struct file *", from which you can get the "path" and feed it into the helper.
Does this work for you?
@yonghong-song Thanks for the reply! This is not particularly useful for our use case, since we want to use kprobes. I have, though, taken @thedracle's idea above and developed the full path functionality in a reverse-dentry-lookup way for now. I will open a PR shortly for this.
Sounds good. Thanks!
Is there another way to get the pathname in a kprobe? We use kprobes and want to get the pathname too.
Don't we need to use files_fdtable(current->files) from include/linux/fdtable.h to get the fdt? This macro uses rcu_read_lock/rcu_read_unlock for memory barrier requirements.
@thedracle could you pls post the complete snippet of how you use the 2d array in the userspace code and what data structure you use to pass the info. I tried with a BPF_HASH struct containing the 2d array, but keep getting a seg fault error.
It's been a long time since I've looked at this, but it was something like:
#define DEFAULT_SUB_BUF_SIZE 255 // Max filename length in Linux.
#define DEFAULT_SUB_BUF_LEN 16
stringstream path;
for (int i = bpf_event->nread - 1; i >= 0; i--) {
    if (strncmp(bpf_event->buffer[i], "/", DEFAULT_SUB_BUF_LEN) == 0) {
        path << "/";
    } else {
        path << bpf_event->buffer[i];
        if (i != 0) {
            path << "/";
        }
    }
}
The one thing that was missing from above is the code to look up and append the mount point path too, which is very similar.
I.E: A file that isn't from "/" but from a different mount needs to have the path for the mount point constructed and appended to the full file path.
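A rough Python sketch of that last step, under the assumption that the BPF side sends two leaf-first name arrays: one from walking the file's dentry up to its mount root, and one from walking the mount point's dentry (mnt_mountpoint) up to the root of the parent mount. The function and parameter names are made up for illustration, and a mount stacked on top of another non-root mount would need the same step repeated, which this sketch does not do.

```python
def _root_first(components):
    """Reverse a leaf-first name list and drop root/empty entries."""
    return [c for c in reversed(components) if c not in ("", "/")]


def path_with_mount(mount_components, file_components):
    """Prefix the mount point path to the path within the mount.

    Both arguments are leaf-first lists of str, e.g.
    mount_components=["data", "mnt", "/"] and
    file_components=["log.txt", "app", "/"].
    """
    parts = _root_first(mount_components) + _root_first(file_components)
    return "/" + "/".join(parts)
```

When the file lives on the root mount, the mount walk yields only "/" and the result is the same as joining the file components alone.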
I'm surprised this hasn't been generally solved after all of this time. I've gotten sucked away into other areas of programming, but if this is actually generally useful, I could put together a code snippet or demo that performs path reconstruction for those interested.
@thedracle Many thanks for the quick response. Will try to adapt my code based on this.
This is still not generally solved as I have been looking for over a week to find a solution that works. As mentioned earlier in the thread, bpf_d_path exists, but not usable for kprobes. So a full code snippet will definitely be very very useful for folks who stumbled across the same problem!!
If one has access to a file struct, it can (nowadays) be done with a while (or unroll) loop:
// vfsreadsnoop.bt
//
// Prints paths in reverse order because of (current) bpftrace limitations. Can
// be postprocessed to yield normal-looking paths. E.g.:
//
// awk '{printf("%s %s %s %s ", $1, $2, $3, $4) ; for (i = NF ; i > 4 ; i--) { printf("/%s", $i) } ; print("") }'
#include <linux/fs.h>
// Use kretfunc instead of a kprobe/kretprobe combo
// because it allows us to read args and retval at the same time.
kretfunc:vfs_read /strcontains(comm, str($1))/ {
  if (retval >= 0) {
    printf("%-16s %10d want=%-6d read=%-6d", comm, pid, args->count, retval);
    $dentry = args->file->f_path.dentry;
    // Print maximum 16 path elements. Otherwise the eBPF verifier refuses
    // to load this program. An alternative is unroll(16), but that's harder
    // to manage.
    //
    // The sentinel value for "no more parents" is not a NULL pointer, but
    // d->parent == d.
    $i = 0;
    while ($dentry->d_parent != $dentry && $i < 16) {
      printf(" %s", str($dentry->d_name.name));
      $dentry = $dentry->d_parent;
      $i++;
    }
    print("");
  }
}
Output:
$ bpftrace ./vfsreadsnoop.bt cat
Attaching 1 probe...
cat 300869 want=832 read=832 libc.so.6 x86_64-linux-gnu lib usr
cat 300869 want=784 read=784 libc.so.6 x86_64-linux-gnu lib usr
cat 300869 want=784 read=784 libc.so.6 x86_64-linux-gnu lib usr
cat 300870 want=832 read=832 libc.so.6 x86_64-linux-gnu lib usr
cat 300870 want=784 read=784 libc.so.6 x86_64-linux-gnu lib usr
cat 300870 want=784 read=784 libc.so.6 x86_64-linux-gnu lib usr
cat 300870 want=131072 read=1191 vfsreadsnoop.bt bpftrace dotfiles aktau home
cat 300870 want=131072 read=0 vfsreadsnoop.bt bpftrace dotfiles aktau home
cat 300910 want=832 read=832 libc.so.6 x86_64-linux-gnu lib usr
cat 300910 want=784 read=784 libc.so.6 x86_64-linux-gnu lib usr
cat 300910 want=784 read=784 libc.so.6 x86_64-linux-gnu lib usr
cat 300910 want=131072 read=1649 envsnoop.bt bpftrace dotfiles aktau home
cat 300910 want=131072 read=0 envsnoop.bt bpftrace dotfiles aktau home
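The awk one-liner from the script header can also be written in Python if that fits your tooling better. This is a sketch that assumes the exact output layout shown above (four leading fields, then the path elements leaf-first) and, like the awk version, it breaks on file or command names containing whitespace. The name normalize_line is hypothetical.

```python
def normalize_line(line):
    """Turn one vfsreadsnoop.bt output line into a normal-looking path.

    The first four whitespace-separated fields (comm, pid, want=, read=)
    are kept as-is; the remaining fields are path elements printed
    leaf-first, so they are reversed and joined with "/".
    """
    fields = line.split()
    head, names = fields[:4], fields[4:]
    return " ".join(head) + " /" + "/".join(reversed(names))
```

Note that the resulting path is relative to the root of the mount the file lives on, since the walk stops at the dentry whose parent is itself.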
Would like a macro or function for mapping a file descriptor to a pathname. I was trying something like (this may be wrong):
and got: