iovisor / bcc

BCC - Tools for BPF-based Linux IO analysis, networking, monitoring, and more
Apache License 2.0
20.58k stars 3.88k forks source link

Can't read in struct fields even with `bpf_probe_read`: -EFAULT #622

Open kalaracey opened 8 years ago

kalaracey commented 8 years ago

I'm trying to trace parts of the scheduler, and am running into an issue related to #188. The solution suggested there was to have multiple iterations of bpf_probe_read, but that function is returning -EFAULT on the third iteration below. Maybe I am missing something obvious? Any help would be greatly appreciated.

text = """
// ...

// For tracing sleep (including io, which is also tracked separately) and blocked time.
// We can't directly trace enqueue_sleeper, so we trace enqueue_entity
int trace_enqueue_entity(struct pt_regs *ctx, struct cfs_rq *cfs_rq, struct sched_entity *se, int flags)
{
    u32 pid = bpf_get_current_pid_tgid();

    // enqueue_sleeper only fired if we have ENQUEUE_WAKEUP
    if (!(flags & ENQUEUE_WAKEUP))
        return 0;

    if (se->statistics.sleep_start) {
        u64 res = 0;
        struct cfs_rq *cfs_rqp = 0;
        struct rq *rqp = 0;
        u64 rq_clock = 0;
        if ((res = bpf_probe_read(&cfs_rqp, sizeof(cfs_rqp), &cfs_rq))) {
            bpf_trace_printk("bpf_probe_read 1 failed: %d\\n", res);
            return 0;
        }

        if ((res = bpf_probe_read(&rqp, sizeof(rqp), &cfs_rqp->rq))) {
            bpf_trace_printk("bpf_probe_read 2 failed: %d\\n", res);
            return 0;
        }

        if ((res = bpf_probe_read(&rq_clock, sizeof(rq_clock), &rqp->clock))) {
            bpf_trace_printk("bpf_probe_read 3 failed: %d\\n", res);
            return 0;
        }
        // ...

    }
    // ...
}
"""

b = BPF(text=text)
b.attach_kprobe(event="enqueue_entity", fn_name="trace_enqueue_entity")

try:
    while True:
        (_, _, _, _, _, msg) = b.trace_fields()
        print(msg)

Results:

bpf_probe_read 3 failed: -14
bpf_probe_read 3 failed: -14
bpf_probe_read 3 failed: -14
...

Version information:

$ uname -a
Linux ubuntu 4.4.0-31-generic #50-Ubuntu SMP Wed Jul 13 00:07:12 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux
$ dpkg -s bcc-tools
0.1.8-291.git.5815f41
4ast commented 8 years ago

bpf_probe_read(&cfs_rqp, sizeof(cfs_rqp), &cfs_rq)

This probe_read looks to be a nop. Just cfs_rqp = cfs_rq would have worked. Also try printing 'rqp'. It's probably null, but shouldn't be.

kalaracey commented 8 years ago

Thanks for your reply. Yes, I've tried doing struct cfs_rq *cfs_rqp = cfs_rq and eliminating the first bpf_probe_read.

I've also tried eliminating the first bpf_probe_read and changing the second to

if ((res = bpf_probe_read(&rqp, sizeof(rap), &cfs_rq->rq))) {

i.e., just using the function parameter cfs_rq rather than copying it onto the stack into cfs_rqp.

Both changes result in the LLVM error:

/virtual/main.c:373:54: error: cannot take the address of an rvalue of type 'typeof(struct rq *)' (aka 'struct rq *')
  ...&({ typeof(struct rq *) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)cfs_rqp + offsetof(struct cfs_rq, rq)); _val; })...
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The error for the second change I attempted differed slightly: it was cfs_rq instead of cfs_rqp:

/virtual/main.c:373:54: error: cannot take the address of an rvalue of type 'typeof(struct rq *)' (aka 'struct rq *')
  ...&({ typeof(struct rq *) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)cfs_rq + offsetof(struct cfs_rq, rq)); _val; })...
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Regarding your other suggestion, (without performing your other suggested change, as they didn't compile, but gave the above error) I printed out (%llx) the stack pointer and values of cfs_rqp and rqp after they were loaded and got values that were like

sp:         0xffff8802161cd378
cfs_rqp:    0xffff88021fc16d70
rqp:        0x0000000100a6da66

I don't know kernel addresses very well, but I don't think rqp looks right. Any thoughts?

kalaracey commented 8 years ago

Maybe I should also note that I had to copy in at the top of the BPF C program the definitions for struct rq, struct cfs_rq, and a few small other component structs, all of which are not accessible by bcc (as far as I can tell; I couldn't figure out how to include them, as they are defined in kernel/sched/sched.h).

4ast commented 8 years ago

that's a tough one. the 2nd problem is known clang rewriter bug. Have you tried to do volatile struct cfs_rq *cfs_rqp = cfs_rq; and then use cfq_rqp in the next probe_read? rqp doesn't look right indeed. when 'struct rq/cfs_rq' are copy-pasted it's easy to get the layout wrong due to numerous ifdefs inside. Include of sched.h is possible if 'build' and 'source' dirs are known. Most likely something like include "../../kernel/sched/sched.h" should have worked. No other ideas so far.

kalaracey commented 8 years ago

Thanks again for assisting me as much as you have, and for your most recent suggestions. Declaring the struct volatile and replacing the first bpf_probe_read with a simple assignment as you suggested led to the understandable warning

/virtual/main.c:405:54: warning: passing 'struct rq *volatile *' to parameter of type 'void *' discards qualifiers
      [-Wincompatible-pointer-types-discards-qualifiers]
        if ((res = bpf_probe_read(&rqp, sizeof(rqp), &cfs_rqp->rq))) {
                                                     ^~~~~~~~~~~~

and the cryptic (and familiar) error

/virtual/main.c:406:54: error: cannot take the address of an rvalue of type 'struct rq *'
  ...&({ typeof(struct rq *volatile) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)cfs_rqp + offsetof(volatile struct cfs_rq, rq)); _val; })...
     ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

which doesn't make much sense, seeing as this error isn't triggered except if the struct is declared volatile. Is this error to what you were referring when you said "a known clang rewriter bug"? Do you happen to know of a ticket somewhere for the bug?


Regarding the struct definitions, I was very careful to make sure all the proper macros were defined; there was one that was tricky because it was not a CONFIG_, but rather a macro (HAVE_RT_PUSH_IPI) set somewhere in an #ifdef. Thanks for suggesting a workaround for include sched.h though; obviously copy-pasting is undesirable.

Thanks again for you help.

kalaracey commented 8 years ago

I ended up just removing the clang rewriter functionality from BCC and compiled the library from source. Specifically, it was in ProbeVisitor::VisitMemberExpr. in src/cc/frontends/clang/b_frontend_action.cc. Ended up being stumped by something else (rq seemed wrong).

ming1 commented 8 years ago

I find the following patch can be helpful for the issue[1], then we can use the below style to read the kernel variable instead of one explicit bpf_probe_read() in bpf prog:

 rqp = cfs_rqp->rq;

diff --git a/src/cc/frontends/clang/b_frontend_action.cc b/src/cc/frontends/clang/b_frontend_action.cc index cc0a1fc..55f9a08 100644 --- a/src/cc/frontends/clang/b_frontend_action.cc +++ b/src/cc/frontends/clang/b_frontendaction.cc @@ -241,7 +241,7 @@ bool ProbeVisitor::VisitMemberExpr(MemberExpr *E) { string rhs = rewriter.getRewrittenText(SourceRange(rhs_start, E->getLocEnd())); string base_type = base->getType()->getPointeeType().getAsString(); string pre, post;

[1] /virtual/main.c:406:54: error: cannot take the address of an rvalue of type 'struct rq ' ...&({ typeof(struct rq volatile) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)cfs_rqp + offsetof(volatile struct cfs_rq, rq)); _val; })... ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

4ast commented 8 years ago

@ming1 could you please send a pull req with proposed fix and a test case that demonstrates this new rewriter capability? Thanks!

suokun commented 7 years ago

I have the same issue. I can't read the kernel struct variable with bpf_probe_read. Any solutions for that? Thanks.

bobrik commented 6 years ago

I have a minimal reproduction program:

#!/usr/bin/python

from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

static int trace_enqueue(struct task_struct *task) {
    // BUG: cannot read task->se here
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(se), &task->se);

    return 0;
}

int trace_wake_up_new_task(struct pt_regs *ctx, struct task_struct *task) {
    // BUG: cannot read task->se here
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(se), &task->se);

    return trace_enqueue(task);
}

int trace_run(struct pt_regs *ctx, struct task_struct *prev) {
    struct task_struct *p = (void *) PT_REGS_PARM1(ctx);

    // Can read task->se here without any issues
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(p->se), &p->se);

    return 0;
}
"""

b = BPF(text=bpf_text, debug=0x4)

sleep(3600)

The main part is eBPF code:

#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

static int trace_enqueue(struct task_struct *task) {
    // BUG: cannot read task->se here
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(se), &task->se);

    return 0;
}

int trace_wake_up_new_task(struct pt_regs *ctx, struct task_struct *task) {
    // BUG: cannot read task->se here
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(se), &task->se);

    return trace_enqueue(task);
}

int trace_run(struct pt_regs *ctx, struct task_struct *prev) {
    struct task_struct *p = (void *) PT_REGS_PARM1(ctx);

    // Can read task->se here without any issues
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(p->se), &p->se);

    return 0;
}

Which compiles to the following:

#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

__attribute__((always_inline))
static int trace_enqueue(struct task_struct *task) {
    // BUG: cannot read task->se here
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(se), &({ typeof(struct sched_entity) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)&task->se); _val; }));

    return 0;
}

__attribute__((section(".bpf.fn.trace_wake_up_new_task")))
int trace_wake_up_new_task(struct pt_regs *ctx) { struct task_struct *task = ctx->di;
    // BUG: cannot read task->se here
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(se), &({ typeof(struct sched_entity) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)&task->se); _val; }));

    return trace_enqueue(task);
}

__attribute__((section(".bpf.fn.trace_run")))
int trace_run(struct pt_regs *ctx) { struct task_struct *prev = ctx->di;
    struct task_struct *p = (void *) PT_REGS_PARM1(ctx);

    // Can read task->se here without any issues
    struct sched_entity se;
    bpf_probe_read(&se, sizeof(p->se), &p->se);

    return 0;
}

The errors are:

/virtual/main.c:9:37: error: cannot take the address of an rvalue of type 'typeof(struct sched_entity)' (aka 'struct sched_entity')
    bpf_probe_read(&se, sizeof(se), &({ typeof(struct sched_entity) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)&task->se); _val; }));
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/virtual/main.c:18:37: error: cannot take the address of an rvalue of type 'typeof(struct sched_entity)' (aka 'struct sched_entity')
    bpf_probe_read(&se, sizeof(se), &({ typeof(struct sched_entity) _val; memset(&_val, 0, sizeof(_val)); bpf_probe_read(&_val, sizeof(_val), (u64)&task->se); _val; }));
                                    ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
2 errors generated.

For some reason it only happens in the first two functions that get task_struct * in arguments, but not in the third one, where it's pulled from ctx. In the latter case there is no extra rewrite step.

pchaigno commented 6 years ago

@bobrik The following works fine with the latest version of bcc:

#!/usr/bin/python
from bcc import BPF
from time import sleep

bpf_text = """
#include <uapi/linux/ptrace.h>
#include <linux/sched.h>

static int trace_enqueue(struct task_struct *task) {
    struct sched_entity se = task->se;
    return 0;
}

int trace_wake_up_new_task(struct pt_regs *ctx, struct task_struct *task) {
    struct sched_entity se = task->se;
    return trace_enqueue(task);
}

int trace_run(struct pt_regs *ctx, struct task_struct *prev) {
    struct sched_entity se = prev->se;
    return 0;
}
"""

b = BPF(text=bpf_text, debug=0x4)

sleep(3600)

The bcc rewriter tries to track pointers to external addresses and replaces their dereferences with calls to bpf_probe_read. It's able to trace the task pointer in the first two cases, but fails to trace the p pointer through PT_REGS_PARM1 in the last case.

The issue here is that the bpf_probe_read(&se, sizeof(se), &task->se); syntax is actually relying on the fact that the rewriter will fail to replace the dereference. When it succeeds, the dereference is replaced and Clang throws an error and fails. If you want to use bpf_probe_read and not rely on the rewriter, you can use offsetof to get the actual address of the structure member.

bobrik commented 6 years ago

Got it, thanks!