debugfs needs to be mounted before starting ply

it-klinger commented 3 years ago

When starting ply without a debugfs mounted at /sys/kernel/debug there's an error:

$ ply <file>
info: creating kallsyms cache
ERR:-2

My proposal is to check for a mounted debugfs and automatically mount it if not present. I can also implement this.
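The check proposed above could look roughly like the following. This is only a sketch in Python, not ply's actual code (ply is written in C); the helper names `debugfs_mount_points` and `debugfs_is_mounted` are made up for illustration. It relies only on the documented `/proc/mounts` column order (device, mount point, filesystem type, options, ...).

```python
def debugfs_mount_points(mounts_text):
    """Return every mount point whose filesystem type is debugfs.

    mounts_text is the content of a /proc/mounts-style file.
    """
    points = []
    for line in mounts_text.splitlines():
        parts = line.split()
        # Columns: device, mount point, fs type, options, dump, pass
        if len(parts) >= 3 and parts[2] == "debugfs":
            points.append(parts[1])
    return points


def debugfs_is_mounted(path="/sys/kernel/debug", mounts_file="/proc/mounts"):
    """Hypothetical helper: True if debugfs is mounted at the given path."""
    with open(mounts_file) as f:
        return path in debugfs_mount_points(f.read())
```

If the check fails, debugfs can be mounted manually with `mount -t debugfs none /sys/kernel/debug` before starting ply.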

it-klinger commented 3 years ago

The example script from the documentation does not work as-is on ARM; there the syscalls are named "sys.*" in lowercase rather than "SyS.*". When I change the script to lowercase, it still does not work:

$ cat cnt
kprobe:sys_* { @syscalls[caller] = count(); }

$ ply cnt
ERR:-22

During debugging I saw that the symbols are taken from /proc/kallsyms, but that file contains symbols which are not accepted by /sys/kernel/debug/tracing/kprobe_events, e.g.

sys_call_table

Maybe it would be a possible solution to take the symbols from /sys/kernel/debug/tracing/available_filter_functions but I'm not sure if this is a sustainable solution.

wkz commented 3 years ago

Thanks for submitting. Please try to keep each issue to a single topic so tracking is easier.

Regarding debugfs: I recently implemented a self-test mode in ply (ply -T) which uses some heuristics to verify a user's setup. It also checks that debugfs is mounted. While I agree that most users will want to mount debugfs at that point, I do not think it is the job of ply to mount filesystems. NOTE: This is still not pushed, I will try to do that tonight.

The syscall tracing examples are what is causing the most issues to be opened on ply by a mile. There are dragons everywhere, and yet it is the first thing (understandably) that everyone tries. Long-term, I want to implement a proper syscall: provider so that you do not have to keep track of arch-specific stuff to do this.

Regarding wildcards: I was not aware of available_filter_functions, thank you! That does indeed look like the way forward. kallsyms is still needed to get the address information to implement offsets properly, but wildcard matches should be filtered through this list.
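To make the idea concrete, here is a minimal sketch of that filtering step, written in Python rather than ply's C. All function names are hypothetical. It assumes /proc/kallsyms lines look like `address type name [module]`, that each line of available_filter_functions starts with a function name, and that the latter list is a reasonable proxy for what kprobe_events accepts, which is not guaranteed.

```python
from fnmatch import fnmatchcase


def parse_kallsyms(lines):
    """Map symbol name -> address from /proc/kallsyms-style lines."""
    symbols = {}
    for line in lines:
        parts = line.split()
        if len(parts) >= 3:
            # Keep the first address seen for a duplicated name.
            symbols.setdefault(parts[2], int(parts[0], 16))
    return symbols


def parse_filter_functions(lines):
    """Collect function names from available_filter_functions-style lines."""
    names = set()
    for line in lines:
        parts = line.split()
        if parts:
            # Ignore any trailing "[module]" annotation.
            names.add(parts[0])
    return names


def expand_wildcard(pattern, kallsyms_lines, filter_lines):
    """Return {name: address} for symbols matching pattern that are also traceable."""
    symbols = parse_kallsyms(kallsyms_lines)
    traceable = parse_filter_functions(filter_lines)
    return {
        name: addr
        for name, addr in sorted(symbols.items())
        if fnmatchcase(name, pattern) and name in traceable
    }
```

With this approach a data symbol such as sys_call_table still matches `sys_*` in kallsyms, but it is dropped because it does not appear in the function list.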

wkz commented 3 years ago

The self-test is now on master: https://github.com/wkz/ply/commit/e25c9134b856cc7ffe9f562ff95caf9487d16b59

it-klinger commented 3 years ago

It's not working for me. See the output below.

The wrong Linux version is compiled into ply. How can I specify the Linux source tree?

# ply -T
Verifying kernel config (/proc/config.gz)... OK
Ensuring that debugfs is mounted... OK
Verifying kprobe... OK
Verifying tracepoint... /bin/sh: line 79: 324 Aborted $PLYBIN 'tracepoint:sched/sched_switch { exit(0); }' 2> /dev/null
ERROR

# ply 'tracepoint:sched/sched_switch { exit(0); }'
ply: provider/tracepoint.c:197: tracepoint_parse: Assertion `offs == type_offsetof(t, t->sou.fields[n - 1].name)' failed.
Aborted

# ply -v
ply 2.1.1-14-ge25c913 (linux-version:267168~4.19.160)

# uname -a
Linux bw 5.10.1-rt20-wega-bw #2 PREEMPT_RT Sun Jan 3 19:39:05 CET 2021 armv7l GNU/Linux
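For reference, the number before the `~` in the `ply -v` output appears to be the kernel's LINUX_VERSION_CODE, which packs the version as `(major << 16) + (minor << 8) + patch`. The small Python sketch below decodes it; the function names are made up for illustration, and it assumes the patch level fits in 8 bits.

```python
def decode_version_code(code):
    """Split a LINUX_VERSION_CODE-style integer into (major, minor, patch)."""
    return (code >> 16, (code >> 8) & 0xFF, code & 0xFF)


def encode_version_code(major, minor, patch):
    """Inverse of decode_version_code, following the KERNEL_VERSION layout."""
    return (major << 16) + (minor << 8) + patch


if __name__ == "__main__":
    # The value compiled into ply, taken from the output above.
    print(decode_version_code(267168))
```

Decoding 267168 this way gives 4.19.160, which agrees with the text after the `~` and differs from the running 5.10.1 kernel.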

wkz commented 3 years ago

Well, the kernel is less picky about the versions matching these days, so that should not be a problem. That said, you should be able to set CPPFLAGS in the normal way when running configure if you want.

Not really sure what is happening here. ARM seems to work fine in the CI job: https://github.com/wkz/ply/runs/1661513654

Could you paste the contents of /sys/kernel/debug/tracing/events/sched/sched_switch/format on your system?

it-klinger commented 3 years ago

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
name: sched_switch
ID: 233
format:

field:unsigned short common_type;   offset:0;   size:2; signed:0;
field:unsigned char common_flags;   offset:2;   size:1; signed:0;
field:unsigned char common_preempt_count;   offset:3;   size:1; signed:0;
field:int common_pid;   offset:4;   size:4; signed:1;
field:unsigned char common_migrate_disable; offset:8;   size:1; signed:0;
field:unsigned char common_preempt_lazy_count;  offset:9;   size:1; signed:0;

field:char prev_comm[16];   offset:12;  size:16;    signed:0;
field:pid_t prev_pid;   offset:28;  size:4; signed:1;
field:int prev_prio;    offset:32;  size:4; signed:1;
field:long prev_state;  offset:36;  size:4; signed:1;
field:char next_comm[16];   offset:40;  size:16;    signed:0;
field:pid_t next_pid;   offset:56;  size:4; signed:1;
field:int next_prio;    offset:60;  size:4; signed:1;

print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, (REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1)) ? __print_flags(REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1), "|", { 0x0001, "S" }, { 0x0002, "D" }, { 0x0004, "T" }, { 0x0008, "t" }, { 0x0010, "X" }, { 0x0020, "Z" }, { 0x0040, "P" }, { 0x0080, "I" }) : "R", REC->prev_state & (((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio
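To make the field list easier to inspect, the `field:` lines can be parsed and checked for padding between consecutive fields. This is a rough Python sketch, not how ply parses the file; the regular expression and the helper names `parse_format_fields` and `find_gaps` are assumptions for illustration.

```python
import re

# Matches lines such as: field:char prev_comm[16]; offset:12; size:16; signed:0;
FIELD_RE = re.compile(
    r"field:(?P<decl>[^;]+);\s*offset:(?P<offset>\d+);\s*"
    r"size:(?P<size>\d+);\s*signed:(?P<signed>[01]);"
)


def parse_format_fields(text):
    """Return a list of (name, offset, size, signed) tuples from a format file."""
    fields = []
    for line in text.splitlines():
        m = FIELD_RE.search(line)
        if m is None:
            continue  # skip "name:", "ID:", "print fmt:" and blank lines
        decl = m.group("decl").strip()
        # The field name is the last token, minus any array suffix like [16].
        name = decl.split()[-1].split("[")[0]
        fields.append(
            (name, int(m.group("offset")), int(m.group("size")), m.group("signed") == "1")
        )
    return fields


def find_gaps(fields):
    """Return (previous_name, next_name, padding_bytes) wherever padding exists."""
    gaps = []
    for (prev_name, prev_off, prev_size, _), (name, off, _, _) in zip(fields, fields[1:]):
        padding = off - (prev_off + prev_size)
        if padding > 0:
            gaps.append((prev_name, name, padding))
    return gaps
```

Run against the output above, this would report two bytes of padding between common_preempt_lazy_count (offset 9, size 1) and prev_comm (offset 12).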

it-klinger commented 3 years ago

I have now fixed the kernel header version so that it matches the version of the running kernel. But this was not the real problem.

I'm using the RT preemption patch, and the sched_switch event format differs depending on whether the patch is applied: with the patch there are two additional common fields (common_migrate_disable and common_preempt_lazy_count), and prev_comm moves from offset 8 to offset 12. Below is the output when booting Linux without the RT patch, for comparison with my previous comment (with the RT patch).

So the consequence is that, when running a patched kernel, ply has to be built against the same patched kernel headers.

# ply -v
ply (linux-version:330241~5.10.1)

# uname -a
Linux bw 5.10.1-wega-bw #1 Fri Jan 8 20:17:29 CET 2021 armv7l GNU/Linux

# ply -T
Verifying kernel config (/proc/config.gz)... OK
Ensuring that debugfs is mounted... OK
Verifying kprobe... OK
Verifying tracepoint... OK

# cat /sys/kernel/debug/tracing/events/sched/sched_switch/format
name: sched_switch
ID: 233
format:

    field:unsigned short common_type;       offset:0;       size:2; signed:0;
    field:unsigned char common_flags;       offset:2;       size:1; signed:0;
    field:unsigned char common_preempt_count;       offset:3;       size:1; signed:0;
    field:int common_pid;   offset:4;       size:4; signed:1;

    field:char prev_comm[16];       offset:8;       size:16;        signed:0;
    field:pid_t prev_pid;   offset:24;      size:4; signed:1;
    field:int prev_prio;    offset:28;      size:4; signed:1;
    field:long prev_state;  offset:32;      size:4; signed:1;
    field:char next_comm[16];       offset:36;      size:16;        signed:0;
    field:pid_t next_pid;   offset:52;      size:4; signed:1;
    field:int next_prio;    offset:56;      size:4; signed:1;

print fmt: "prev_comm=%s prev_pid=%d prev_prio=%d prev_state=%s%s ==> next_comm=%s next_pid=%d next_prio=%d", REC->prev_comm, REC->prev_pid, REC->prev_prio, (REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1)) ? __print_flags(REC->prev_state & ((((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) - 1), "|", { 0x0001, "S" }, { 0x0002, "D" }, { 0x0004, "T" }, { 0x0008, "t" }, { 0x0010, "X" }, { 0x0020, "Z" }, { 0x0040, "P" }, { 0x0080, "I" }) : "R", REC->prev_state & (((0x0000 | 0x0001 | 0x0002 | 0x0004 | 0x0008 | 0x0010 | 0x0020 | 0x0040) + 1) << 1) ? "+" : "", REC->next_comm, REC->next_pid, REC->next_prio

wkz commented 3 years ago

Interesting. ply aligns fields as though all members are laid out sequentially in a single struct. But it seems like the kernel treats the common fields as a separate struct (and therefore aligns prev_comm on a 4-byte boundary).
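The difference can be reproduced with a small layout calculation. The Python sketch below is an illustration only, not ply's type system: the helper names are hypothetical, the sizes come from the RT format output earlier in this thread, and the alignments are assumed to be the natural ones for 32-bit ARM (char 1, short 2, int/long/pid_t 4).

```python
def align_up(offset, alignment):
    """Round offset up to the next multiple of alignment."""
    return (offset + alignment - 1) // alignment * alignment


def layout(fields, start=0):
    """Place (name, size, alignment) fields in order, padding each as needed.

    Returns (offsets, end_offset, max_alignment).
    """
    offsets = {}
    offset = start
    max_alignment = 1
    for name, size, alignment in fields:
        offset = align_up(offset, alignment)
        offsets[name] = offset
        offset += size
        max_alignment = max(max_alignment, alignment)
    return offsets, offset, max_alignment


# (name, size, assumed alignment) for the RT-patched sched_switch event.
COMMON_RT = [
    ("common_type", 2, 2),
    ("common_flags", 1, 1),
    ("common_preempt_count", 1, 1),
    ("common_pid", 4, 4),
    ("common_migrate_disable", 1, 1),
    ("common_preempt_lazy_count", 1, 1),
]
EVENT = [
    ("prev_comm", 16, 1),
    ("prev_pid", 4, 4),
    ("prev_prio", 4, 4),
    ("prev_state", 4, 4),
    ("next_comm", 16, 1),
    ("next_pid", 4, 4),
    ("next_prio", 4, 4),
]


def flat_offsets(common, event):
    """All members laid out sequentially in one struct."""
    offsets, _, _ = layout(common + event)
    return offsets


def nested_offsets(common, event):
    """Common fields form their own struct, padded to its own alignment."""
    common_offsets, end, alignment = layout(common)
    event_offsets, _, _ = layout(event, start=align_up(end, alignment))
    return {**common_offsets, **event_offsets}
```

Under these assumptions the flat layout puts prev_comm at offset 10, while the nested layout puts it at 12, as the kernel reports. Every later field lands on the same offset in both layouts, so prev_comm is the only mismatch. Without the RT patch the common fields end at offset 8, which is already 4-byte aligned, so both layouts agree.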

This shines a light on a major deficiency in ply's type system. Unfortunately, this is not a quick fix. Once I get around to that refactor, I will make sure to fix this as well.

wkz commented 2 years ago

Fixed in 6e25f69

wkz / ply

debugfs needs to be mounted before starting ply #27