sysprog21 / lkmpg

The Linux Kernel Module Programming Guide (updated for 5.0+ kernels)
https://sysprog21.github.io/lkmpg/
Open Software License 3.0

syscall-steal: General protection fault #270

Open linD026 opened 1 month ago

linD026 commented 1 month ago

Currently, the GitHub Actions status checks report the following error:

...
Running startstop
Running static_key
Running syscall-steal
.ci/build-n-run.sh: line 17:  8630 Segmentation fault      (core dumped) sudo insmod "examples/$1.ko"
Error: Process completed with exit code 1.

The GitHub Actions runner uses a v6.8 kernel on Ubuntu 24.04. However, the module loads without issues on my laptop, which also runs v6.8 (6.8.0-45-generic, Ubuntu 24.04 LTS). Judging from the kernel messages, the fault probably comes from a platform difference, the write to the syscall table, or __write_cr0(). Here is the log:

[  459.424892] general protection fault, maybe for address 0x80040033: 0000 [#1] SMP NOPTI
[  459.427086] CPU: 0 PID: 8281 Comm: insmod Tainted: G           OE      6.8.0-1014-azure #16~22.04.1-Ubuntu
[  459.429787] Hardware name: Microsoft Corporation Virtual Machine/Virtual Machine, BIOS 090008  12/07/2018
[  459.432366] RIP: 0010:syscall_steal_start+0x4d/0xff0 [syscall_steal]
[  459.434064] Code: 05 10 f5 fc ff 48 85 c0 0f 84 8e 00 00 00 48 c7 c7 ae e0 6b c0 e8 b3 27 29 c7 0f 20 c0 48 89 45 f0 f0 80 65 f2 fe 48 8b 45 f0 <0f> 22 c0 48 8b 05 e1 f4 fc ff 48 c7 c7 bc e0 6b c0 48 8b 80 08 08
[  459.439409] RSP: 0018:ffffad1ccd613a68 EFLAGS: 00010202
[  459.440923] RAX: 0000000080040033 RBX: 0000000000000000 RCX: 0000000000000000
[  459.442797] RDX: 0000000000000000 RSI: ffff8adeefc20a00 RDI: ffff8adeefc20a00
[  459.444704] RBP: ffffad1ccd613a78 R08: 0000000000000003 R09: 0000000000000000
[  459.446669] R10: 0000000000000000 R11: ffff8adb43533400 R12: ffffffffc06ed010
[  459.448640] R13: ffff8adb5202ac60 R14: 0000000000000000 R15: 0000000000000000
[  459.450596] FS:  00007fd9167cbc40(0000) GS:ffff8adeefc00000(0000) knlGS:0000000000000000
[  459.452746] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  459.454297] CR2: 0000564a49db1428 CR3: 000000019700c001 CR4: 0000000000b70ef0
[  459.456679] Call Trace:
[  459.457381]  <TASK>
[  459.457987]  ? show_regs+0x6a/0x80
[  459.458973]  ? die_addr+0x38/0xa0
[  459.459902]  ? exc_general_protection+0x1ed/0x480
[  459.461186]  ? asm_exc_general_protection+0x27/0x30
[  459.462538]  ? __pfx_syscall_steal_start+0x10/0x10 [syscall_steal]
[  459.464198]  ? syscall_steal_start+0x4d/0xff0 [syscall_steal]
[  459.465783]  ? syscall_steal_start+0x3d/0xff0 [syscall_steal]
[  459.467337]  do_one_initcall+0x49/0x2d0
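
One possible reading of the register dump above, offered as a sketch and not as a confirmed diagnosis from this thread: the faulting instruction bytes `<0f> 22 c0` decode to `mov %rax,%cr0`, and RAX differs from the current CR0 only in bit 16 (WP). On x86, clearing CR0.WP raises #GP while CR4.CET is set, and the logged CR4 value has bit 23 set. The helper names below are hypothetical, and the bit positions (WP = 16, CET = 23) are taken from the architecture manual, not from this issue.

```c
#include <stdbool.h>
#include <stdint.h>

#define X86_CR0_WP_BIT 16  /* CR0.WP: write protect */
#define X86_CR4_CET_BIT 23 /* CR4.CET: control-flow enforcement */

/* Return true if bit `bit` is set in `reg`. */
static bool reg_bit_set(uint64_t reg, unsigned int bit)
{
    return ((reg >> bit) & 1ULL) != 0;
}

/* Bits that differ between the current register value and the requested one. */
static uint64_t changed_bits(uint64_t current, uint64_t requested)
{
    return current ^ requested;
}

/*
 * True when the requested CR0 value clears WP while CR4.CET is still set.
 * Assumption: this models only that single architectural rule, nothing else.
 */
static bool clears_wp_under_cet(uint64_t cr0, uint64_t new_cr0, uint64_t cr4)
{
    bool wp_cleared = reg_bit_set(cr0, X86_CR0_WP_BIT) &&
                      !reg_bit_set(new_cr0, X86_CR0_WP_BIT);
    return wp_cleared && reg_bit_set(cr4, X86_CR4_CET_BIT);
}
```

Calling `clears_wp_under_cet(0x80050033, 0x80040033, 0xb70ef0)` with the CR0, RAX, and CR4 values from the log returns true, which would be consistent with the fault appearing only on hosts where CET is enabled. Whether that is the actual cause on the CI runner remains unverified.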

To capture this log, simply add sudo dmesg to .ci/build-n-run.sh just before it exits.

Here are some related patches and commits:

jserv commented 1 month ago

How about listing syscall-steal as one of the non-working LKMs in the CI pipeline?

linD026 commented 1 month ago

Sounds good.

linD026 commented 1 month ago

I didn't investigate deeply, but it seems to work again now. Still, I think moving syscall-steal to the non-working list would be fine: after v6.9 (specifically after commit) syscall-steal won't work, because the original dispatch method, an indirect call, has been replaced by a switch statement.
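
To illustrate why that change defeats table patching, here is a minimal userspace sketch. It is a toy model, not kernel code: every name in it (`toy_table`, `toy_dispatch_indirect`, `toy_dispatch_switch`, `toy_steal`) is hypothetical. It assumes only what the comment above states, namely that dispatch used to be an indirect call and is now a switch statement.

```c
#include <stddef.h>

typedef long (*toy_syscall_fn)(long arg);

/* Stand-ins for an original handler and a module's replacement. */
static long toy_original(long arg) { return arg; }
static long toy_hook(long arg) { return -arg; }

enum { TOY_NR_CALLS = 1 };
static toy_syscall_fn toy_table[TOY_NR_CALLS] = { toy_original };

/* Older style: dispatch is an indirect call through the table. */
static long toy_dispatch_indirect(size_t nr, long arg)
{
    if (nr >= TOY_NR_CALLS)
        return -1;
    return toy_table[nr](arg);
}

/* Newer style: a switch with direct calls; the table is never consulted. */
static long toy_dispatch_switch(size_t nr, long arg)
{
    switch (nr) {
    case 0:
        return toy_original(arg);
    default:
        return -1;
    }
}

/* What a syscall-stealing module does in this model: overwrite a slot. */
static void toy_steal(size_t nr, toy_syscall_fn replacement)
{
    if (nr < TOY_NR_CALLS)
        toy_table[nr] = replacement;
}
```

After `toy_steal(0, toy_hook)`, `toy_dispatch_indirect(0, 5)` goes through the hook, while `toy_dispatch_switch(0, 5)` still reaches the original handler. In this model, overwriting the table entry has no effect on the switch-based path.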