luainkernel / lunatik

Lunatik is a framework for scripting the Linux kernel with Lua.
GNU General Public License v2.0
511 stars 30 forks source link

systrack example breaks #96

Open endersonmaia opened 8 months ago

endersonmaia commented 8 months ago

When trying to run the systrack example, I can cat /dev/systrack a couple of times but then the system breaks.

My environment:

Debian GNU/Linux 12 (bookworm)
Linux debian-dev 6.1.0-18-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.76-1 (2024-02-01) x86_64 GNU/Linux
lunatik 3.4
gcc (Debian 12.2.0-14) 12.2.0

Full logs attached: systrack-dmesg.log

log snippet:

Mar 09 11:41:07.105153 debian-dev kernel: lunatik: loading out-of-tree module taints kernel.
Mar 09 11:41:07.105320 debian-dev kernel: lunatik: module verification failed: signature and/or required key missing - tainting kernel
Mar 09 11:42:33.392989 debian-dev kernel: luathread: [00000000ae5213a3] attempt to call a string value
Mar 09 11:44:34.933202 debian-dev kernel: luaprobe: couldn't find probe table
Mar 09 11:45:17.905016 debian-dev kernel: BUG: scheduling while atomic: node/1316/0x00000002
Mar 09 11:45:17.905465 debian-dev kernel: Modules linked in: lunatik_run(OE) luasyscall(OE) luaprobe(OE) luadata(OE) luafib(OE) luathread(OE) luarcu(OE) luasocket(OE) luanotifier(OE) lualinux(OE) luadevice(OE) lunatik(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay binfmt_misc intel_rapl_msr intel_rapl_common intel_pmc_core ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel snd_intel8x0 vmwgfx crypto_simd snd_ac97_codec cryptd rapl drm_ttm_helper ac97_bus ttm snd_pcm snd_timer snd drm_kms_helper pcspkr soundcore vboxguest joydev button ac evdev serio_raw sg drm fuse dm_mod loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sd_mod t10_pi sr_mod crc64_rocksoft crc64 crc_t10dif cdrom crct10dif_generic ohci_pci ata_generic ahci ata_piix libahci
Mar 09 11:45:17.907819 debian-dev kernel:  ohci_hcd ehci_pci ehci_hcd crct10dif_pclmul crct10dif_common libata usbcore psmouse crc32_pclmul usb_common scsi_mod crc32c_intel scsi_common e1000 video i2c_piix4 battery wmi
Mar 09 11:45:17.907966 debian-dev kernel: CPU: 2 PID: 1316 Comm: node Tainted: G           OE      6.1.0-18-amd64 #1  Debian 6.1.76-1
Mar 09 11:45:17.908050 debian-dev kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Mar 09 11:45:17.908134 debian-dev kernel: Call Trace:
Mar 09 11:45:17.908266 debian-dev kernel:  <TASK>
Mar 09 11:45:17.908363 debian-dev kernel:  dump_stack_lvl+0x44/0x5c
Mar 09 11:45:17.908391 debian-dev kernel:  __schedule_bug.cold+0x42/0x4e
Mar 09 11:45:17.908415 debian-dev kernel:  __schedule+0x7ff/0x9e0
Mar 09 11:45:17.908438 debian-dev kernel:  ? __x64_sys_epoll_wait+0x5/0x110
Mar 09 11:45:17.908463 debian-dev kernel:  schedule+0x5a/0xd0
Mar 09 11:45:17.908487 debian-dev kernel:  schedule_preempt_disabled+0x11/0x20
Mar 09 11:45:17.908513 debian-dev kernel:  __mutex_lock.constprop.0+0x6cd/0x700
Mar 09 11:45:17.908541 debian-dev kernel:  ? futex_wake+0x151/0x180
Mar 09 11:45:17.908564 debian-dev kernel:  ? __x64_sys_epoll_wait+0x5/0x110
Mar 09 11:45:17.908585 debian-dev kernel:  ? __x64_sys_epoll_wait+0x1/0x110
Mar 09 11:45:17.908608 debian-dev kernel:  luaprobe_pre_handler+0x27/0x8a0 [luaprobe]
Mar 09 11:45:17.908638 debian-dev kernel:  ? __x64_sys_epoll_wait+0x1/0x110
Mar 09 11:45:17.908664 debian-dev kernel:  kprobe_ftrace_handler+0x10b/0x1b0
Mar 09 11:45:17.908687 debian-dev kernel:  0xffffffffc05d90c8
Mar 09 11:45:17.908710 debian-dev kernel:  ? __ia32_sys_epoll_ctl+0xa0/0xa0
Mar 09 11:45:17.908732 debian-dev kernel:  ? __x64_sys_epoll_wait+0x1/0x110
Mar 09 11:45:17.908754 debian-dev kernel:  __x64_sys_epoll_wait+0x5/0x110
Mar 09 11:45:17.908779 debian-dev kernel:  do_syscall_64+0x5b/0xc0
Mar 09 11:45:17.908805 debian-dev kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
Mar 09 11:45:17.908845 debian-dev kernel:  entry_SYSCALL_64_after_hwframe+0x64/0xce
Mar 09 11:45:17.908871 debian-dev kernel: RIP: 0033:0x7ffa5b228e26
Mar 09 11:45:17.908893 debian-dev kernel: Code: 10 89 7c 24 0c 89 4c 24 1c e8 96 c9 f7 ff 44 8b 54 24 1c 8b 54 24 18 41 89 c0 48 8b 74 24 10 8b 7c 24 0c b8 e8 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 e6 c9 f7 ff 8b 44
Mar 09 11:45:17.908916 debian-dev kernel: RSP: 002b:00007fffee029080 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
Mar 09 11:45:17.909021 debian-dev kernel: RAX: ffffffffffffffda RBX: 0000000000000000 RCX: 00007ffa5b228e26
Mar 09 11:45:17.909049 debian-dev kernel: RDX: 0000000000000400 RSI: 00007fffee029180 RDI: 000000000000000e
Mar 09 11:45:17.909072 debian-dev kernel: RBP: 00007fffee02c1b0 R08: 0000000000000000 R09: 0000000100000001
Mar 09 11:45:17.909094 debian-dev kernel: R10: 0000000000000000 R11: 0000000000000293 R12: 0000000000000000
Mar 09 11:45:17.909117 debian-dev kernel: R13: 000000000539b600 R14: 000000000539b660 R15: 0000000000000000
Mar 09 11:45:17.909142 debian-dev kernel:  </TASK>
Mar 09 11:45:17.909165 debian-dev kernel: BUG: scheduling while atomic: node/1316/0x00000000
Mar 09 11:45:17.909189 debian-dev kernel: Modules linked in: lunatik_run(OE) luasyscall(OE) luaprobe(OE) luadata(OE) luafib(OE) luathread(OE) luarcu(OE) luasocket(OE) luanotifier(OE) lualinux(OE) luadevice(OE) lunatik(OE) xt_conntrack nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack_netlink nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 xfrm_user xfrm_algo xt_addrtype nft_compat nf_tables libcrc32c nfnetlink br_netfilter bridge stp llc overlay binfmt_misc intel_rapl_msr intel_rapl_common intel_pmc_core ghash_clmulni_intel sha512_ssse3 sha512_generic sha256_ssse3 sha1_ssse3 aesni_intel snd_intel8x0 vmwgfx crypto_simd snd_ac97_codec cryptd rapl drm_ttm_helper ac97_bus ttm snd_pcm snd_timer snd drm_kms_helper pcspkr soundcore vboxguest joydev button ac evdev serio_raw sg drm fuse dm_mod loop efi_pstore configfs ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 crc32c_generic hid_generic usbhid hid sd_mod t10_pi sr_mod crc64_rocksoft crc64 crc_t10dif cdrom crct10dif_generic ohci_pci ata_generic ahci ata_piix libahci
Mar 09 11:45:17.909908 debian-dev kernel:  ohci_hcd ehci_pci ehci_hcd crct10dif_pclmul crct10dif_common libata usbcore psmouse crc32_pclmul usb_common scsi_mod crc32c_intel scsi_common e1000 video i2c_piix4 battery wmi
Mar 09 11:45:17.909957 debian-dev kernel: CPU: 2 PID: 1316 Comm: node Tainted: G        W  OE      6.1.0-18-amd64 #1  Debian 6.1.76-1
Mar 09 11:45:17.909982 debian-dev kernel: Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006
Mar 09 11:45:17.910002 debian-dev kernel: Call Trace:
Mar 09 11:45:17.910027 debian-dev kernel:  <TASK>
Mar 09 11:45:17.910049 debian-dev kernel:  dump_stack_lvl+0x44/0x5c
Mar 09 11:45:17.910076 debian-dev kernel:  __schedule_bug.cold+0x42/0x4e
Mar 09 11:45:17.910096 debian-dev kernel:  __schedule+0x7ff/0x9e0
Mar 09 11:45:17.910115 debian-dev kernel:  schedule+0x5a/0xd0
Mar 09 11:45:17.910135 debian-dev kernel:  schedule_hrtimeout_range_clock+0xd7/0x120
Mar 09 11:45:17.910154 debian-dev kernel:  ? __hrtimer_init+0xe0/0xe0
Mar 09 11:45:17.910176 debian-dev kernel:  do_epoll_wait+0x698/0x7d0
Mar 09 11:45:17.910199 debian-dev kernel:  ? 0xffffffffc05d90c8
Mar 09 11:45:17.910225 debian-dev kernel:  ? ep_eventpoll_poll+0x10/0x10
Mar 09 11:45:17.910247 debian-dev kernel:  __x64_sys_epoll_wait+0x6f/0x110
Mar 09 11:45:17.910268 debian-dev kernel:  do_syscall_64+0x5b/0xc0
Mar 09 11:45:17.910305 debian-dev kernel:  ? __x64_sys_epoll_wait+0x6f/0x110
Mar 09 11:45:17.910325 debian-dev kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
Mar 09 11:45:17.910345 debian-dev kernel:  ? syscall_exit_to_user_mode+0x27/0x40
Mar 09 11:45:17.910366 debian-dev kernel:  ? do_syscall_64+0x67/0xc0
Mar 09 11:45:17.910388 debian-dev kernel:  ? exit_to_user_mode_prepare+0x40/0x1e0
Mar 09 11:45:17.910408 debian-dev kernel:  entry_SYSCALL_64_after_hwframe+0x64/0xce
Mar 09 11:45:17.910427 debian-dev kernel: RIP: 0033:0x7ffa5b228e26
Mar 09 11:45:17.910446 debian-dev kernel: Code: 10 89 7c 24 0c 89 4c 24 1c e8 96 c9 f7 ff 44 8b 54 24 1c 8b 54 24 18 41 89 c0 48 8b 74 24 10 8b 7c 24 0c b8 e8 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 32 44 89 c7 89 44 24 0c e8 e6 c9 f7 ff 8b 44
Mar 09 11:45:17.910466 debian-dev kernel: RSP: 002b:00007fffee029080 EFLAGS: 00000293 ORIG_RAX: 00000000000000e8
Mar 09 11:45:17.910485 debian-dev kernel: RAX: ffffffffffffffda RBX: 0000000000000201 RCX: 00007ffa5b228e26
Mar 09 11:45:17.910507 debian-dev kernel: RDX: 0000000000000400 RSI: 00007fffee029180 RDI: 000000000000000e
Mar 09 11:45:17.910527 debian-dev kernel: RBP: 00007fffee02c1b0 R08: 0000000000000000 R09: 0000000000000000
Mar 09 11:45:17.910549 debian-dev kernel: R10: 0000000000000201 R11: 0000000000000293 R12: 0000000000000000
Mar 09 11:45:17.910572 debian-dev kernel: R13: 00007ffa5b61c600 R14: 000000000539b660 R15: 0000000000000000
Mar 09 11:45:17.910596 debian-dev kernel:  </TASK>
lneto commented 8 months ago

hi @endersonmaia, thanks for the report..

did the system recover itself or did it crash completely?

are you running another example altogether? I noticed this line (that is not related to systrack):

Mar 09 11:42:33.392989 debian-dev kernel: luathread: [00000000ae5213a3] attempt to call a string value

how many cores you have in this machine?

can you try this patch and run it for a while to observe if it will crash eventually?

diff --git a/examples/systrack.lua b/examples/systrack.lua
index 13a987d0..f7c537ef 100644
--- a/examples/systrack.lua
+++ b/examples/systrack.lua
@@ -23,28 +23,12 @@

 local linux   = require("linux")
 local probe   = require("probe")
-local device  = require("device")
 local syscall = require("syscall.table")

 local track = {}

 local function nop() end -- do nothing

-local s = linux.stat
-local driver = {name = "systrack", open = nop, release = nop, mode = s.IRUGO}
-
-local toggle = true
-function driver:read()
-   local log = ""
-   if toggle then
-       for symbol, counter in pairs(track) do
-           log = log .. string.format("%s: %d\n", symbol, counter)
-       end
-   end
-   toggle = not toggle
-   return log
-end
-
 for symbol, address in pairs(syscall) do
    local function handler()
        track[symbol] = (track[symbol] or 0) + 1
@@ -53,5 +37,3 @@ for symbol, address in pairs(syscall) do
    probe.new(address, {pre = handler, post = nop})
 end

-device.new(driver)
-

Notice, it will remove the device completely.. so, you won't be able to cat /dev/systrack, but it's alright.. it will still run.. I'm doing this to enable this script to avoid sleeping on locks.. then, please run it disabling sleep, in this way..

sudo make examples_install
sudo lunatik run examples/systrack false

then, observe dmesg for at least the amount of time the bug was happening before.. thanks!

lneto commented 8 months ago

please notice, that it should make the system irresponsive when stopping the runtime.. btw, on your tests did you happen to stop the runtime?

endersonmaia commented 8 months ago

did the system recover itself or did it crash completely?

Creashed.

are you running another example altogether? I noticed this line (that is not related to systrack):

I probably tried some other example, but I can make a clean test from scratch and see what happens.

Mar 09 11:42:33.392989 debian-dev kernel: luathread: [00000000ae5213a3] attempt to call a string value

how many cores you have in this machine?

It's a VirtualBox VM configured with 4 vCPUs

$ nproc
4

can you try this patch and run it for a while to observe if it will crash eventually?

sure, will send the logs on my next message

endersonmaia commented 8 months ago

btw, on your tests did you happen to stop the runtime?

I don't know how to stop the runtime. :)

endersonmaia commented 8 months ago

After applying the patch and running as you said, I see nothing relevant in dmesg output.

lneto commented 8 months ago

I probably tried some other example, but I can make a clean test from scratch and see what happens.

no worries, it shouldn't be interfering

I don't know how to stop the runtime. :)

$ sudo lunatik stop examples/systrack, in this case =)

After applying the patch and running as you said, I see nothing relevant in dmesg output.

it should be working properly then.. can you also observe the CPU load (e.g., top)?

So, we should be using a non-sleepable runtime for systrack.. for this, we would probably need to redesign systrack to use at least two separate runtimes.. one for the device driver and another for the probes, because the device might be able to sleep.. we could use a rcu.table for syncing the counters.. if want to give it a try, I can help.. but it's low prio for me right now.. the fastest way I see to fix this example is to use a whitelist or blacklist for the syscalls to track.. I will provide a patch for this shortly..

lneto commented 8 months ago

@endersonmaia can you give it a try?

https://github.com/luainkernel/lunatik/tree/lneto_systrack

glk0 commented 8 months ago

I probably tried some other example, but I can make a clean test from scratch and see what happens.

no worries, it shouldn't be interfering

I don't know how to stop the runtime. :)

$ sudo lunatik stop examples/systrack, in this case =)

After applying the patch and running as you said, I see nothing relevant in dmesg output.

it should be working properly then.. can you also observe the CPU load (e.g., top)?

So, we should be using a non-sleepable runtime for systrack.. for this, we would probably need to redesign systrack to use at least two separate runtimes.. one for the device driver and another for the probes, because the device might be able to sleep.. we could use a rcu.table for syncing the counters.. if want to give it a try, I can help.. but it's low prio for me right now.. the fastest way I see to fix this example is to use a whitelist or blacklist for the syscalls to track.. I will provide a patch for this shortly..

Hi, can you please create a formal issue and assign it to me ? I would like to work on it.

lneto commented 8 months ago

hi @glk0,

After applying the patch and running as you said, I see nothing relevant in dmesg output.

it should be working properly then.. can you also observe the CPU load (e.g., top)? So, we should be using a non-sleepable runtime for systrack.. for this, we would probably need to redesign systrack to use at least two separate runtimes.. one for the device driver and another for the probes, because the device might be able to sleep.. we could use a rcu.table for syncing the counters.. if want to give it a try, I can help.. but it's low prio for me right now.. the fastest way I see to fix this example is to use a whitelist or blacklist for the syscalls to track.. I will provide a patch for this shortly..

Hi, can you please create a formal issue and assign it to me ? I would like to work on it.

sure; however, we would still need to groom such issue.. I've put more thoughts on it and I think we could actually have non-sleepable device drivers.. perhaps it should be enough to prevent syscalls to starve during probing.. however, we have a termination problem while stopping the systrack runtime that should be fixed as well.. so it will require more investigation.. would you like to join me on Matrix to discuss this further and come out with a proper issue?

lneto commented 8 months ago

@glk0 another thing to consider is to add support for probes per CPU.. it could not only fix this but also improve probing overhead..

endersonmaia commented 8 months ago

@endersonmaia can you give it a try?

https://github.com/luainkernel/lunatik/tree/lneto_systrack

It worked and didn't break. :)

lneto commented 8 months ago

@endersonmaia can you give it a try? https://github.com/luainkernel/lunatik/tree/lneto_systrack

It worked and didn't break. :)

please review the PR then =)