Open endersonmaia opened 8 months ago
hi @endersonmaia, thanks for the report..
did the system recover itself or did it crash completely?
are you running another example altogether? I noticed this line (that is not related to systrack):
Mar 09 11:42:33.392989 debian-dev kernel: luathread: [00000000ae5213a3] attempt to call a string value
how many cores do you have in this machine?
can you try this patch and run it for a while to observe if it will crash eventually?
diff --git a/examples/systrack.lua b/examples/systrack.lua
index 13a987d0..f7c537ef 100644
--- a/examples/systrack.lua
+++ b/examples/systrack.lua
@@ -23,28 +23,12 @@
 local linux = require("linux")
 local probe = require("probe")
-local device = require("device")
 local syscall = require("syscall.table")
 local track = {}
 local function nop() end -- do nothing
-local s = linux.stat
-local driver = {name = "systrack", open = nop, release = nop, mode = s.IRUGO}
-
-local toggle = true
-function driver:read()
-    local log = ""
-    if toggle then
-        for symbol, counter in pairs(track) do
-            log = log .. string.format("%s: %d\n", symbol, counter)
-        end
-    end
-    toggle = not toggle
-    return log
-end
-
 for symbol, address in pairs(syscall) do
     local function handler()
         track[symbol] = (track[symbol] or 0) + 1
@@ -53,5 +37,3 @@ for symbol, address in pairs(syscall) do
     probe.new(address, {pre = handler, post = nop})
 end
-device.new(driver)
-
Notice that it removes the device completely.. so you won't be able to cat /dev/systrack, but that's alright.. it will still run.. I'm doing this so the script can avoid sleeping on locks.. then, please run it with sleep disabled, like this:
sudo make examples_install
sudo lunatik run examples/systrack false
then, observe dmesg for at least as long as it took for the bug to show up before.. thanks!
please notice that it should make the system unresponsive when stopping the runtime.. btw, on your tests did you happen to stop the runtime?
did the system recover itself or did it crash completely?
Crashed.
are you running another example altogether? I noticed this line (that is not related to systrack):
I probably tried some other example, but I can make a clean test from scratch and see what happens.
Mar 09 11:42:33.392989 debian-dev kernel: luathread: [00000000ae5213a3] attempt to call a string value
how many cores do you have in this machine?
It's a VirtualBox VM configured with 4 vCPUs
$ nproc
4
can you try this patch and run it for a while to observe if it will crash eventually?
sure, will send the logs on my next message
btw, on your tests did you happen to stop the runtime?
I don't know how to stop the runtime. :)
After applying the patch and running as you said, I see nothing relevant in dmesg output.
I probably tried some other example, but I can make a clean test from scratch and see what happens.
no worries, it shouldn't be interfering
I don't know how to stop the runtime. :)
$ sudo lunatik stop examples/systrack
in this case =)
After applying the patch and running as you said, I see nothing relevant in dmesg output.
it should be working properly then.. can you also observe the CPU load (e.g., top)?
So, we should be using a non-sleepable runtime for systrack.. for this, we would probably need to redesign systrack to use at least two separate runtimes: one for the device driver and another for the probes, because the device might be able to sleep.. we could use an rcu.table to sync the counters.. if you want to give it a try, I can help, but it's low priority for me right now.. the fastest way I see to fix this example is to use a whitelist or blacklist for the syscalls to track.. I will provide a patch for this shortly..
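For readers following along, the whitelist idea mentioned above could look roughly like the sketch below. This is not the actual patch: the `whitelist` table, the symbol names and addresses, and the `probe_new` and `hit` helpers are all hypothetical stand-ins so the snippet runs in plain Lua outside the kernel. In a real script, `syscall` would come from `require("syscall.table")` and `probe_new` would be `probe.new`, as in the diff earlier in the thread.

```lua
-- Stand-in for require("syscall.table"); symbols and addresses are made up.
local syscall = {
	__x64_sys_openat = 0x1000,
	__x64_sys_read   = 0x2000,
	__x64_sys_write  = 0x3000,
	__x64_sys_futex  = 0x4000,
}

-- Hypothetical whitelist: only these symbols get a probe attached.
local whitelist = {
	__x64_sys_openat = true,
	__x64_sys_read   = true,
}

local track = {}    -- per-symbol hit counters, as in systrack.lua
local handlers = {} -- callbacks recorded by the stand-in below

local function nop() end -- do nothing

-- Stand-in for probe.new(address, {pre = ..., post = ...});
-- it only stores the callbacks so they can be triggered by hand.
local function probe_new(address, callbacks)
	handlers[address] = callbacks
end

for symbol, address in pairs(syscall) do
	if whitelist[symbol] then -- skip everything that is not whitelisted
		local function handler()
			track[symbol] = (track[symbol] or 0) + 1
		end
		probe_new(address, {pre = handler, post = nop})
	end
end

-- Simulate a syscall reaching its probe (if one was installed).
local function hit(address)
	local callbacks = handlers[address]
	if callbacks then callbacks.pre() end
end

hit(0x1000); hit(0x1000); hit(0x2000); hit(0x4000)
```

A blacklist would be the same loop with the condition inverted (`if not blacklist[symbol] then`), which keeps probing broad while excluding syscalls known to cause trouble.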
@endersonmaia can you give it a try?
Hi, can you please create a formal issue and assign it to me? I would like to work on it.
hi @glk0,
Hi, can you please create a formal issue and assign it to me? I would like to work on it.
sure; however, we would still need to groom such an issue.. I've given it more thought and I think we could actually have non-sleepable device drivers.. perhaps it would be enough to prevent syscalls from starving during probing.. however, we also have a termination problem when stopping the systrack runtime that should be fixed as well.. so it will require more investigation.. would you like to join me on Matrix to discuss this further and come up with a proper issue?
@glk0 another thing to consider is adding support for per-CPU probes.. it could not only fix this but also reduce probing overhead..
@endersonmaia can you give it a try?
It worked and didn't break. :)
@endersonmaia can you give it a try? https://github.com/luainkernel/lunatik/tree/lneto_systrack
It worked and didn't break. :)
please review the PR then =)
When trying to run the systrack example, I can cat /dev/systrack a couple of times, but then the system breaks.
My environment:
Full logs attached: systrack-dmesg.log
log snippet: