Open takikawa opened 6 years ago
AFAIK, at the moment it's not possible to disable vmprofiling in Raptorjit, as it's always on. By that I mean there's not flag or environment variable to disable profiling. So in case of going for 3) it would be necessary as well to implement this mechanism.
As for 2) initially it sounds good to me, I have one question though. As the issue is with system calls fork
, clone
, etc won't ljsyscall functions will translate to these system calls too?
The PC-losering problem just won't die eh :).
I have a vague feeling that Linux signals can be setup to not interrupt system calls i.e. to defer the signal until after the system call completes. This could be a nice solution since we are not that interested in profiling system calls? I can't find how to do that right now so it is possible that I dreamed it.
Some info. on blocking system calls here:
https://stackoverflow.com/questions/2853653/deferring-signal-handling-in-linux
On Wed, May 2, 2018, 08:14 Luke Gorrie, notifications@github.com wrote:
The PC-losering https://www.dreamsongs.com/WIB.html problem just won't die eh :).
I have a vague feeling that Linux signals can be setup to not interrupt system calls i.e. to defer the signal until after the system call completes. This could be a nice solution since we are not that interested in profiling system calls? I can't find how to do that right now so it is possible that I dreamed it.
— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/snabbco/snabb/issues/1337#issuecomment-385957441, or mute the thread https://github.com/notifications/unsubscribe-auth/AE7ethP_ObXezPOkodRn9Mb5ZmLMaCeJks5tuaMbgaJpZM4TusHx .
Would it solve the problem to switch our setitimer(2)
call from ITIMER_PROF
to ITIMER_VIRTUAL
?
This should only profile the time spent running in userspace and a reasonable person might expect (...) that this would guarantee that system calls are never interrupted.
This would make vmprofile less useful for profiling code that spends time in kernel space but that is not especially relevant to Snabb (and reasonably well covered by perf
externally too.)
@lukego Thanks, that sounds like a good idea! I made a PR for that at #1340. It seems to get rid of the selftest hang nicely. Not sure if it's made much difference in vmprofile results, but some basic sanity checks comparing the results with/without ITIMER_PROF
seemed ok anyway.
Recently some CI runs on the raptorjit branch seem to be failing because the nfvconfig selftest hangs. The most recent runs don't show this because they are erroring with a merge issue, but AFAIK the issue is still there.
Here's an example SnabbBot log from #1332 with the issue: https://gist.github.com/SnabbBot/1a57acb59416bd84bafe173985fb6895
The test hangs with some output like this:
I minimized the test a bit to make the issue easier to reproduce:
The test hangs at the
popen
call. In the actual test, thepopen
happens due to a use offirstfile
while getting some PCI device information forvirtual_ether_mux.configure
.One thing I noticed is that the reason this only hangs on the raptorjit branch is that it only happens when vmprofile is on.
If you do an strace, you can see the reason for this:
From this strace output, you can see that after a
popen
does aclone
to make a child process the syscall is constantly interrupted by the 1ms SIGPROF timer for vmprofile. This causes the syscall to restart, but it gets interrupted again. And so it gets stuck in an infinite loop.It looks like this has been an issue in other software like Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=83521) or with JVM profiling (https://github.com/jvm-profiling-tools/async-profiler/issues/97).
There seem to be a few potential ways to solve this:
firstfile
, it could use ljsyscall to do the filesystem operations.The first one sounds unappealing since it would reduce the profiling granularity, but maybe it's acceptable. The second seems pretty do-able for this particular case.
Any thoughts on a preferred solution?