snabbco / snabb

Snabb: Simple and fast packet networking

nfvconfig selftest hanging on raptorjit branch #1337

takikawa opened this issue 6 years ago (status: Open)

takikawa commented 6 years ago

Recently some CI runs on the raptorjit branch seem to be failing because the nfvconfig selftest hangs. The most recent runs don't show this because they are erroring with a merge issue, but AFAIK the issue is still there.

Here's an example SnabbBot log from #1332 with the issue: https://gist.github.com/SnabbBot/1a57acb59416bd84bafe173985fb6895

The test hangs with some output like this:

engine: new_link id11_Virtio.tx -> id11_NIC.input
engine: link_output id11_Virtio
engine: link_input id11_NIC
load: time: 1.85s  fps: 0         fpGbps: 0.000 fpb: 0   bpp: -    sleep: 100 us
testing: program/snabbnfv/test_fixtures/nfvconfig/scale_change/x

I minimized the test a bit to make the issue easier to reproduce:

module(...,package.seeall)

local virtual_ether_mux = require("lib.io.virtual_ether_mux")
local lib    = require("core.lib")
local config = require("core.config")
local engine = require("core.app")

-- Build an engine config from an nfvconfig ports file, muxed over a physical NIC.
function load (file, pciaddr)
   local ports = lib.load_conf(file)
   local c = config.new()
   virtual_ether_mux.configure(c, ports, {pci = pciaddr or "0000:82:00.0"})
   return c
end

function selftest ()
   print("start test")
   local path = "program/snabbnfv/test_fixtures/nfvconfig/scale_up/x"
   for i = 1, 10 do
      print(i)

      print("start popen")
      -- Gets stuck here when vmprofile is on; the command doesn't matter.
      local f = io.popen("cat /proc/cpuinfo")
      f:read("*l")
      f:close()
      print("end popen")

      engine.configure(load(path))
      engine.main({duration = 0.1})
   end
   print("done")
end

The test hangs at the popen call. In the actual test, the popen happens due to a use of firstfile while getting some PCI device information for virtual_ether_mux.configure.

One thing I noticed is that this only hangs on the raptorjit branch because it only happens when vmprofile is on.

If you do an strace, you can see the reason for this:

write(1, "start popen\n", 12)           = 12
write(67, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 607) = 607
pipe2([68, 69], O_CLOEXEC)              = 0
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe3135289d0) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 56
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe3135289d0) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---
rt_sigreturn({mask=[]})                 = 56
clone(child_stack=NULL, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe3135289d0) = ? ERESTARTNOINTR (To be restarted)
--- SIGPROF {si_signo=SIGPROF, si_code=SI_KERNEL} ---

From this strace output you can see that when popen calls clone to create the child process, the syscall is repeatedly interrupted by vmprofile's 1ms SIGPROF timer. Each interruption makes the kernel restart the syscall (ERESTARTNOINTR), and the restarted call is interrupted again before it can finish, so the process is stuck in an infinite loop.
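To see the mechanism outside Snabb, here is roughly what the timer-plus-fork interaction looks like in plain C (an illustrative sketch of my own, not vmprofile's actual code; a tiny program like this usually forks fast enough to finish within one interval, so the actual hang needs a large process such as a Snabb instance):

/* Sketch: a 1 ms ITIMER_PROF timer delivering SIGPROF while fork()/clone()
 * is copying the process. If the copy takes longer than one interval, the
 * kernel restarts the interrupted clone() (ERESTARTNOINTR) and it gets
 * interrupted again on the next tick. */
#include <signal.h>
#include <stdio.h>
#include <sys/time.h>
#include <sys/wait.h>
#include <unistd.h>

static void on_sigprof(int sig) { (void)sig; /* profiler would take a sample here */ }

int main(void)
{
    /* Install a SIGPROF handler and a 1 ms ITIMER_PROF timer, like vmprofile's. */
    struct sigaction sa = { .sa_handler = on_sigprof };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGPROF, &sa, NULL);

    struct itimerval tv = { .it_interval = { 0, 1000 }, .it_value = { 0, 1000 } };
    setitimer(ITIMER_PROF, &tv, NULL);

    for (int i = 0; i < 10; i++) {
        pid_t pid = fork();              /* issues clone(); may be hit by SIGPROF */
        if (pid == 0) _exit(0);
        if (pid > 0) waitpid(pid, NULL, 0);
    }
    puts("done");
    return 0;
}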

It looks like this has been an issue in other software like Chromium (https://bugs.chromium.org/p/chromium/issues/detail?id=83521) or with JVM profiling (https://github.com/jvm-profiling-tools/async-profiler/issues/97).

There seem to be a few potential ways to solve this:

The first one sounds unappealing since it would reduce the profiling granularity, but maybe it's acceptable. The second seems pretty do-able for this particular case.

Any thoughts on a preferred solution?

dpino commented 6 years ago

AFAIK, at the moment it's not possible to disable vmprofiling in RaptorJIT, as it's always on. By that I mean there's no flag or environment variable to disable profiling. So if we went with 3), we would also need to implement such a mechanism.

As for 2), it initially sounds good to me, though I have one question: since the issue is with system calls such as fork, clone, etc., won't ljsyscall functions translate to these same system calls too?

lukego commented 6 years ago

The PC-losering problem just won't die eh :).

I have a vague feeling that Linux signals can be set up to not interrupt system calls, i.e. to defer the signal until after the system call completes. This could be a nice solution since we are not that interested in profiling system calls? I can't find how to do that right now, so it is possible that I dreamed it.

tobyriddell commented 6 years ago

Some info on deferring signal handling around system calls here:

https://stackoverflow.com/questions/2853653/deferring-signal-handling-in-linux
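The approach discussed there boils down to blocking the signal for the duration of the sensitive call and letting any pending signal be delivered once the old mask is restored. A rough sketch in C (illustrative only, not Snabb code; popen_without_sigprof is just a made-up name):

/* Defer SIGPROF around a call that forks: block it, run the call to
 * completion, then restore the mask so any pending SIGPROF fires. */
#include <signal.h>
#include <stdio.h>

static FILE *popen_without_sigprof(const char *cmd)
{
    sigset_t block, old;
    sigemptyset(&block);
    sigaddset(&block, SIGPROF);

    sigprocmask(SIG_BLOCK, &block, &old);   /* defer SIGPROF */
    FILE *f = popen(cmd, "r");              /* clone() can now run to completion */
    sigprocmask(SIG_SETMASK, &old, NULL);   /* pending SIGPROF is delivered here */
    return f;
}

int main(void)
{
    FILE *f = popen_without_sigprof("cat /proc/cpuinfo");
    if (f) {
        char line[256];
        if (fgets(line, sizeof line, f)) fputs(line, stdout);
        pclose(f);
    }
    return 0;
}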


lukego commented 6 years ago

Would it solve the problem to switch our setitimer(2) call from ITIMER_PROF to ITIMER_VIRTUAL?

This should only profile the time spent running in userspace and a reasonable person might expect (...) that this would guarantee that system calls are never interrupted.

This would make vmprofile less useful for profiling code that spends time in kernel space, but that is not especially relevant to Snabb (and it is reasonably well covered by perf externally, too).
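Roughly, the change would look like this (just a sketch of the idea, not the actual vmprofile code; note that ITIMER_VIRTUAL delivers SIGVTALRM rather than SIGPROF, so the handler registration has to change as well):

/* Sketch of switching the profiling timer from ITIMER_PROF to
 * ITIMER_VIRTUAL. ITIMER_VIRTUAL counts only user-space CPU time, so it
 * does not fire while the process is blocked inside a system call such
 * as clone(). */
#include <signal.h>
#include <sys/time.h>

static void profile_sample(int sig) { (void)sig; /* record VM state here */ }

static void start_profile_timer(void)
{
    struct sigaction sa = { .sa_handler = profile_sample };
    sigemptyset(&sa.sa_mask);
    sigaction(SIGVTALRM, &sa, NULL);                    /* was: SIGPROF */

    struct itimerval tv = { .it_interval = { 0, 1000 },    /* 1 ms */
                            .it_value    = { 0, 1000 } };
    setitimer(ITIMER_VIRTUAL, &tv, NULL);               /* was: ITIMER_PROF */
}

int main(void)
{
    start_profile_timer();
    for (volatile long i = 0; i < 100000000; i++) ;  /* user-space work gets sampled */
    return 0;
}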

takikawa commented 6 years ago

@lukego Thanks, that sounds like a good idea! I made a PR for that at #1340. It seems to get rid of the selftest hang nicely. I'm not sure whether it changes vmprofile results much, but some basic sanity checks comparing results with ITIMER_PROF and ITIMER_VIRTUAL looked OK.