snabbco / snabb

Snabb: Simple and fast packet networking
Apache License 2.0
2.97k stars 300 forks source link

snabb with vhost-user crashes when netmap buffers are published by the guest #1119

Open vmaffione opened 7 years ago

vmaffione commented 7 years ago

Hi, I have two QEMU VMs running a simple vm2vm AppEngine which implements a trivial point-to-point link:

-- Use of this source code is governed by the Apache 2.0 license; see COPYING.

module(..., package.seeall)

local vhostuser = require("apps.vhost.vhost_user")

function run (parameters)
   if not (#parameters == 2) then
     print("Usage: vm2vm <unix-socket-1> <unix-socket-2>")
      main.exit(1)
   end
   local usock1 = parameters[1]
   local usock2 = parameters[2]

   local c = config.new()
   config.app(c, "vh1", vhostuser.VhostUser, {socket_path=usock1,is_server=false})
   config.app(c, "vh2", vhostuser.VhostUser, {socket_path=usock2,is_server=false})

   config.link(c, "vh1.tx -> vh2.rx")
   config.link(c, "vh2.tx -> vh1.rx")

   engine.configure(c)
   engine.main({report = {showlinks=true, showapps=true}})
end

Everything works correctly if I use the regular network stack in the guest (e.g. ping, netperf, etc.). However, if I try to use netmap on the virtio-net interface inside the guest

$ sudo pkt-gen -i ens4 -f tx -R1  # inside one of the VMs, ens4 is the virtio-net interface

snabb immediately crashes:

$ sudo src/snabb vm2vm /var/run/vm10-10.socket /var/run/vm11-11.socket 
Get features 0x18428001
 VIRTIO_F_ANY_LAYOUT VIRTIO_NET_F_MQ VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF VIRTIO_RING_F_INDIRECT_DESC VIRTIO_NET_F_CSUM
Get features 0x18428001
 VIRTIO_F_ANY_LAYOUT VIRTIO_NET_F_MQ VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF VIRTIO_RING_F_INDIRECT_DESC VIRTIO_NET_F_CSUM
vhost_user: Skipped old feature cache in /tmp/vhost_features___var__run__vm10-10.socket
vhost_user: Caching features (0x18028001) in /tmp/vhost_features___var__run__vm10-10.socket
Set features 0x18028001
 VIRTIO_F_ANY_LAYOUT VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF VIRTIO_RING_F_INDIRECT_DESC VIRTIO_NET_F_CSUM
rxavail = 0 rxused = 0
rxavail = 0 rxused = 0
vhost_user: Skipped old feature cache in /tmp/vhost_features___var__run__vm11-11.socket
vhost_user: Caching features (0x18028001) in /tmp/vhost_features___var__run__vm11-11.socket
Set features 0x18028001
 VIRTIO_F_ANY_LAYOUT VIRTIO_NET_F_CTRL_VQ VIRTIO_NET_F_MRG_RXBUF VIRTIO_RING_F_INDIRECT_DESC VIRTIO_NET_F_CSUM
rxavail = 0 rxused = 0
rxavail = 0 rxused = 0

lib/virtio/net_device.lua:366: mapping to host address failedcdata<void *>: 0x2036a8b0
stack traceback:
    core/main.lua:137: in function <core/main.lua:135>
    [C]: in function 'error'
    lib/virtio/net_device.lua:366: in function 'map_from_guest'
    lib/virtio/net_device.lua:263: in function 'packet_start'
    lib/virtio/virtq_device.lua:66: in function 'get_buffers'
    lib/virtio/net_device.lua:208: in function 'transmit_packets_to_vm'
    lib/virtio/net_device.lua:185: in function 'poll_vring_transmit'
    apps/vhost/vhost_user.lua:87: in function 'method'
    core/app.lua:89: in function 'with_restart'
    core/app.lua:384: in function 'thunk'
    core/histogram.lua:98: in function 'breathe'
    core/app.lua:330: in function 'main'
    program/vm2vm/vm2vm.lua:23: in function 'run'
    core/main.lua:56: in function <core/main.lua:43>
    [C]: in function 'xpcall'
    core/main.lua:178: in main chunk
    [C]: at 0x00453a80
    [C]: in function 'pcall'
    core/startup.lua:3: in main chunk
    [C]: in function 'require'
    [string "require "core.startup""]:1: in main chunk

The only thing that changes w.r.t. the working setup is that buffers published in the virtio ring do not belong to skbuffs, but are provided by the netmap module (running inside the guest).

lukego commented 7 years ago

Just a quick first reaction:

This specific error is from an address translation routine attempting to translate a "guess physical" address (i.e. an address that the VM thinks is a physical address) into a Snabb host address (i.e. the address inside the Snabb process memory map.) This involves taking an address provided by the guest (here it looks like a receive buffer address on a vring) and looking it up using a translation table that was provided by QEMU with a vhost-user protocol message.

The error is saying that the address translation failed. This could be due to a problem with the table or a problem with the address. Previously these issues have often been due to an obscure edge case or negotiated-feature-combination on the vring where the host is expecting one descriptor format and the guest is using another.

More thoughts to follow......

vmaffione commented 7 years ago

Agree, there is a failure in the gpa --> hva translation. In this specific case the gpa is passed on the transmit vring: the trace says so, and my guest application is really accessing the tx ving (see pkt-gen cmdline).

lukego commented 7 years ago

Does 0x2036a8b0 look to you like a valid address for the VM side? Just wondering whether the problem is with the translation or that the address is untranslatable (mangled somehow on the vring.)

Generally the biggest source of bugs in the Virtio-net code for us is the complexity around the many permutations of descriptor format options. The main test coverage we have for this is our DPDK+Configuration test where we run the DPDK l2fwd application inside a VM and benchmark it in a test matrix that covers many DPDK versions and also several Virtio-net options (default vs without mrg_rxbuf vs without indirect descriptors.) However this is not exhaustive since there are more combinations of options and different guests (Linux, DPDK, netmap, etc) can all format descriptors in different ways.

One possible approach to troubleshooting would be to see how dependent the error is on the specific virtio-net options that are negotiated (if at all.) If you suspect the problem is with the mem table translation it could be worth adding some printouts to see what mapping Snabb has received in vhost_user.lua.

vmaffione commented 7 years ago

On guest side I don't even see GPAs, standard virtio functions do the GVA-->GPA translation, see https://github.com/luigirizzo/netmap/blob/master/LINUX/virtio_netmap.h#L420-L427

lukego commented 7 years ago

@vmaffione It would be very convenient to have a reproducible test case based on a NixOS guest. Then I could run it interactively and we could also have the CI include multiple netmap versions in its test matrix. Do you know if anybody has ever packaged netmap for NixOS? (Guessing it would be similar to the dpdk package.)

The simplest way to incorporate such a test case would be if the VM would have one virtio interface and simply retransmit every packet that it receives. Then we could reuse the existing test scaffolding that we use for running dpdk l2fwd in the guest.

vmaffione commented 7 years ago

I don't know NixOS, but netmap is just an out-of-tree kernel module with some example applications, that you build in the usual way

$ ./configure
$ make
$ make install

In any case, the very same setup works when user-space virtio or vhost-net (in kernel accelerator) are used as network backend. So I deduce it must be some translation case that is not managed by snabb. Maybe printing the snabb translation map will reveal the issue.

lukego commented 7 years ago

Is it similarly easy to run a program that loops back all received packets on a virtio-net port?

vmaffione commented 7 years ago

Yes. I just modified the netmap "bridge" example to be able to do that:

# bridge -L -i netmap:ethX

will do the trick (once I publish the update).

lukego commented 7 years ago

I can take a look at cooking up a reproducible test case if you can post for me source location and build instructions for all the relevant components. Just needs to be a stable location e.g. release tarball or a Git branch that won't be deleted or rebased.

vmaffione commented 7 years ago

Code is here https://github.com/luigirizzo/netmap.git , master branch. You can build and install it with

$ ./configure --drivers=virtio_net.c
$ make
$ sudo make install

But you need to make sure the configure system is able to find kernel sources, because the virtio_net driver (a kernel module) needs to be patched. You can use the --kernel-sources ./configure argument if necessary. Also, you need to make sure the modified virtio_net module is loaded rather than the original one.

lukego commented 7 years ago

Thanks, that sounds doable. I will try to make time for this during the week.

vmaffione commented 7 years ago

ok, thanks! I dont understand whether do you want to automate this or not (and why), but maybe it would be just enough to catch the bug :)

lukego commented 7 years ago

My view is that the host side of a Virtio-net interface is extremely complex. We have to deal with every combination of quirks and features desired by the guests. (The guests, in contrast, can just pick the features they like best and use those.) This level of complexity requires fully automated testing to avoid "whack-a-mole" situations where fixing one guest breaks another.

The goal here as I understand it is to support netmap/virtio-net guests on a snabbnfv host. The only valid way to do that from my perspective is add coverage to the CI. Otherwise we only know that Snabb master branch supported netmap master branch on date X.

(This is my view after ~ 5 years of dealing with the complicated moving target that is Virtio-net specification and guests from the device-side perspective.)

lukego commented 7 years ago

relatedly: the CI also provides hundreds of machine-hours of testing for each kind of guest and experience suggests that this is necessary for detecting certain classes of bugs. For example poll-mode drivers run at high frequencies and tend to make obscure race conditions more plausible, for example see this memory barrier error that affected both Snabb and DPDK on the host side but only showed up approximately once per hour of sustained traffic.

vmaffione commented 7 years ago

Sure, it sounds good. Regression tests are a powerful tool.

But I would like to point out that netmap is accessing the vring using the same API used by the Linux regular virtio-net driver, without renegotiating anything. So at least in principle there should not be any difference, that's why the failure is weird. In any case I will double check our virtio-net netmap code to make sure it is not circumventing the negotiated features.

lukego commented 7 years ago

@vmaffione Could you also tell me which guest kernel version(s) you have seen this problem with? One thing our CI currently lacks is to test many different Linux guests so the issue could be kernel version too.

vmaffione commented 7 years ago

Linux 4.10.6.

vmaffione commented 7 years ago

Just checking... does Snabb support the virtio-net header? If yes, does it support the 10 bytes one or the 12 bytes?

lukego commented 7 years ago

We support both and choose layout based on the features negotiated by the guest.

vmaffione commented 7 years ago

Some more information: the offending GPA (0x2036a8b0 in the trace above) corresponds to a global variable which is part of the (patched) virtio_net kernel module: https://github.com/luigirizzo/netmap/blob/master/LINUX/virtio_netmap.h#L357

And yes, the address is pushed on the RX vring.

However, I don't see what should be wrong with a static address vs a kmalloc()ed address..

vmaffione commented 7 years ago

Confirmed, if I use only kmalloc()ed buffers, Snabb does not crash anymore and everything works as expected.

To make it more clear, I was using two shared buffers to hold the virtio-net-header. Those buffers were allocated as global variables (e.g. in the .data section). This actually works for both virtio+vhost-net and virtio+vhost-user+ovs-dpdk. But it doesn't work for Snabb, so there must be a bug somewhere in Snabb.

If I alloc the shared buffers with kmalloc(), the problem disappears.

lukego commented 7 years ago

Good detective work. Could be an issue with multiple memory regions in the map. I have noticed in the past that guests have tended to serve all DMA from the same region and this does not exercise the code very well.