Open vmaffione opened 7 years ago
Just a quick first reaction:
This specific error is from an address translation routine attempting to translate a "guess physical" address (i.e. an address that the VM thinks is a physical address) into a Snabb host address (i.e. the address inside the Snabb process memory map.) This involves taking an address provided by the guest (here it looks like a receive buffer address on a vring) and looking it up using a translation table that was provided by QEMU with a vhost-user protocol message.
The error is saying that the address translation failed. This could be due to a problem with the table or a problem with the address. Previously these issues have often been due to an obscure edge case or negotiated-feature-combination on the vring where the host is expecting one descriptor format and the guest is using another.
More thoughts to follow......
Agree, there is a failure in the gpa --> hva translation. In this specific case the gpa is passed on the transmit vring: the trace says so, and my guest application is really accessing the tx ving (see pkt-gen cmdline).
Does 0x2036a8b0
look to you like a valid address for the VM side? Just wondering whether the problem is with the translation or that the address is untranslatable (mangled somehow on the vring.)
Generally the biggest source of bugs in the Virtio-net code for us is the complexity around the many permutations of descriptor format options. The main test coverage we have for this is our DPDK+Configuration
test where we run the DPDK l2fwd application inside a VM and benchmark it in a test matrix that covers many DPDK versions and also several Virtio-net options (default vs without mrg_rxbuf vs without indirect descriptors.) However this is not exhaustive since there are more combinations of options and different guests (Linux, DPDK, netmap, etc) can all format descriptors in different ways.
One possible approach to troubleshooting would be to see how dependent the error is on the specific virtio-net options that are negotiated (if at all.) If you suspect the problem is with the mem table translation it could be worth adding some printouts to see what mapping Snabb has received in vhost_user.lua.
On guest side I don't even see GPAs, standard virtio functions do the GVA-->GPA translation, see https://github.com/luigirizzo/netmap/blob/master/LINUX/virtio_netmap.h#L420-L427
@vmaffione It would be very convenient to have a reproducible test case based on a NixOS guest. Then I could run it interactively and we could also have the CI include multiple netmap versions in its test matrix. Do you know if anybody has ever packaged netmap for NixOS? (Guessing it would be similar to the dpdk package.)
The simplest way to incorporate such a test case would be if the VM would have one virtio interface and simply retransmit every packet that it receives. Then we could reuse the existing test scaffolding that we use for running dpdk l2fwd in the guest.
I don't know NixOS, but netmap is just an out-of-tree kernel module with some example applications, that you build in the usual way
$ ./configure
$ make
$ make install
In any case, the very same setup works when user-space virtio or vhost-net (in kernel accelerator) are used as network backend. So I deduce it must be some translation case that is not managed by snabb. Maybe printing the snabb translation map will reveal the issue.
Is it similarly easy to run a program that loops back all received packets on a virtio-net port?
Yes. I just modified the netmap "bridge" example to be able to do that:
# bridge -L -i netmap:ethX
will do the trick (once I publish the update).
I can take a look at cooking up a reproducible test case if you can post for me source location and build instructions for all the relevant components. Just needs to be a stable location e.g. release tarball or a Git branch that won't be deleted or rebased.
Code is here https://github.com/luigirizzo/netmap.git , master branch. You can build and install it with
$ ./configure --drivers=virtio_net.c
$ make
$ sudo make install
But you need to make sure the configure system is able to find kernel sources, because the virtio_net driver (a kernel module) needs to be patched. You can use the --kernel-sources ./configure argument if necessary. Also, you need to make sure the modified virtio_net module is loaded rather than the original one.
Thanks, that sounds doable. I will try to make time for this during the week.
ok, thanks! I dont understand whether do you want to automate this or not (and why), but maybe it would be just enough to catch the bug :)
My view is that the host side of a Virtio-net interface is extremely complex. We have to deal with every combination of quirks and features desired by the guests. (The guests, in contrast, can just pick the features they like best and use those.) This level of complexity requires fully automated testing to avoid "whack-a-mole" situations where fixing one guest breaks another.
The goal here as I understand it is to support netmap/virtio-net guests on a snabbnfv host. The only valid way to do that from my perspective is add coverage to the CI. Otherwise we only know that Snabb master branch supported netmap master branch on date X.
(This is my view after ~ 5 years of dealing with the complicated moving target that is Virtio-net specification and guests from the device-side perspective.)
relatedly: the CI also provides hundreds of machine-hours of testing for each kind of guest and experience suggests that this is necessary for detecting certain classes of bugs. For example poll-mode drivers run at high frequencies and tend to make obscure race conditions more plausible, for example see this memory barrier error that affected both Snabb and DPDK on the host side but only showed up approximately once per hour of sustained traffic.
Sure, it sounds good. Regression tests are a powerful tool.
But I would like to point out that netmap is accessing the vring using the same API used by the Linux regular virtio-net driver, without renegotiating anything. So at least in principle there should not be any difference, that's why the failure is weird. In any case I will double check our virtio-net netmap code to make sure it is not circumventing the negotiated features.
@vmaffione Could you also tell me which guest kernel version(s) you have seen this problem with? One thing our CI currently lacks is to test many different Linux guests so the issue could be kernel version too.
Linux 4.10.6.
Just checking... does Snabb support the virtio-net header? If yes, does it support the 10 bytes one or the 12 bytes?
We support both and choose layout based on the features negotiated by the guest.
Some more information: the offending GPA (0x2036a8b0 in the trace above) corresponds to a global variable which is part of the (patched) virtio_net kernel module: https://github.com/luigirizzo/netmap/blob/master/LINUX/virtio_netmap.h#L357
And yes, the address is pushed on the RX vring.
However, I don't see what should be wrong with a static address vs a kmalloc()ed address..
Confirmed, if I use only kmalloc()ed buffers, Snabb does not crash anymore and everything works as expected.
To make it more clear, I was using two shared buffers to hold the virtio-net-header. Those buffers were allocated as global variables (e.g. in the .data section). This actually works for both virtio+vhost-net and virtio+vhost-user+ovs-dpdk. But it doesn't work for Snabb, so there must be a bug somewhere in Snabb.
If I alloc the shared buffers with kmalloc(), the problem disappears.
Good detective work. Could be an issue with multiple memory regions in the map. I have noticed in the past that guests have tended to serve all DMA from the same region and this does not exercise the code very well.
Hi, I have two QEMU VMs running a simple vm2vm AppEngine which implements a trivial point-to-point link:
Everything works correctly if I use the regular network stack in the guest (e.g. ping, netperf, etc.). However, if I try to use netmap on the virtio-net interface inside the guest
snabb immediately crashes:
The only thing that changes w.r.t. the working setup is that buffers published in the virtio ring do not belong to skbuffs, but are provided by the netmap module (running inside the guest).