Closed rbtcollins closed 8 years ago
Hi, I think I see what's happening.
It seems that you are using the hyperv netvsc paravirtualized network adapter, for which no native netmap adapter exists, so netmap falls back to the generic netmap adapter (which is fine).
Now, for each packet netmap wants to transmit, the netvsc_start_xmit() routine in the netvsc driver is called, passing an sk_buff object - i.e. the packet to be transmitted. This routine calls the skb_cow_head(sk_buff) function, which ends up calling pskb_expand_head(sk_buff). This latter function has an assertion checking that skb_shared(sk_buff) is false - i.e. that no entity other than the driver holds a reference to the sk_buff.
Unfortunately, this assertion fails, since the sk_buff is shared. As a matter of fact, the netmap generic adapter increments the reference counter of each sk_buff to be transmitted, before invoking the driver transmission routine. This is done for performance reasons.
Since the same technique is also used by the Linux in-kernel packet generator (see net/core/pktgen.c)
atomic_add(burst, &skb->users)
I would be inclined to conclude that the netvsc driver should not call skb_cow_head() - so the bug is there. To prove this, can you try to run the linux pktgen on your VM netvsc interface, to see if a crash happens?
I have a script in the netmap repo (LINUX/scripts/linux-pktgen.sh) which is easy to use. Just open it, change the IF variable to the name of your netvsc VM interface, and run.
I think that's the driver I already tested on:
ethtool -i eth0
driver: hv_netvsc
However, I will give it a go :)
sudo bash ./linux-pktgen.sh
[sudo] password for robertc:
Removing all devices (0)
Removing all devices (1)
./linux-pktgen.sh: line 10: /proc/net/pktgen/kpktgend_1: No such file or directory
cat: /proc/net/pktgen/kpktgend_1: No such file or directory
cat: /proc/net/pktgen/kpktgend_1: No such file or directory
Configuring /proc/net/pktgen/kpktgend_0
Adding eth0@0
Configuring /proc/net/pktgen/eth0@0
Running... Ctrl-C to stop
I presume the no such file or directory implies a missing module; looking into that now.
However,
Module Size Used by
pktgen 53248 0
ENOTSURE :)
Ah, looks like the CPU count was wrong; adjusted that locally too, and now there are no errors. It doesn't crash - but it doesn't output any transmission stats either:
$ sudo bash ./linux-pktgen.sh
Removing all devices (0)
Configuring /proc/net/pktgen/kpktgend_0
Adding eth0@0
Configuring /proc/net/pktgen/eth0@0
Running... Ctrl-C to stop
^C
~/personal/netmap/LINUX/scripts$
sudo cat /proc/net/pktgen/eth0@0
Params: count 0 min_pkt_size: 60 max_pkt_size: 60
frags: 0 delay: 0 clone_skb: 0 ifname: eth0@0
flows: 0 flowlen: 0
queue_map_min: 0 queue_map_max: 0
dst_min: 10.216.8.1 dst_max:
src_min: src_max:
src_mac: 00:15:5d:ba:c3:01 dst_mac: ff:ff:ff:ff:ff:ff
udp_src_min: 9 udp_src_max: 9 udp_dst_min: 9 udp_dst_max: 9
src_mac_count: 0 dst_mac_count: 0
Flags: QUEUE_MAP_CPU
Current:
pkts-sofar: 638132 errors: 0
started: 20477152217us stopped: 20482580170us idle: 34004us
seq_num: 638133 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
cur_saddr: 192.168.137.75 cur_daddr: 10.216.8.1
cur_udp_dst: 9 cur_udp_src: 9
cur_queue_map: 0
flows: 0
Result: OK: 5427952(c5393948+d34004) usec, 638132 (60byte,0frags)
117564pps 56Mb/sec (56430720bps) errors: 0
Thank you for your tests.
Can you try to play with the CLONE_SKB parameter, to see if setting it to something like 100 causes crashes? Also, are you sure pktgen is really transmitting? Can you see the traffic flowing outside the VM (e.g. on another VM on the same host, or in some Windows network statistics)?
I can see 50Mbps of traffic in Task Manager, so I'm sure something is being emitted onto the vmbus - it starts and stops with linux-pktgen.
CLONE_SKB=1 -> still works, no crash.
CLONE_SKB=2 -> still works, no crash.
I noted that the kernel pktgen's use of atomic_add is guarded by the transmit mode (xmit_mode).
diff --git a/LINUX/scripts/linux-pktgen.sh b/LINUX/scripts/linux-pktgen.sh
index 5186ada..77f3765 100755
--- a/LINUX/scripts/linux-pktgen.sh
+++ b/LINUX/scripts/linux-pktgen.sh
@@ -30,6 +30,7 @@ PKT_SIZE="60" # packet size
PKT_COUNT="0" # number of packets to send (0 means an infinite number)
CLONE_SKB="0" # how many times a sk_buff is recycled (0 means always use the same skbuff)
BURST_LEN="1" # burst-size (xmit_more skb flag)
+XMIT_MODE="netif_receive" # Transmit mode. start_xmit to put on wire, netif_receive to put into kernel stack
# Load pktgen kernel module
@@ -63,6 +64,7 @@ for cpu in ${IDX}; do
pgset "src_mac $SRC_MAC"
pgset "dst $DST_IP"
pgset "dst_mac $DST_MAC"
+ pgset "xmit_mode $XMIT_MODE"
pgset "flag QUEUE_MAP_CPU"
echo ""
Let me put the generator into that mode, no crash.
$ sudo cat /proc/net/pktgen/eth0@0
Params: count 0 min_pkt_size: 60 max_pkt_size: 60
frags: 0 delay: 0 clone_skb: 0 ifname: eth0@0
flows: 0 flowlen: 0
queue_map_min: 0 queue_map_max: 0
dst_min: 10.216.8.1 dst_max:
src_min: src_max:
src_mac: 00:15:5d:ba:c3:01 dst_mac: ff:ff:ff:ff:ff:ff
udp_src_min: 9 udp_src_max: 9 udp_dst_min: 9 udp_dst_max: 9
src_mac_count: 0 dst_mac_count: 0
xmit_mode: netif_receive
Flags: QUEUE_MAP_CPU
Current:
pkts-sofar: 2556181 errors: 2556181
started: 339910716444us stopped: 339913735894us idle: 6373us
seq_num: 2556182 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
cur_saddr: 192.168.137.75 cur_daddr: 10.216.8.1
cur_udp_dst: 9 cur_udp_src: 9
cur_queue_map: 0
flows: 0
Result: OK: 3019450(c3013076+d6373) usec, 2556181 (60byte,0frags)
846571pps 406Mb/sec (406354080bps) errors: 2556181
Sorry, missed how high you wanted it - CLONE_SKB=200 no crash after 10 seconds
The difference seems to me to be that the in-kernel pktgen calls netdev_start_xmit(), which calls ops->ndo_start_xmit directly, whereas netmap calls dev_queue_xmit(). Perhaps validate_xmit_skb() (which, incidentally, already linearizes, so netmap perhaps doesn't need to do that) should be cloning the skb for hv_netvsc?
I hope you don't mind me using this to journal what I've figured out :).
I added some printk statements.
[76944.121996] pkt_trans before: 1ll
[76944.121996] pkt_trans after : 2ll
[76944.121998] pkt_trans odev netvsc_start_xmit+0x0/0x980 [hv_netvsc]
[76944.121998] netsvc_drv before: 2ll
That is: in pkt_trans we increase skb->users to 2, the ndo_start_xmit we're about to call belongs to the Hyper-V driver rather than some intermediate layer, and the driver does not error out when it receives an sk_buff with users == 2.
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 953e101..c3c66fc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -407,6 +407,7 @@ check_size:
pkt_sz = sizeof(struct hv_netvsc_packet) + RNDIS_AND_PPI_SIZE;
+ printk("netsvc_drv before: %ull\n", atomic_read(&skb->users));
ret = skb_cow_head(skb, pkt_sz);
if (ret) {
netdev_err(net, "unable to alloc hv_netvsc_packet\n");
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4da4d51..a4f6547 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3447,9 +3447,12 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
pkt_dev->last_ok = 0;
goto unlock;
}
+ printk("pkt_trans before: %ull\n", atomic_read(&pkt_dev->skb->users));
atomic_add(burst, &pkt_dev->skb->users);
+ printk("pkt_trans after : %ull\n", atomic_read(&pkt_dev->skb->users));
xmit_more:
+ printk("pkt_trans odev %pF\n", odev->netdev_ops->ndo_start_xmit);
ret = netdev_start_xmit(pkt_dev->skb, odev, txq, --burst > 0);
switch (ret) {
So, just having an skbuff with count 2 isn't enough to trip the panic.
Next iteration:
+ int delta = pkt_sz - skb_headroom(skb);
+ printk("netsvc_drv before: %ull %i\n", atomic_read(&skb->users), delta);
in the driver, to see if we trigger the expand-headroom codepath; and we should not, because the requested headroom is 30 bytes less than the available headroom:
netsvc_drv before: 1ll -30
So: the in-kernel generator does not crash because the skbs it creates already have enough headroom to hold the Hyper-V vmbus connection metadata in place. We can either modify netmap's generic driver to include that headroom, find a different call for the Hyper-V driver to use (I'm not familiar with the skbuff API yet), or change skb_cow_head... I think :)
Ah, found it, I think.
nm_os_generic_xmit_frame(struct nm_os_gen_arg *a)
{
struct mbuf *m = a->m;
u_int len = a->len;
netdev_tx_t ret;
/* Empty the mbuf. */
if (unlikely(skb_headroom(m)))
skb_push(m, skb_headroom(m));
This is ignoring the dev->needed_headroom parameter. I'll put together a patch tomorrowish.
So if I get it correctly, it seems that the following check in __skb_cow function:
int delta = 0;
if (headroom > skb_headroom(skb))
delta = headroom - skb_headroom(skb);
is true when you use netmap and false when you use the in-kernel pktgen. Consequently, pskb_expand_head() is called only in the netmap case, and then the crash happens.
As you point out, the ultimate cause is that the netmap generic adapter incorrectly assumes that net_device::needed_headroom is always zero. Thanks for your debugging!
I'll try to find the best solution and come back to you with a patch.
Can you try the attached patch? I've tested it with ixgbe, realtek 8169, virtio-net, e1000, using the generic adapter, and it works.
Yes, that's right: the check doesn't get hit, and so we don't panic. I'm going to file a bug with the hv_netvsc driver once I get hold of the MS folk who maintain it :). For now, though, we really should honour needed_headroom - I put up a patch of my own for that yesterday - https://github.com/luigirizzo/netmap/pull/182 - and I'll give your patch a go today.
Output from your patch:
$ sudo ./pkt-gen -i eth0 -f tx -d 192.168.1.2:80 -s 192.168.137.75:1023 -n 500
288.345411 main [2234] interface is eth0
288.345647 main [2354] running on 1 cpus (have 1)
288.345921 extract_ip_range [364] range is 192.168.137.75:1023 to 192.168.137.75:1023
288.346005 extract_ip_range [364] range is 192.168.1.2:80 to 192.168.1.2:80
288.394745 main [2455] mapped 334980KB at 0x7fbcac636000
Sending on netmap:eth0: 1 queues, 1 threads and 1 cpus.
192.168.137.75 -> 192.168.1.2 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
288.395970 main [2552] Sending 512 packets every 0.000000000 s
288.396011 main [2554] Wait 2 secs for phy reset
290.396252 main [2556] Ready...
290.396493 sender_body [1175] start, fd 3 main_fd 3
290.396728 sender_body [1290] flush tail 1023 head 500 on thread 0x7fbcac635700
291.397816 main_thread [2019] 499.000 pps (500.000 pkts 240.000 Kbps in 1001330 usec) 500.00 avg_batch 0 min_space
Sent 500 packets 30000 bytes 1 events 60 bytes each in 0.01 seconds.
Speed: 44.647 Kpps Bandwidth: 21.430 Mbps (raw 30.003 Mbps). Average batch: 500.00 pkts
but your skb protocol pointers are wrong:
[56358.771850] protocol 0000 is buggy, dev eth0
output from my patch
$ sudo ./pkt-gen -i eth0 -f tx -d 192.168.1.2:80 -s 192.168.137.75:1023 -n 500
449.421605 main [2234] interface is eth0
449.421771 main [2354] running on 1 cpus (have 1)
449.421981 extract_ip_range [364] range is 192.168.137.75:1023 to 192.168.137.75:1023
449.422042 extract_ip_range [364] range is 192.168.1.2:80 to 192.168.1.2:80
449.474254 main [2455] mapped 334980KB at 0x7f5a09a6b000
Sending on netmap:eth0: 1 queues, 1 threads and 1 cpus.
192.168.137.75 -> 192.168.1.2 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
449.474456 main [2552] Sending 512 packets every 0.000000000 s
449.474488 main [2554] Wait 2 secs for phy reset
451.474677 main [2556] Ready...
451.475123 sender_body [1175] start, fd 3 main_fd 3
451.475923 sender_body [1290] flush tail 1023 head 500 on thread 0x7f5a09a6a700
452.476590 main_thread [2019] 499.000 pps (500.000 pkts 240.000 Kbps in 1001483 usec) 500.00 avg_batch 0 min_space
Sent 500 packets 30000 bytes 1 events 60 bytes each in 0.00 seconds.
Speed: 232.883 Kpps Bandwidth: 111.784 Mbps (raw 156.497 Mbps). Average batch: 500.00 pkts
It does maintain the protocol pointers correctly, which has a theoretical speed cost, but since all those methods are static inline, the compiler may well turn it into ~nothing... certainly way less than the basic variance between runs - I've seen your patch as low as 32 Kpps and as high as 740 Kpps within this VM with all the same parameters, and mine likewise.
I've now tested https://github.com/luigirizzo/netmap/pull/182 too; once I set the src and dst MAC I can see its traffic externally.
Hi, I've seen #182, but I'm trying for a less redundant patch. As you can see - skb_push, skb_trim, skb_reserve, etc. - it's a lot of machinery for a simple job, and very hard to read. Also, incrementing the length by LL_RESERVED_SPACE is not optimal, because you end up counting the space for the Ethernet header twice.
Moreover, consider that the generic driver may be used with very fast hardware. In my testbed I have ixgbe cards, with which I get up to 3.9 Mpps with the generic adapter.
But you are right, we were missing the call to skb_reset_network_header(). The other ones are not needed, as I see in dev_queue_xmit_nit(). I've updated the patch with your suggestions - can you please give it a try?
Hmm, LL_RESERVED_SPACE doesn't account for the Ethernet header twice on its own - or do you mean that because netmap already allows for the Ethernet header, using it results in us accounting for it twice? If so, yes: that macro makes more sense for L3 protocols.
That said, your new patch looks like it will avoid the protocol errors while maintaining the desired headroom - I'll give it a spin later to be sure, but broadly +1 from me :)
Seems ok :)
Exactly: a netmap buffer contains an Ethernet frame, so it already has space for the L2 header (i.e. 'len' already takes this into account). As a result, it does not make sense to account for it again through LL_RESERVED_SPACE. We just need to add headroom and tailroom.
Ok, so if your tests are ok I will merge the patch. Thanks for all your testing and tweaking! :)
Btw, it would be nice to have a native adapter for the Hyper-V paravirtualized NIC, similar to the adapter we have for virtio-net. Even better would be Hyper-V support for netmap passthrough networking, similar to what we have for Linux (LINUX/ptnet/ptnet.c) and soon for FreeBSD.
Hi, I wanted to set up a development environment for a netmap-using project within my Linux VM - it's an Ubuntu 16.04 VM running under Hyper-V, generation 2.
I checked out git (8123c7), ran ./configure --no-drivers, then cd'd to LINUX, ran make and insmod netmap.ko, then cd'd to examples and ran make.
Then, running sudo ./pkt-gen -i eth0 -f ping results in a kernel panic once the phy reset completes.
Full dmesg output from the point I loaded netmap.