Closed rbtcollins closed 8 years ago
Hi, I think I see what's happening.
It seems that you are using the hyperv netvsc paravirtualized network adapter, for which no native netmap adapter exists, so netmap falls back to the generic netmap adapter (which is fine).
Now, for each packet netmap wants to transmit, the netvsc_start_xmit() routine in the netvsc driver is called, passing an sk_buff object - i.e. the packet to be transmitted. This routine calls the skb_cow_head(sk_buff) function, which ends up calling pskb_expand_head(sk_buff). This latter function has an assertion checking that skb_shared(sk_buff) is false - i.e. that no entity other than the driver holds a reference to the sk_buff.
Unfortunately, this assertion fails, since the sk_buff is shared. As a matter of fact, the netmap generic adapter increments the reference counter of each sk_buff to be transmitted, before invoking the driver transmission routine. This is done for performance reasons.
Since the same technique is also used by the Linux in-kernel packet generator (see net/core/pktgen.c)
atomic_add(burst, &skb->users)
I would be inclined to conclude that the netvsc driver should not call skb_cow_head() - so the bug is there. To prove this, can you try to run the linux pktgen on your VM netvsc interface, to see if a crash happens?
I have a script in the netmap repo (LINUX/scripts/linux-pktgen.sh) which is easy to use. Just open it, change the IF variable to the name of your netvsc VM interface, and run.
I think that's the driver I already tested on:
ethtool -i eth0
driver: hv_netvsc
However, I will give it a go :)
sudo bash ./linux-pktgen.sh
[sudo] password for robertc:
Removing all devices (0)
Removing all devices (1)
./linux-pktgen.sh: line 10: /proc/net/pktgen/kpktgend_1: No such file or directory
cat: /proc/net/pktgen/kpktgend_1: No such file or directory
cat: /proc/net/pktgen/kpktgend_1: No such file or directory
Configuring /proc/net/pktgen/kpktgend_0
Adding eth0@0
Configuring /proc/net/pktgen/eth0@0
Running... Ctrl-C to stop
I presume the no such file or directory implies a missing module; looking into that now.
However,
Module Size Used by
pktgen 53248 0
ENOTSURE :)
Ah, looks like the CPU count was wrong; adjusted that locally too, and now there are no errors. It doesn't crash - but it doesn't output any transmission stats either:
$ sudo bash ./linux-pktgen.sh
Removing all devices (0)
Configuring /proc/net/pktgen/kpktgend_0
Adding eth0@0
Configuring /proc/net/pktgen/eth0@0
Running... Ctrl-C to stop
^C
~/personal/netmap/LINUX/scripts$
sudo cat /proc/net/pktgen/eth0@0
Params: count 0 min_pkt_size: 60 max_pkt_size: 60
frags: 0 delay: 0 clone_skb: 0 ifname: eth0@0
flows: 0 flowlen: 0
queue_map_min: 0 queue_map_max: 0
dst_min: 10.216.8.1 dst_max:
src_min: src_max:
src_mac: 00:15:5d:ba:c3:01 dst_mac: ff:ff:ff:ff:ff:ff
udp_src_min: 9 udp_src_max: 9 udp_dst_min: 9 udp_dst_max: 9
src_mac_count: 0 dst_mac_count: 0
Flags: QUEUE_MAP_CPU
Current:
pkts-sofar: 638132 errors: 0
started: 20477152217us stopped: 20482580170us idle: 34004us
seq_num: 638133 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
cur_saddr: 192.168.137.75 cur_daddr: 10.216.8.1
cur_udp_dst: 9 cur_udp_src: 9
cur_queue_map: 0
flows: 0
Result: OK: 5427952(c5393948+d34004) usec, 638132 (60byte,0frags)
117564pps 56Mb/sec (56430720bps) errors: 0
Thank you for your tests.
Can you try to play with the CLONE_SKB parameter, to see if setting it to something like 100 causes crashes? Also, are you sure pktgen is really transmitting? Can you see the traffic flowing outside the VM (e.g. on another VM on the same host, or in some Windows network statistics)?
I can see 50Mbps of traffic in Task Manager, so I'm sure something is being emitted onto the vmbus - it starts and stops with linux-pktgen.
CLONE_SKB=1 -> still works, no crash.
CLONE_SKB=2 -> still works, no crash.
I noted that the kernel pktgen's use of atomic_add is guarded by the transmit mode (xmit_mode).
diff --git a/LINUX/scripts/linux-pktgen.sh b/LINUX/scripts/linux-pktgen.sh
index 5186ada..77f3765 100755
--- a/LINUX/scripts/linux-pktgen.sh
+++ b/LINUX/scripts/linux-pktgen.sh
@@ -30,6 +30,7 @@ PKT_SIZE="60" # packet size
PKT_COUNT="0" # number of packets to send (0 means an infinite number)
CLONE_SKB="0" # how many times a sk_buff is recycled (0 means always use the same skbuff)
BURST_LEN="1" # burst-size (xmit_more skb flag)
+XMIT_MODE="netif_receive" # Transmit mode. start_xmit to put on wire, netif_receive to put into kernel stack
# Load pktgen kernel module
@@ -63,6 +64,7 @@ for cpu in ${IDX}; do
pgset "src_mac $SRC_MAC"
pgset "dst $DST_IP"
pgset "dst_mac $DST_MAC"
+ pgset "xmit_mode $XMIT_MODE"
pgset "flag QUEUE_MAP_CPU"
echo ""
Let me put the generator into that mode, no crash.
$ sudo cat /proc/net/pktgen/eth0@0
Params: count 0 min_pkt_size: 60 max_pkt_size: 60
frags: 0 delay: 0 clone_skb: 0 ifname: eth0@0
flows: 0 flowlen: 0
queue_map_min: 0 queue_map_max: 0
dst_min: 10.216.8.1 dst_max:
src_min: src_max:
src_mac: 00:15:5d:ba:c3:01 dst_mac: ff:ff:ff:ff:ff:ff
udp_src_min: 9 udp_src_max: 9 udp_dst_min: 9 udp_dst_max: 9
src_mac_count: 0 dst_mac_count: 0
xmit_mode: netif_receive
Flags: QUEUE_MAP_CPU
Current:
pkts-sofar: 2556181 errors: 2556181
started: 339910716444us stopped: 339913735894us idle: 6373us
seq_num: 2556182 cur_dst_mac_offset: 0 cur_src_mac_offset: 0
cur_saddr: 192.168.137.75 cur_daddr: 10.216.8.1
cur_udp_dst: 9 cur_udp_src: 9
cur_queue_map: 0
flows: 0
Result: OK: 3019450(c3013076+d6373) usec, 2556181 (60byte,0frags)
846571pps 406Mb/sec (406354080bps) errors: 2556181
Sorry, missed how high you wanted it - CLONE_SKB=200 no crash after 10 seconds
The difference seems to me to be that the in-kernel pktgen calls netdev_start_xmit(), which calls ops->ndo_start_xmit directly, whereas netmap calls dev_queue_xmit(). Perhaps validate_xmit_skb() (which, incidentally, already linearizes, so netmap perhaps doesn't need to do that) should be cloning the skb for hv_netvsc?
I hope you don't mind me using this to journal what I've figured out :).
I added some printk statements.
[76944.121996] pkt_trans before: 1ll
[76944.121996] pkt_trans after : 2ll
[76944.121998] pkt_trans odev netvsc_start_xmit+0x0/0x980 [hv_netvsc]
[76944.121998] netsvc_drv before: 2ll
That is: in pkt_trans we increase skb->users to 2, the ndo_start_xmit we're about to call belongs to the Hyper-V driver rather than some intermediate layer, and the driver does not error out when it receives an sk_buff with users == 2.
diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c
index 953e101..c3c66fc 100644
--- a/drivers/net/hyperv/netvsc_drv.c
+++ b/drivers/net/hyperv/netvsc_drv.c
@@ -407,6 +407,7 @@ check_size:
pkt_sz = sizeof(struct hv_netvsc_packet) + RNDIS_AND_PPI_SIZE;
+ printk("netsvc_drv before: %ull\n", atomic_read(&skb->users));
ret = skb_cow_head(skb, pkt_sz);
if (ret) {
netdev_err(net, "unable to alloc hv_netvsc_packet\n");
diff --git a/net/core/pktgen.c b/net/core/pktgen.c
index 4da4d51..a4f6547 100644
--- a/net/core/pktgen.c
+++ b/net/core/pktgen.c
@@ -3447,9 +3447,12 @@ static void pktgen_xmit(struct pktgen_dev *pkt_dev)
pkt_dev->last_ok = 0;
goto unlock;
}
+ printk("pkt_trans before: %ull\n", atomic_read(&pkt_dev->skb->users));
atomic_add(burst, &pkt_dev->skb->users);
+ printk("pkt_trans after : %ull\n", atomic_read(&pkt_dev->skb->users));
xmit_more:
+ printk("pkt_trans odev %pF\n", odev->netdev_ops->ndo_start_xmit);
ret = netdev_start_xmit(pkt_dev->skb, odev, txq, --burst > 0);
switch (ret) {
So, just having an skbuff with count 2 isn't enough to trip the panic.
Next iteration:
+ int delta = pkt_sz - skb_headroom(skb);
+ printk("netsvc_drv before: %ull %i\n", atomic_read(&skb->users), delta);
in the driver, to see if we trigger the expand-headroom codepath; and we should not, because the requested headroom is 30 bytes less than the available headroom:
netsvc_drv before: 1ll -30
So: the in-kernel generator does not crash because the skbs it creates already have enough headroom to hold the Hyper-V vmbus connection metadata in place. We can either modify netmap's generic driver to include that headroom, find a different call for the Hyper-V driver to use (I'm not familiar with the skbuff API yet), or change skb_cow_head... I think :)
Ah, found it, I think.
nm_os_generic_xmit_frame(struct nm_os_gen_arg *a)
{
struct mbuf *m = a->m;
u_int len = a->len;
netdev_tx_t ret;
/* Empty the mbuf. */
if (unlikely(skb_headroom(m)))
skb_push(m, skb_headroom(m));
This is ignoring the dev->needed_headroom parameter. I'll put together a patch tomorrowish.
So if I get it correctly, it seems that the following check in __skb_cow function:
int delta = 0;
if (headroom > skb_headroom(skb))
delta = headroom - skb_headroom(skb);
is true when you use netmap and false when you use the in-kernel pktgen. Consequently, pskb_expand_head() is called only in the netmap case, and then the crash happens.
As you point out, the ultimate cause is that the netmap generic adapter incorrectly assumes that net_device::needed_headroom is always zero. Thanks for your debugging!
I'll try to find the best solution and come back to you with a patch.
Can you try the attached patch? I've tested it with ixgbe, realtek 8169, virtio-net, e1000, using the generic adapter, and it works.
Yes, that's right: the check doesn't get hit, and so we don't panic. I'm going to file a bug with the hv_netvsc driver once I get hold of the MS folk who maintain it :). For now, though, we really should honour needed_headroom - I put up a patch of my own for that yesterday - https://github.com/luigirizzo/netmap/pull/182 - and I'll give your patch a go today.
Output from your patch:
$ sudo ./pkt-gen -i eth0 -f tx -d 192.168.1.2:80 -s 192.168.137.75:1023 -n 500
288.345411 main [2234] interface is eth0
288.345647 main [2354] running on 1 cpus (have 1)
288.345921 extract_ip_range [364] range is 192.168.137.75:1023 to 192.168.137.75:1023
288.346005 extract_ip_range [364] range is 192.168.1.2:80 to 192.168.1.2:80
288.394745 main [2455] mapped 334980KB at 0x7fbcac636000
Sending on netmap:eth0: 1 queues, 1 threads and 1 cpus.
192.168.137.75 -> 192.168.1.2 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
288.395970 main [2552] Sending 512 packets every 0.000000000 s
288.396011 main [2554] Wait 2 secs for phy reset
290.396252 main [2556] Ready...
290.396493 sender_body [1175] start, fd 3 main_fd 3
290.396728 sender_body [1290] flush tail 1023 head 500 on thread 0x7fbcac635700
291.397816 main_thread [2019] 499.000 pps (500.000 pkts 240.000 Kbps in 1001330 usec) 500.00 avg_batch 0 min_space
Sent 500 packets 30000 bytes 1 events 60 bytes each in 0.01 seconds.
Speed: 44.647 Kpps Bandwidth: 21.430 Mbps (raw 30.003 Mbps). Average batch: 500.00 pkts
but your skb protocol pointers are wrong:
[56358.771850] protocol 0000 is buggy, dev eth0
output from my patch
$ sudo ./pkt-gen -i eth0 -f tx -d 192.168.1.2:80 -s 192.168.137.75:1023 -n 500
449.421605 main [2234] interface is eth0
449.421771 main [2354] running on 1 cpus (have 1)
449.421981 extract_ip_range [364] range is 192.168.137.75:1023 to 192.168.137.75:1023
449.422042 extract_ip_range [364] range is 192.168.1.2:80 to 192.168.1.2:80
449.474254 main [2455] mapped 334980KB at 0x7f5a09a6b000
Sending on netmap:eth0: 1 queues, 1 threads and 1 cpus.
192.168.137.75 -> 192.168.1.2 (00:00:00:00:00:00 -> ff:ff:ff:ff:ff:ff)
449.474456 main [2552] Sending 512 packets every 0.000000000 s
449.474488 main [2554] Wait 2 secs for phy reset
451.474677 main [2556] Ready...
451.475123 sender_body [1175] start, fd 3 main_fd 3
451.475923 sender_body [1290] flush tail 1023 head 500 on thread 0x7f5a09a6a700
452.476590 main_thread [2019] 499.000 pps (500.000 pkts 240.000 Kbps in 1001483 usec) 500.00 avg_batch 0 min_space
Sent 500 packets 30000 bytes 1 events 60 bytes each in 0.00 seconds.
Speed: 232.883 Kpps Bandwidth: 111.784 Mbps (raw 156.497 Mbps). Average batch: 500.00 pkts
It does maintain the protocol pointers correctly, which has a theoretical speed cost, but since all those methods are static inline, the compiler may well turn it into ~nothing... certainly way less than the basic variance between runs - I've seen your patch as low as 32 Kpps and as high as 740 Kpps within this VM with all the same parameters, and mine likewise.
I've now tested https://github.com/luigirizzo/netmap/pull/182 too; once I set the src and dst MAC I can see its traffic externally.
Hi, I've seen #182, but I'm trying for a less redundant patch. As you can see - skb_push, skb_trim, skb_reserve, etc. - it's a lot of machinery for a simple job, and very hard to read. Also, incrementing the length by LL_RESERVED_SPACE is not optimal, because you end up counting the space for the Ethernet header twice.
Moreover, consider that the generic driver may be used with very fast hardware. In my testbed I have ixgbe cards, with which I get up to 3.9 Mpps with the generic adapter.
But you are right, we were missing the call to skb_reset_network_header(). The other ones are not needed, as I see in dev_queue_xmit_nit(). I've updated the patch with your suggestions - can you please give it a try?
Hmm, LL_RESERVED_SPACE doesn't account for the Ethernet header twice on its own - or do you mean that because netmap already allows for the Ethernet header, using it results in us accounting for it twice? If so, yes: that macro makes more sense for L3 protocols.
That said, your new patch looks like it will avoid the protocol errors while maintaining the desired headroom - I'll give it a spin later to be sure, but broadly +1 from me :)
Seems ok :)
Exactly: a netmap buffer contains an Ethernet frame, so it already has space for the L2 header (i.e. 'len' already takes this into account). As a result, it does not make sense to account for it again through LL_RESERVED_SPACE. We just need to add headroom and tailroom.
Ok, so if your tests are ok I will merge the patch. Thanks for all your testing and tweaking! :)
Btw, it would be nice to have a native adapter for the Hyper-V paravirtualized NIC, similar to the adapter we have for virtio-net. Even better would be Hyper-V support for netmap passthrough networking, similar to what we have for Linux (LINUX/ptnet/ptnet.c) and soon for FreeBSD.
Hi, I wanted to set up a development environment for a netmap-using project within my Linux VM - it's an Ubuntu 16.04 VM running under Hyper-V, generation 2.
I checked out git (8123c7), ran ./configure --no-drivers, then cd'd to LINUX, ran make and insmod netmap.ko, then cd'd to examples and ran make.
Then, running sudo ./pkt-gen -i eth0 -f ping results in a kernel panic once the phy reset completes.
Full dmesg output from the point I loaded netmap.