luigirizzo / netmap

kernel crash on Centos 7.3 (IOMMU) #403

Open bvital1976 opened 6 years ago

bvital1976 commented 6 years ago

I am using a Centos 7.3 VM running inside ESXi, which is itself running in VMWare Workstation. Centos 7.3 with kernel 3.10.0-693.5.2.el7.x86_64 crashes when pkt-gen starts on an e1000 virtual network card. With an e1000e virtual network card, pkt-gen works fine. The e1000 source code was fixed as described in https://github.com/luigirizzo/netmap#348. Before the crash, Linux displays the message "DMA: Out of SW-IOMMU space for 4096 bytes at device 0000:02:00.0". The unpatched e1000 driver works fine. The crash message is:

[  136.994482] Oops: 0000 [#1] SMP
[  136.995291] Modules linked in: igb i2c_algo_bit dca e1000(OE) netmap(OE) nfnetlink_queue nfnetlink_log nfnetlink bluetooth rfkill bridge 8021q garp mrp stp llc iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel ppdev lrw gf128mul glue_helper ablk_helper cryptd sg vmw_balloon nfit joydev pcspkr libnvdimm vmw_vmci i2c_piix4 shpchp parport_pc parport ptp pps_core ip_tables ext4 mbcache jbd2 ata_generic pata_acpi sd_mod crc_t10dif crct10dif_generic vmwgfx drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm ata_piix drm libata crct10dif_pclmul crct10dif_common crc32c_intel serio_raw mptspi scsi_transport_spi mptscsih mptbase i2c_core floppy [last unloaded: netmap]
[  136.999780] CPU: 0 PID: 3012 Comm: nsagent Tainted: G           OE  ------------   3.10.0-693.5.2.el7.x86_64 #1
[  137.000958] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 04/05/2016
[  137.002273] task: ffff880081215ee0 ti: ffff88009dedc000 task.ti: ffff88009dedc000
[  137.003164] RIP: 0010:[<ffffffffc02919c5>]  [<ffffffffc02919c5>] e1000_configure+0x395/0x670 [e1000]
[  137.004580] RSP: 0018:ffff88009dedfb70  EFLAGS: 00010283
[  137.005202] RAX: ffff880081250100 RBX: ffff88009e6d6800 RCX: 0000000000000000
[  137.005919] RDX: 0000000000000002 RSI: 0000000000000000 RDI: 0000000000000100
[  137.006527] RBP: ffff88009dedfbb8 R08: 0000000000000000 R09: 0000000000000100
[  137.007247] R10: 00000000000000ff R11: ffffea0004e0a200 R12: ffff8800811b3700
[  137.007851] R13: 0000000000000000 R14: ffff8801371428c0 R15: ffff8800811b3800
[  137.008468] FS:  00007fc796908880(0000) GS:ffff88013fc00000(0000) knlGS:0000000000000000
[  137.009254] CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[  137.009817] CR2: 0000000000000010 CR3: 0000000080762000 CR4: 00000000000407f0
[  137.010650] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  137.011278] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  137.012113] Stack:
[  137.012618]  ffff88009dedfb90 ffffffffc028c755 0000000000000001 00000000e530d493
[  137.013263]  ffff8801371428c0 ffff880137142000 ffff88009e6d6800 ffff8801371428c0
[  137.013773]  0000000000000001 ffff88009dedfbd0 ffffffffc0291e02 ffff880137143078
[  137.014273] Call Trace:
[  137.014900]  [<ffffffffc028c755>] ? e1000_clean_all_rx_rings+0x45/0x60 [e1000]
[  137.015388]  [<ffffffffc0291e02>] e1000_up+0x12/0x80 [e1000]
[  137.015865]  [<ffffffffc02922f5>] e1000_netmap_reg+0x135/0x1d0 [e1000]
[  137.016627]  [<ffffffffc02cd81e>] netmap_hw_reg+0x7e/0x90 [netmap]
[  137.017155]  [<ffffffffc02d07f6>] netmap_do_regif+0x2c6/0x400 [netmap]
[  137.017764]  [<ffffffffc02d1b48>] ? netmap_get_na+0x1e8/0x230 [netmap]
[  137.018230]  [<ffffffffc02d23c9>] netmap_ioctl+0x709/0xa30 [netmap]
[  137.018685]  [<ffffffff812d3ec3>] ? ima_file_check+0x63/0x1b0
[  137.019219]  [<ffffffff81222854>] ? mntput+0x24/0x40
[  137.019883]  [<ffffffff8120c489>] ? terminate_walk+0x49/0x50
[  137.020313]  [<ffffffff8120fc8d>] ? do_last+0x66d/0x12c0
[  137.020891]  [<ffffffffc02d39eb>] linux_netmap_ioctl+0xab/0x140 [netmap]
[  137.021299]  [<ffffffff812109a2>] ? path_openat+0xc2/0x490
[  137.021691]  [<ffffffff81212f3b>] ? do_filp_open+0x4b/0xb0
[  137.022143]  [<ffffffff812151bd>] do_vfs_ioctl+0x33d/0x540
[  137.022685]  [<ffffffff812202a5>] ? __fd_install+0x25/0x60
[  137.023251]  [<ffffffff81215461>] SyS_ioctl+0xa1/0xc0
[  137.023879]  [<ffffffff816b5089>] system_call_fastpath+0x16/0x1b
[  137.024310] Code: 2c 8b 7e 28 0f 88 dc 00 00 00 89 d6 29 fe 39 d7 0f 4e d6 89 d2 48 8b b3 10 01 00 00 48 c1 e2 04 8b 14 10 3b 93 18 01 00 00 73 ab <48> 8b 34 d6 eb a8 0f 1f 44 00 00 41 8b 96 8c 03 00 00 85 d2 7e
[  137.025918] RIP  [<ffffffffc02919c5>] e1000_configure+0x395/0x670 [e1000]
[  137.026575]  RSP <ffff88009dedfb70>
[  137.027271] CR2: 0000000000000010

vmaffione commented 6 years ago

Have you tried to disable IOMMU?

bvital1976 commented 6 years ago

Centos 7.3 system becomes unusable with disabled IOMMU.

bvital1976 commented 6 years ago

The same crash occurs on Centos 7.3 with kernel 3.10.0-693.11.1.el7.x86_64 running on VMWare Workstation.

bvital1976 commented 6 years ago

The last commit which does not crash the OS is:

Commit: b2e99d984f890f74fa1a9cbf57ff43121f575a19 [b2e99d9]
Parents: 191aab0174
Author: Giuseppe Lettieri <g.lettieri@iet.unipi.it>
Date: 15 March 2017, 18:45:15
Committer: Giuseppe Lettieri
Commit Date: 5 October 2017, 17:12:49

All subsequent commits crash Centos 7.3 on VMWare when using e1000.

vmaffione commented 6 years ago

So you are saying this commit is the cause:

commit 546c4e63c9a8c32fd64f6799ebeff5e993d8c6db
Author: Giuseppe Lettieri <giuseppe.lettieri@unipi.it>
Date:   Fri Oct 6 13:50:16 2017 +0200

    linux: fix broken dma-mapping

? Maybe @giuseppelettieri has a clue on this, it should be related to netmap_load_map.

What pkt-gen command line are you using exactly?

bvital1976 commented 6 years ago

Yes, this commit is the first which crashes OS.

I used the following command: "pkt-gen -i eth2 -f tx"

giuseppelettieri commented 6 years ago

The scheme we are using involves a pre-allocation of all the DMA-mappings, which is infeasible for SW-IOMMU. The crash is probably on the error path after the allocation failure. The bug needs to be fixed, of course, but you will still be unable to use the patched e1000 driver in your setup. Anyway, using the unpatched driver with the generic netmap-layer should make no difference for e1000.
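A rough back-of-the-envelope check of why pre-allocating every DMA mapping can exhaust the software IOMMU. This is only a sketch: the 64 MiB bounce pool, the 2048-byte netmap buffer size, and the 163840 default buffer count are assumptions, not values stated in this thread, and the pool is also shared with other devices. Check `dmesg` and `/sys/module/netmap/parameters/` for the real numbers on your system.

```python
# Assumed values (not taken from this thread); adjust them to your system.
SWIOTLB_POOL_BYTES = 64 * 1024 * 1024  # assumed size of the bounce-buffer pool
NETMAP_BUF_SIZE = 2048                 # assumed netmap buf_size
ASSUMED_DEFAULT_BUF_NUM = 163840       # assumed default netmap buf_num


def bounce_bytes_needed(buf_num: int, buf_size: int = NETMAP_BUF_SIZE) -> int:
    """Bounce space needed if every netmap buffer gets its own mapping."""
    return buf_num * buf_size


def fits_in_pool(buf_num: int, buf_size: int = NETMAP_BUF_SIZE,
                 pool_bytes: int = SWIOTLB_POOL_BYTES) -> bool:
    """True if pre-mapping all buffers would fit in the bounce pool."""
    return bounce_bytes_needed(buf_num, buf_size) <= pool_bytes


if __name__ == "__main__":
    for n in (ASSUMED_DEFAULT_BUF_NUM, 5000):
        mib = bounce_bytes_needed(n) / (1024 * 1024)
        print(f"buf_num={n}: {mib:.1f} MiB needed, fits={fits_in_pool(n)}")
```

Under these assumptions the default buffer count asks for several times the pool size, which is consistent with the "Out of SW-IOMMU space" message, while a much smaller buffer count would fit.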

bvital1976 commented 6 years ago

The patched e1000 driver does not work in ESXi, VMWare Workstation, or VirtualBox VMs. Using the unpatched driver with pkt-gen gives me ~70Kpps, while the patched driver gives ~500Kpps in my environment. I think this is a big difference in performance.

vmaffione commented 6 years ago

It makes a big difference because you are running inside a vm, so the e1000 nic is emulated in software, and i/o register access is very expensive. Since with the generic driver you pay a register access per packet this is very costly (with patched driver you pay one register access per batch).

@giuseppelettieri meant that on a real e1000 card you wouldn't see significant difference, because i/o register access is way cheaper on a real pci bus.
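The per-packet versus per-batch register access cost described above can be illustrated with a toy cost model. This is a sketch, not a measurement: the function name, the nanosecond costs, and the batch size of 64 are all hypothetical and chosen only to show the shape of the effect on an emulated NIC.

```python
def packets_per_second(per_packet_ns: float, register_access_ns: float,
                       batch_size: int) -> float:
    """Throughput when one register access is paid per batch of packets."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    # The register access cost is amortized over the whole batch.
    ns_per_packet = per_packet_ns + register_access_ns / batch_size
    return 1e9 / ns_per_packet


# Hypothetical costs for an emulated NIC, for illustration only.
PER_PACKET_NS = 1_000.0
REGISTER_ACCESS_NS = 13_000.0

# Generic driver: one register access per packet (batch_size=1).
generic = packets_per_second(PER_PACKET_NS, REGISTER_ACCESS_NS, batch_size=1)
# Patched driver: one register access per batch (assumed 64 packets).
native = packets_per_second(PER_PACKET_NS, REGISTER_ACCESS_NS, batch_size=64)

print(f"one register access per packet: {generic:,.0f} pps")
print(f"one register access per batch:  {native:,.0f} pps")
```

On real hardware `register_access_ns` would be much smaller, so the two results would move much closer together, which matches the point that the difference mostly shows up inside a VM.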

It seems that e1000 isn't a good solution for you. Have you tried to use virtio-net, or maybe vmxnet? Vmware is not open source, so we don't have good solutions for that (while we have good solutions for qemu and bhyve).

bvital1976 commented 6 years ago

virtio-net is not available for VMWare. As for vmxnet: do you mean I have to patch the vmxnet sources so that netmap can use it? Unpatched vmxnet gives me below 40Kpps.

vmaffione commented 6 years ago

I'm surprised that VMWare doesn't support virtio (which is the de facto standard for VM networking). Yes, I think the only solution for your use-case would be to patch vmxnet sources to allow netmap to use it. I don't see other solutions because your virtualization environment is very constrained. Using netmap on the unpatched vmxnet vNIC has the very same problems as e1000 (costly per-packet I/O register access), so I'm not surprised you get 40Kpps.

bvital1976 commented 6 years ago

It crashes in every virtual environment I tried. For example, it crashes on VirtualBox with Centos 7.4.1708, kernel 3.10.0-693.11.1.el7.x86_64, and the latest netmap master, with the following stack trace:

[ 113.925253] [] ? e1000_clean_all_rx_rings+0x45/0x60 [e1000]
[ 113.925757] [] e1000_up+0x12/0x80 [e1000]
[ 113.926337] [] e1000_netmap_reg+0x135/0x1d0 [e1000]
[ 113.926831] [] netmap_hw_reg+0x7e/0x90 [netmap]
[ 113.927316] [] netmap_do_regif+0x2c6/0x400 [netmap]
[ 113.927881] [] ? netmap_get_na+0x1e8/0x230 [netmap]
[ 113.928350] [] netmap_ioctl+0x709/0xa30 [netmap]
[ 113.928822] [] ? radix_tree_lookup_slot+0x22/0x50
[ 113.929325] [] ? find_get_page+0x1e/0xa0
[ 113.929776] [] ? filemap_fault+0x215/0x410
[ 113.930217] [] linux_netmap_ioctl+0xab/0x140 [netmap]
[ 113.930768] [] ? unlock_page+0x2b/0x30
[ 113.931225] [] ? do_read_fault.isra.44+0xe6/0x130
[ 113.931673] [] ? handle_mm_fault+0x691/0xfa0
[ 113.932158] [] ? do_filp_open+0x4b/0xb0
[ 113.932654] [] do_vfs_ioctl+0x33d/0x540
[ 113.933265] [] ? do_page_fault+0x171/0x450
[ 113.933719] [] SyS_ioctl+0xa1/0xc0
[ 113.934088] [] system_call_fastpath+0x16/0x1b

vmaffione commented 6 years ago

Yes, as @giuseppelettieri is saying, the crash is probably on the error path, and it's happening because you are using sw iommu. We need to fix it, but that won't let you use the system. At least virtio-net is supported by virtualbox or qemu: have you tried to use that instead of e1000?

bvital1976 commented 6 years ago

No, because my main target is ESXi. So the only choice for me is patching vmxnet3.

giuseppelettieri commented 6 years ago

Hi @bvital1976, here is another option for your use case.

Be aware that SW-IOMMU means that the kernel is allocating a bounce buffer for each one of the pre-allocated netmap buffers. The allocation fails and you get the error.

However, you may not need that many extra buffers in your case. Try changing the number of buffers with something like

echo 5000 > /sys/module/netmap/parameters/buf_num

(Try to find a value that makes your application start successfully.)
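One way to look for such a value is a binary search over `buf_num`. The sketch below is illustrative only: `largest_working_buf_num` and `fake_starts_ok` are hypothetical names, the search range is arbitrary, and the search assumes that if a value works then every smaller value works too, which may not hold on every setup. A real predicate would write the candidate to `/sys/module/netmap/parameters/buf_num` (as root) and then try to start the application.

```python
from typing import Callable, Optional


def largest_working_buf_num(low: int, high: int,
                            starts_ok: Callable[[int], bool]) -> Optional[int]:
    """Binary-search the largest value in [low, high] accepted by starts_ok.

    Assumes monotonic behaviour: if a value works, all smaller values work.
    Returns None when no value in the range works.
    """
    best = None
    while low <= high:
        mid = (low + high) // 2
        if starts_ok(mid):
            best = mid       # mid works, look for something larger
            low = mid + 1
        else:
            high = mid - 1   # mid fails, look for something smaller
    return best


# Stand-in predicate with a hypothetical limit, for illustration only.
# A real one would set buf_num via sysfs and try to start the application.
def fake_starts_ok(buf_num: int) -> bool:
    return buf_num <= 24576


if __name__ == "__main__":
    print(largest_working_buf_num(1024, 163840, fake_starts_ok))
```

Starting from the largest value that works and then backing off a little leaves some bounce-buffer space for other devices.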

Please note that bounce buffers do not work currently, since we are not doing the necessary copies. They will work after we merge #411 and then apply the change to the e1000 driver as well.

bvital1976 commented 6 years ago

Your suggestion helps. Now the system does not crash and pkt-gen works, but my application does not work most of the time. Host rings and the TX ring usually work; RX rings rarely work. Is this because bounce buffer support is not complete?

giuseppelettieri commented 6 years ago

It depends on the kind of error. Are you receiving all-null packets? If yes, that is because of the missing copy from the bounce buffers.

bvital1976 commented 6 years ago

Yes, I receive all-null packets that are 2048 bytes in size.

giuseppelettieri commented 6 years ago

Could you please try the current master?

bvital1976 commented 6 years ago

It works with "echo 24576 > /sys/module/netmap/parameters/buf_num" on VMWare Workstation. On ESXi with e1000, it freezes Centos with various buf_num values. e1000e works fine on ESXi.

vmaffione commented 6 years ago

We have made several fixes in this area, so the bug should be gone. Could you please give it a try to confirm that?

bvital1976 commented 6 years ago

The bug is still there. I tried Centos on ESXi. Sometimes pkt-gen works, sometimes it hangs the system, sometimes it does not work because the program cannot allocate memory. I tried different buf_num.

vmaffione commented 6 years ago

It is ok that pkt-gen returns ENOMEM in case there are not enough bounce buffers. That's not a bug. The hang however is worrisome... does it happen with specific values of buf_num? Is it deterministic or not?

bvital1976 commented 6 years ago

It hangs with different buf_num values, but not always. Sometimes it works correctly, sometimes it tries to send packets but cannot (it reports 0 sent packets), and sometimes it hangs. When it hangs, the VM uses 100% of the CPUs.

vmaffione commented 6 years ago

OK, and you don't have an updated kernel stack trace of the hang, if any? If you share the exact steps to reproduce the issue, we could try to reproduce it.

bvital1976 commented 6 years ago

No, I do not have a kernel stack trace. I had to reset the ESXi host since it did not respond to commands. I tried on two different ESXi hosts.

To reproduce the issue:

giuseppelettieri commented 6 years ago

Hi @bvital1976, could you please try again with the latest master?

bvital1976 commented 6 years ago

It still hangs. It hangs on the first attempt with "-l 1500". It does not hang without the "-l" switch.