luigirizzo / netmap

Automatically exported from code.google.com/p/netmap
BSD 2-Clause "Simplified" License

problem in using ixgbe #873

Open mohammadkhanjani opened 2 years ago

mohammadkhanjani commented 2 years ago

Hi, I am using ixgbe 5.15.2 as the netmap external driver (ext-driver). When I run:

numactl -C 0-3 pkt-gen -f rx -c 4 -p 4 -i eth8

(eth8 is an ixgbe interface), dmesg shows this message:

Call Trace:
netmap_poll+0x562/0x5d0 [netmap]
? ktime_get_ts64+0x49/0xf0
linux_netmap_poll+0x3f/0x60 [netmap]
do_sys_poll
. . .

Why does this happen? Can you help me? Sometimes pkt-gen gets stuck and cannot be closed.

The same happens when I run the command from a script:

i=0
while [ $i -lt 30 ]
do
    numactl -C 0-3 pkt-gen -f rx -c 4 -p 4 -i eth8 &
    sleep 6
    killall -9 pkt-gen
    sleep 3
    ((i++))
done

After some iterations of this script, pkt-gen gets stuck and I have to reboot the server.

I ran the same script with an older version of netmap, from about one or two years ago, and there was no problem with ixgbe. Please help me fix this problem.
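A variant of the loop that stops at the first hang, instead of running all 30 iterations, may make the failure easier to capture. The sketch below is only an illustration: `gone_within` and `run_once` are hypothetical helper names, `RUN_SECS` and `EXIT_SECS` are made-up tunables, and it assumes the PID returned by `$!` is the process that actually hangs.

```shell
#!/bin/bash
# gone_within PID SECONDS (hypothetical helper): succeed once PID no longer
# exists; fail if it is still present after SECONDS seconds, for example
# because it is spinning inside the kernel and never handles SIGKILL.
gone_within() {
    local pid=$1 limit=$2 waited=0
    while kill -0 "$pid" 2>/dev/null; do
        [ "$waited" -ge "$limit" ] && return 1
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# run_once CMD... (hypothetical helper): start CMD in the background, let it
# run for RUN_SECS seconds (default 6, as in the original script), send
# SIGKILL, then report whether it exited within EXIT_SECS seconds (default 3).
run_once() {
    "$@" &
    local pid=$!
    sleep "${RUN_SECS:-6}"
    kill -9 "$pid" 2>/dev/null
    gone_within "$pid" "${EXIT_SECS:-3}"
}
```

Inside the original loop, `run_once numactl -C 0-3 pkt-gen -f rx -c 4 -p 4 -i eth8` would replace the start, sleep, and kill steps; when it returns non-zero, the loop can save the output of `dmesg` to a file and stop, so the log of the first hang is kept.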

jhk098 commented 2 years ago

The dmesg call trace is incomplete and does not show what the problem is (e.g. NULL pointer dereference, deadlock, etc.).
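One way to collect a complete report is to filter the kernel log for the report header and keep the lines that follow it, so the register dump and the whole call trace are included. This is a minimal sketch: `lockup_report` is a hypothetical helper name, and both the pattern list and the default of 80 context lines are assumptions that may need adjusting.

```shell
#!/bin/bash
# lockup_report [LINES] (hypothetical helper): read a kernel log on stdin and
# print every line matching a report header, plus the LINES lines after it
# (default 80, an assumed value).
lockup_report() {
    grep -E -A "${1:-80}" 'soft lockup|BUG:|Oops'
}
```

For example, `dmesg | lockup_report > netmap-lockup.txt` writes the matching reports to a file that can be attached to the issue.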

mohammadkhanjani commented 2 years ago

Hi, thanks for looking into my problem. Please help me: this is urgent and a serious bug. I tested the older netmap with ixgbe driver 5.5.3 and it was OK. After that I tested the newest netmap with ixgbe 5.5.3 and 5.15.2, and the problem occurs with both.

Complete dmesg:

[500932.091311] watchdog: BUG: soft lockup - CPU#4 stuck for 22s! [C eth4 #0:10872]
[500932.091356] Modules linked in: ixgbe(OE) i40e(OE) auxiliary(OE) netmap(OE) dccp_diag dccp tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag rpcsec_gss_krb5 veth iptable_filter ip_tables x_tables bridge stp llc ipmi_ssif intel_rapl sb_edac nls_iso8859_1 x86_pkg_temp_thermal intel_powerclamp hpilo coretemp kvm irqbypass intel_cstate intel_rapl_perf shpchp ioatdma lpc_ich ipmi_si ipmi_devintf ipmi_msghandler acpi_power_meter mac_hid nfsd auth_rpcgss nfs_acl lockd grace sunrpc autofs4 algif_skcipher af_alg dm_crypt raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc aesni_intel aes_x86_64 crypto_simd glue_helper cryptd mgag200 i2c_algo_bit ttm drm_kms_helper
[500932.091401] syscopyarea sysfillrect sysimgblt fb_sys_fops ahci drm libahci tg3 dca hpsa ptp pps_core scsi_transport_sas wmi [last unloaded: auxiliary]
[500932.091411] CPU: 4 PID: 10872 Comm: C eth4 #0 Tainted: G OE 4.15.25-47 #1
[500932.091425] RIP: 0010:ixgbe_netmap_rxsync+0x182/0x440 [ixgbe]
[500932.091426] RSP: 0018:ffffb013cdf9f988 EFLAGS: 00000246 ORIG_RAX: ffffffffffffff11
[500932.091427] RAX: 00000000aa5c7e91 RBX: 0000000000000e91 RCX: ffff9d9cffb6ea10
[500932.091428] RDX: 0000000000000564 RSI: 0000000000000000 RDI: 0000002819226c00
[500932.091429] RBP: ffffb013cdf9f9e8 R08: 0000000000000fff R09: 0000000000000e91
[500932.091429] R10: ffffb013ebe8d000 R11: 0000000000000000 R12: 0000000000000002
[500932.091430] R13: ffff9d847e0a8a50 R14: 0000000000000e91 R15: ffff9d9cffb60000
[500932.091431] FS: 00007fff6d810700(0000) GS:ffff9d9ebf900000(0000) knlGS:0000000000000000
[500932.091432] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[500932.091433] CR2: 00007ffff712c760 CR3: 0000003f6c18d003 CR4: 00000000003606e0
[500932.091434] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[500932.091434] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[500932.091435] Call Trace:
[500932.091448] netmap_poll+0x562/0x5d0 [netmap]
[500932.091452] ? _slab_alloc+0x27d/0x4e0
[500932.091457] linux_netmap_poll+0x3f/0x60 [netmap]
[500932.091460] do_sys_poll+0x364/0x600
[500932.091464] ? notify_change+0x390/0x410
[500932.091468] ? _cond_resched+0x1a/0x50
[500932.091470] ? poll_initwait+0x50/0x50
[500932.091471] ? compat_poll_select_copy_remaining+0x140/0x140
[500932.091473] ? compat_poll_select_copy_remaining+0x140/0x140
[500932.091476] ? seccomp_run_filters+0x57/0xc0
[500932.091479] ? proc_destroy_inode+0x1c/0x20
[500932.091480] ? destroy_inode+0x3b/0x60
[500932.091481] ? evict+0x136/0x1a0
[500932.091483] ? seccomp_filter+0x49/0x540
[500932.091484] ? dentry_free+0x38/0x70
[500932.091486] ? dentry_kill+0x118/0x170
[500932.091487] ? _cond_resched+0x1a/0x50
[500932.091488] ? dput+0x34/0x1f0
[500932.091490] ? mntput+0x24/0x40
[500932.091492] ? fput+0x190/0x220
[500932.091494] ? ktime_get_ts64+0x49/0xf0
[500932.091496] SyS_poll+0x71/0x130
[500932.091497] ? SyS_poll+0x71/0x130
[500932.091500] do_syscall_64+0x6d/0x120
[500932.091501] entry_SYSCALL_64_after_hwframe+0x3d/0xa2
[500932.091503] RIP: 0033:0x7ffff24d684d
[500932.091503] RSP: 002b:00007fff6d39a020 EFLAGS: 00000293 ORIG_RAX: 0000000000000007
[500932.091504] RAX: ffffffffffffffda RBX: 0000000008083de0 RCX: 00007ffff24d684d
[500932.091505] RDX: 0000000000000064 RSI: 0000000000000001 RDI: 0000000008083e30
[500932.091505] RBP: 0000000008083e30 R08: 0000000020000000 R09: 0000000008501180
[500932.091506] R10: 00000000000003df R11: 0000000000000293 R12: 00007fff6d39a0a0
[500932.091507] R13: 00007fffffffe4ff R14: 00007fff6d8109c0 R15: 0000000000000000
[500932.091507] Code: 01 00 00 49 8b 7d 70 8b 31 3b b7 30 01 00 00 4c 8b 97 28 01 00 00 0f 83 07 02 00 00 49 8b 3c f2 48 8b 71 08 49 23 b5 58 01 00 00 <49> 8b 8d 60 01 00 00 48 39 ce 48 0f 46 ce 48 89 ce 48 01 fe 0f

When it gets stuck, the server has to be rebooted.