iqiyi / dpvs

DPVS is a high performance Layer-4 load balancer based on DPDK.

System reboots at the end of dpvs startup #60

Closed SchoIsles closed 4 years ago

SchoIsles commented 6 years ago

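Hi experts,

After the build finished I started dpvs. It looked like it was about to come up, but in the end the system rebooted. The journal system log shows nothing abnormal.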

kernel 3.10.0-327 + dpdk 16.07.2 nic: intel 82599 ixgbe

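40 cores

The configuration file is the default one, with only the dpdk1-related content removed. The complete dpvs startup log follows; please help me take a look.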

EAL: Detected 40 lcore(s)
EAL: No free hugepages reported in hugepages-1048576kB
EAL: Probing VFIO support...
PMD: bnxt_rte_pmd_init() called for (null)
EAL: PCI device 0000:04:00.0 on NUMA socket 0
EAL:   probe driver: 8086:154d rte_ixgbe_pmd
EAL: PCI device 0000:04:00.1 on NUMA socket 0
EAL:   probe driver: 8086:154d rte_ixgbe_pmd
CFG_FILE: Opening configuration file '/etc/dpvs.conf'.
CFG_FILE: log_level = DEBUG
NETIF: pktpool_size = 524287 (round to 2^n-1)
NETIF: pktpool_cache_size = 256 (round to 2^n)
NETIF: netif device config: dpdk0
NETIF: dpdk0:rx_queue_number = 8
NETIF: dpdk0:nb_rx_desc = 1024 (round to 2^n)
NETIF: dpdk0:rss = tcp
NETIF: dpdk0:tx_queue_number = 8
NETIF: dpdk0:nb_tx_desc = 1024 (round to 2^n)
NETIF: dpdk0: kni_name = dpdk0.kni
NETIF: netif worker config: cpu0
NETIF: cpu0:type = master
NETIF: cpu0:cpu_id = 0
NETIF: netif worker config: cpu1
NETIF: cpu1:type = slave
NETIF: cpu1:cpu_id = 1
NETIF: worker cpu1:dpdk0 queue config
NETIF: worker cpu1:dpdk0 rx_queue_id += 0
NETIF: worker cpu1:dpdk0 tx_queue_id += 0
NETIF: netif worker config: cpu2
NETIF: cpu2:type = slave
NETIF: cpu2:cpu_id = 2
NETIF: worker cpu2:dpdk0 queue config
NETIF: worker cpu2:dpdk0 rx_queue_id += 1
NETIF: worker cpu2:dpdk0 tx_queue_id += 1
NETIF: netif worker config: cpu3
NETIF: cpu3:type = slave
NETIF: cpu3:cpu_id = 3
NETIF: worker cpu3:dpdk0 queue config
NETIF: worker cpu3:dpdk0 rx_queue_id += 2
NETIF: worker cpu3:dpdk0 tx_queue_id += 2
NETIF: netif worker config: cpu4
NETIF: cpu4:type = slave
NETIF: cpu4:cpu_id = 4
NETIF: worker cpu4:dpdk0 queue config
NETIF: worker cpu4:dpdk0 rx_queue_id += 3
NETIF: worker cpu4:dpdk0 tx_queue_id += 3
NETIF: netif worker config: cpu5
NETIF: cpu5:type = slave
NETIF: cpu5:cpu_id = 5
NETIF: worker cpu5:dpdk0 queue config
NETIF: worker cpu5:dpdk0 rx_queue_id += 4
NETIF: worker cpu5:dpdk0 tx_queue_id += 4
NETIF: netif worker config: cpu6
NETIF: cpu6:type = slave
NETIF: cpu6:cpu_id = 6
NETIF: worker cpu6:dpdk0 queue config
NETIF: worker cpu6:dpdk0 rx_queue_id += 5
NETIF: worker cpu6:dpdk0 tx_queue_id += 5
NETIF: netif worker config: cpu7
NETIF: cpu7:type = slave
NETIF: cpu7:cpu_id = 7
NETIF: worker cpu7:dpdk0 queue config
NETIF: worker cpu7:dpdk0 rx_queue_id += 6
NETIF: worker cpu7:dpdk0 tx_queue_id += 6
NETIF: netif worker config: cpu8
NETIF: cpu8:type = slave
NETIF: cpu8:cpu_id = 8
NETIF: worker cpu8:dpdk0 queue config
NETIF: worker cpu8:dpdk0 rx_queue_id += 7
NETIF: worker cpu8:dpdk0 tx_queue_id += 7
DTIMER: sched_interval = 500
NEIGHBOUR: arp_unres_qlen = 128
NEIGHBOUR: arp_pktpool_size = 1023(round to 2^n-1)
NEIGHBOUR: arp_pktpool_cache = 32(round to 2^n)
NEIGHBOUR: arp_timeout = 60
IPV4: inet_def_ttl = 64
IP4FRAG: ip4_frag_buckets = 4096
IP4FRAG: ip4_frag_bucket_entries = 16 (round to 2^n)
IP4FRAG: ip4_frag_max_entries = 4096
IP4FRAG: ip4_frag_ttl = 1
MSGMGR: msg_ring_size = 4096 (round to 2^n)
MSGMGR: msg_mc_qlen = 256 (round to 2^n)
MSGMGR: sync_msg_timeout_us = 2000
MSGMGR: ipc_unix_domain = /var/run/dpvs_ctrl
IPVS: conn_pool_size = 2097152 (round to 2^n)
IPVS: conn_pool_cache = 256 (round to 2^n)
IPVS: conn_init_timeout = 3
IPVS: defence_udp_drop ON
IPVS: udp_timeout_normal = 300
IPVS: udp_timeout_last = 3
IPVS: defence_tcp_drop ON
IPVS: tcp_timeout_none = 2
IPVS: tcp_timeout_established = 90
IPVS: tcp_timeout_syn_sent = 3
IPVS: tcp_timeout_syn_recv = 30
IPVS: tcp_timeout_fin_wait = 7
IPVS: tcp_timeout_time_wait = 7
IPVS: tcp_timeout_close = 3
IPVS: tcp_timeout_close_wait = 7
IPVS: tcp_timeout_last_ack = 7
IPVS: tcp_timeout_listen = 120
IPVS: tcp_timeout_synack = 30
IPVS: tcp_timeout_last = 2
IPVS: synack_mss = 1452
IPVS: synack_ttl = 63
IPVS: synproxy_synack_options_sack ON
IPVS: rs_syn_max_retry = 3
IPVS: ack_storm_thresh = 10
IPVS: max_ack_saved = 3
IPVS: synproxy_conn_reuse ON
IPVS: synproxy_conn_reuse: CLOSE
IPVS: synproxy_conn_reuse: TIMEWAIT
KNI: pci: 04:00:01       8086:154d
MSGMGR: [msg_init] built-in msg registered:
lcore 0     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 0     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 1     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 1     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 1     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 2     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 2     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 2     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 3     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 3     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 3     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 4     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 4     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 4     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 5     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 5     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 5     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 6     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 6     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 6     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 7     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 7     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 7     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)
lcore 8     hash 1         type 1         mode UNICAST       unicast_cb 0x44e2a0    multicast_cb (nil)
lcore 8     hash 2         type 2         mode UNICAST       unicast_cb 0x44e4b0    multicast_cb (nil)
lcore 8     hash 5         type 5         mode UNICAST       unicast_cb 0x4359d0    multicast_cb (nil)

USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 0
USER1: rte_ip_frag_table_create: allocated of 25165952 bytes at socket 1
NETIF: dpdk0:dst_port_mask=700
NETIF: device dpdk0 configuration:
RSS: ETH_RSS_TCP
ipv4_src_ip:        0
ipv4_dst_ip: 0xffffffff
src_port:    0
dst_port: 0x700

NETIF: Waiting for dpdk0 link up, be patient ...
NETIF: >> dpdk0: link up - speed 10000 Mbps - full-duplex
DPVS: 
port-queue-lcore relation array: 
                dpdk0: A0:36:9F:E0:76:72 
    rx0-tx0     cpu1-cpu1                
    rx1-tx1     cpu2-cpu2                
    rx2-tx2     cpu3-cpu3                
    rx3-tx3     cpu4-cpu4                
    rx4-tx4     cpu5-cpu5                
    rx5-tx5     cpu6-cpu6                
    rx6-tx6     cpu7-cpu7                
    rx7-tx7     cpu8-cpu8                

NETIF: [netif_loop] Lcore 9 has nothing to do.
NETIF: [netif_loop] Lcore 11 has nothing to do.
NETIF: [netif_loop] Lcore 12 has nothing to do.
NETIF: [netif_loop] Lcore 13 has nothing to do.
NETIF: [netif_loop] Lcore 15 has nothing to do.
NETIF: [netif_loop] Lcore 10 has nothing to do.
NETIF: [netif_loop] Lcore 14 has nothing to do.
NETIF: [netif_loop] Lcore 16 has nothing to do.
NETIF: [netif_loop] Lcore 17 has nothing to do.
NETIF: [netif_loop] Lcore 18 has nothing to do.
NETIF: [netif_loop] Lcore 19 has nothing to do.
NETIF: [netif_loop] Lcore 29 has nothing to do.
NETIF: [netif_loop] Lcore 22 has nothing to do.
NETIF: [netif_loop] Lcore 23 has nothing to do.
NETIF: [netif_loop] Lcore 39 has nothing to do.
NETIF: [netif_loop] Lcore 26 has nothing to do.
NETIF: [netif_loop] Lcore 28 has nothing to do.
NETIF: [netif_loop] Lcore 30 has nothing to do.
NETIF: [netif_loop] Lcore 32 has nothing to do.
NETIF: [netif_loop] Lcore 33 has nothing to do.
NETIF: [netif_loop] Lcore 35 has nothing to do.
NETIF: [netif_loop] Lcore 24 has nothing to do.
NETIF: [netif_loop] Lcore 38 has nothing to do.
NETIF: [netif_loop] Lcore 27 has nothing to do.
NETIF: [netif_loop] Lcore 31 has nothing to do.
NETIF: [netif_loop] Lcore 34 has nothing to do.
NETIF: [netif_loop] Lcore 37 has nothing to do.
NETIF: [netif_loop] Lcore 21 has nothing to do.
NETIF: [netif_loop] Lcore 36 has nothing to do.
NETIF: [netif_loop] Lcore 20 has nothing to do.
NETIF: [netif_loop] Lcore 25 has nothing to do.
Kni: kni_mc_list_cmp_set: add mc addr: 01:00:5e:00:00:01 dpdk0 OK
Kni: kni_mc_list_cmp_set: add mc addr: 33:33:00:00:00:01 dpdk0 OK
Kni: kni_mc_list_cmp_set: add mc addr: 33:33:ff:e0:76:72 dpdk0 OK

beacer commented 6 years ago

No clue from the log. Is there anything useful in /var/log/dmesg, or in the output of dmesg?

SchoIsles commented 6 years ago

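It just rebooted straight away. I checked the system logs from before and after and there is no hint at all, which is very frustrating. I guess you haven't run into this either?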

SchoIsles commented 6 years ago

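@beacer Also, after giving up on testing, I wanted to put the machine into production and found that the NIC that had been used for dpdk could no longer reply to packets properly. ICMP packets it sent to its own gateway got replies as normal, but packets sent to it by other hosts on the same subnet got no reply. A packet capture showed that the packets did reach this NIC, yet nothing was sent back. The iptables rules were fine, and flushing them all made no difference. After the reboot I had not manually loaded the uio module, and the NIC driver had been reset to ixgbe.

In the end I had no choice but to reinstall the system, which fixed it. It feels as if the build process changed something in the kernel-level packet handling, which in theory it should not.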
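
For anyone who wants to confirm which kernel driver a port is currently bound to (for example ixgbe versus igb_uio), here is a minimal Python sketch. It assumes the standard Linux sysfs layout, where each PCI device directory holds a `driver` symlink while a driver is bound. The function name `bound_driver` and the `sysfs_root` parameter are illustrative only and are not part of dpvs or DPDK.

```python
import os


def bound_driver(pci_addr, sysfs_root="/sys/bus/pci/devices"):
    """Return the name of the kernel driver bound to a PCI device, or None.

    pci_addr is the full address as printed by EAL, e.g. "0000:04:00.0".
    sysfs_root is a parameter only so the function can be tested
    against a fake directory tree.
    """
    link = os.path.join(sysfs_root, pci_addr, "driver")
    if not os.path.islink(link):
        # Either the device does not exist or no driver is bound to it.
        return None
    # The symlink points at the driver directory; its last path
    # component is the driver name.
    return os.path.basename(os.readlink(link))
```

Calling `bound_driver("0000:04:00.0")` on the machine in question would show whether the port really went back to ixgbe after the reboot.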

Shadowu410 commented 6 years ago

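I ran into something similar. When I load the igb_uio driver for the onboard Intel 82599 NIC and start dpvs, the whole system hangs: the screen and keyboard stop responding and the only option is to reboot. Running with a NIC installed in a PCIe slot works without problems, though. Maybe try a different NIC?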

icymoon commented 6 years ago

Was this problem ever resolved? Thanks.

beacer commented 6 years ago

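DPDK does have fairly strict NIC compatibility requirements. Our test and production environments all use X540 NICs and we have no other NIC models at the moment, so we cannot reproduce this problem. All we can do is look at logs or guess. You could also ask the DPDK community whether the 82599 has similar issues.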

beacer commented 6 years ago

@okletswin any update? Or could you please try another NIC model?

SchoIsles commented 6 years ago

@beacer I didn't test any further after that. It seems all of our NICs are 82599s; I'll try an X540 when I get the chance.

beacer commented 6 years ago

Ok.

ywc689 commented 4 years ago

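This problem is usually caused by the kernel modules that DPDK uses (for example rte_kni.ko and igb_uio.ko). Make sure they match the DPDK version in use, and do not unload or delete them while they are in use.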
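
As a rough aid for the "do not unload them while in use" part, the following Python sketch reports whether the DPDK-related modules are loaded and whether anything still holds a reference to them. It assumes the usual `/proc/modules` line format (name, size, reference count, dependents, state, address). The function names are hypothetical, and it does not check that the modules match the DPDK version; that still has to be verified against the build that produced them.

```python
def parse_proc_modules(text):
    """Map module name -> reference count from /proc/modules-style text.

    The count is None when the kernel reports "-" (no unload support).
    """
    modules = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        refcount = int(fields[2]) if fields[2].isdigit() else None
        modules[fields[0]] = refcount
    return modules


def dpdk_module_status(text, names=("igb_uio", "rte_kni")):
    """Return a short status string for each module name of interest."""
    modules = parse_proc_modules(text)
    status = {}
    for name in names:
        if name not in modules:
            status[name] = "not loaded"
        elif modules[name]:
            # A non-zero reference count means something still uses it.
            status[name] = "loaded, in use"
        else:
            status[name] = "loaded, idle"
    return status
```

On a live system the input would come from `open("/proc/modules").read()`. A module reported as "loaded, in use" should not be removed with rmmod until dpvs has been stopped.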