Closed — issue by @316953425, closed 6 years ago
Looks like nobody can solve this.
FDIR for multi-core is widely used in our production environment, in both FullNAT and SNAT modes, and it seems stable. Most likely some misconfiguration is causing this issue. It seems you use the same IP for the RS and the client; can you try a different one? We'll check it when we have time. However, we have very limited development resources and higher-priority features and bugs to work on first, so it's hard to debug and support every reported issue in time. Any contribution is welcome.
If there really is a restriction that the client and the server cannot be the same machine, that would surely be unreasonable.
If convenient, could you tell us which NIC model you use in your production environment?
Try adding more local addresses. If the implementation is the same as Alibaba's FullNAT, local addresses map many-to-one onto queues: the local address (modulo the queue count) decides the assignment.
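The modulo mapping described here can be sketched as follows (a minimal illustration only, assuming index-modulo-queue-count assignment; the function name is made up, not Alibaba's or DPVS's actual code):

```c
#include <stdint.h>

/* Illustrative sketch: local addresses are assigned to RX queues
 * many-to-one, by taking the local-address index modulo the queue count. */
static inline uint16_t laddr_to_queue(uint32_t laddr_idx, uint16_t nb_queues)
{
    return (uint16_t)(laddr_idx % nb_queues);
}
```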
Reading the code, each CPU is first assigned a range of LIP ports, and CPUs correspond directly to queues. FDIR, i.e. the mapping between queues and <lip, lport>, is then configured through this structure:

```c
struct rte_eth_fdir_filter filt[MAX_FDIR_PROTO] = {
    {
        .input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_TCP,
        .input.flow.tcp4_flow.ip.dst_ip = dip,
        .input.flow.tcp4_flow.dst_port = dport,
        .action.behavior = RTE_ETH_FDIR_ACCEPT,
        .action.report_status = RTE_ETH_FDIR_REPORT_ID,
        .soft_id = filter_id[0],
    },
    {
        .input.flow_type = RTE_ETH_FLOW_NONFRAG_IPV4_UDP,
        .input.flow.udp4_flow.ip.dst_ip = dip,
        .input.flow.udp4_flow.dst_port = dport,
        .action.behavior = RTE_ETH_FDIR_ACCEPT,
        .action.report_status = RTE_ETH_FDIR_REPORT_ID,
        .soft_id = filter_id[1],
    },
};
```
What confuses me is that this way only one port is effectively set, while the lport-to-CPU mapping is many-to-one.
Take one LIP, two queues and two CPUs as an example:
cpu1 --- queue 1 ---- ports (1026, 1028, 1030, ...)
cpu2 --- queue 2 ---- ports (1025, 1027, 1029, ...)
Yet when configuring, the input.flow.udp4_flow.dst_port values in the struct above are just 0 and 1, and no mask is set. That is the part that puzzles me. @beacer
Because of how our internal network is segmented, internal IP resources are limited, so instead of configuring FDIR with many LIPs as Alibaba does, we use <lip, lport/mask> as the FDIR filter @lvsgate. Even so, the final number of LIPs depends on concurrency and cannot be too small.
@316953425, the current logic picks N bits of the lport for FDIR, where 2^N > number of lcores; in other words, each core is assigned an lport segment with a distinct masked value. The mask is set up in netif_port_fdir_dstport_mask_set, and the number of mask bits is determined by the number of CPU cores.
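A minimal sketch of that partitioning, assuming the masked low bits of the lport index the core/queue slot directly (names are illustrative, not DPVS's actual symbols; the real logic lives in netif_port_fdir_dstport_mask_set):

```c
#include <stdint.h>

/* Pick the smallest N such that 2^N covers nb_lcores; those N low bits of
 * dst_port form the FDIR mask, and the masked value selects the slot. */
static uint8_t fdir_mask_bits(uint16_t nb_lcores)
{
    uint8_t n = 0;
    while ((1u << n) < nb_lcores)
        n++;
    return n;
}

/* The slot (queue/lcore) an lport falls into under that mask. */
static uint16_t lport_slot(uint16_t lport, uint16_t nb_lcores)
{
    uint16_t mask = (uint16_t)((1u << fdir_mask_bits(nb_lcores)) - 1);
    return (uint16_t)(lport & mask);
}
```

With two worker lcores this gives a 1-bit mask, reproducing the even/odd split in the example above: ports 1026, 1028, ... fall into slot 0 and 1025, 1027, ... into slot 1.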
We don't put local addresses on L2 at all; the switch routes a whole /24 directly to the interface IP. A few dozen local addresses are not enough, we've been burned by that before, with lots of conflicts.
@lvsgate The problem is that the team allocating internal addresses did not give one machine a whole /24 worth of LIPs; many machines share the range, and running short is common. That's not something we can control; we can only keep it in mind for future planning. As for many LIPs versus lport ranges: the former keeps the software logic much simpler, but besides straining IP resources, many IPs also raise operational costs (allocation, conflict checking, and so on), which is fine with good automation but otherwise means a lot of manual work. The latter, using lport, adds software complexity, but when concurrency is not very high it avoids configuring and managing so many LIPs.
@beacer That's strange then; there shouldn't be any problem. So why are my sessions being distributed to different CPUs?
@beacer Understood. L2 networks do have this allocation problem; L3 networks are much better.
@316953425 I've verified your config: two cores/queues, and the same IP for client and RS. But I failed to reproduce your issue.
Our NIC is Ethernet controller: Intel Corporation Ethernet Controller 10-Gigabit X540-AT2 (rev 01)
VIP=192.168.100.100
LIP=192.168.100.200
RS=192.168.100.2
./dpip addr add ${VIP}/24 dev dpdk0
./ipvsadm -A -t ${VIP}:80 -s rr
./ipvsadm -a -t ${VIP}:80 -r ${RS} -b
./ipvsadm --add-laddr -z ${LIP} -t 192.168.100.100:80 -F dpdk0
Client output:
root # curl 192.168.100.100
Your ip:port : 192.168.100.2:58723
root # curl 192.168.100.100
Your ip:port : 192.168.100.2:58725
root # curl 192.168.100.100
Your ip:port : 192.168.100.2:58727
root # curl 192.168.100.100
Your ip:port : 192.168.100.2:58729
Debug output shows that packets of the same connection reach the same lcore:
IPVS: new conn: [2] TCP 192.168.100.2:58751 192.168.100.100:80 192.168.100.200:1063 192.168.100.2:80 refs 2
IPVS: conn lookup: [2] TCP 192.168.100.2:80 -> 192.168.100.200:1063 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:58751 -> 192.168.100.100:80 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:58751 -> 192.168.100.100:80 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:80 -> 192.168.100.200:1063 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:80 -> 192.168.100.200:1063 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:58751 -> 192.168.100.100:80 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:58751 -> 192.168.100.100:80 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:80 -> 192.168.100.200:1063 hit
IPVS: conn lookup: [2] TCP 192.168.100.2:58751 -> 192.168.100.100:80 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:58753 -> 192.168.100.100:80 miss
IPVS: new conn: [1] TCP 192.168.100.2:58753 192.168.100.100:80 192.168.100.200:1082 192.168.100.2:80 refs 2
IPVS: conn lookup: [1] TCP 192.168.100.2:80 -> 192.168.100.200:1082 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:58753 -> 192.168.100.100:80 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:58753 -> 192.168.100.100:80 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:80 -> 192.168.100.200:1082 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:80 -> 192.168.100.200:1082 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:58753 -> 192.168.100.100:80 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:58753 -> 192.168.100.100:80 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:80 -> 192.168.100.200:1082 hit
IPVS: conn lookup: [1] TCP 192.168.100.2:58753 -> 192.168.100.100:80 hit
IPVS: del conn: [2] TCP 192.168.100.2:58739 192.168.100.100:80 192.168.100.200:1053 192.168.100.2:80 refs 0
IPVS: del conn: [2] TCP 192.168.100.2:58741 192.168.100.100:80 192.168.100.200:1055 192.168.100.2:80 refs 0
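The lports in this log are consistent with the masked-lport scheme: with two RX queues one bit of the lport selects the queue, and in the config below cpu1 ([1]) owns queue 0 while cpu2 ([2]) owns queue 1. A tiny cross-check, assuming the low bit maps straight to the queue id (an assumption drawn from the log, not DPVS code):

```c
#include <stdint.h>

/* With 2 queues, one mask bit: lport & 1 should equal the owning queue id
 * (queue 1 belongs to lcore [2], queue 0 to lcore [1] in the config below). */
static inline uint16_t lport_queue(uint16_t lport)
{
    return (uint16_t)(lport & 0x1);
}
```

Indeed, lport 1063 (lcore [2]) is odd, queue 1, and lport 1082 (lcore [1]) is even, queue 0.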
My CPU/queue config, for your reference:
netif_defs {
!<init> pktpool_size 524287
<init> pktpool_size 250000
<init> pktpool_cache 256
<init> device dpdk0 {
rx {
queue_number 2
descriptor_number 1024
rss tcp
}
tx {
queue_number 2
descriptor_number 1024
}
! promisc_mode
kni_name dpdk0.kni
}
}
worker_defs {
<init> worker cpu0 {
type master
cpu_id 0
}
<init> worker cpu1 {
type slave
cpu_id 1
port dpdk0 {
rx_queue_ids 0
tx_queue_ids 0
! isol_rx_cpu_ids 9
! isol_rxq_ring_sz 1048576
}
}
<init> worker cpu2 {
type slave
cpu_id 2
port dpdk0 {
rx_queue_ids 1
tx_queue_ids 1
! isol_rx_cpu_ids 10
! isol_rxq_ring_sz 1048576
}
}
}
@beacer Thanks for your help. It is indeed the same, except my config file has a bit more content than yours, as follows:
! timer config
timer_defs {
    schedule_interval 500
}
! dpvs neighbor config
neigh_defs {
@316953425 We use the kernel i40e driver; the X710 does not support FDIR masks there. I'm not sure about the DPDK driver.
@316953425 I didn't paste all lines of dpvs.conf; the remaining part is irrelevant to FDIR and should not affect the result.
As @lvsgate mentioned, if the X710 supports FDIR but not the FDIR mask, FNAT/SNAT won't work. If possible, please try another NIC such as the X540 we used, or verify whether both fdir-mask and the rules actually work on the X710.
@lvsgate @beacer Thanks! I'll swap the NIC, and once I have results I'll report back here promptly. Thanks to you both.
@beacer @lvsgate Thanks, it was indeed a NIC problem. Everything works now.
My configuration is as follows.
I am using only one NIC (an X710), and my dpvs config file is dpvs.conf.single-nic.sample.
Topology: client 10.112.95.3, VIP 10.114.249.201, LIP 10.114.249.202, server 10.112.95.3.
I noticed a strange phenomenon: curl only succeeds when the NIC queue number is set to 1. Debugging the code shows that with queue number 2, curling the service (client IP: 10.112.95.3) produces the following log output:
lcore 2 port0 ipv4 hl 5 tos 0 tot 60 id 35328 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.201
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 2 port0 ipv4 hl 5 tos 0 tot 60 id 35329 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.201
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 2 port0 ipv4 hl 5 tos 0 tot 60 id 35330 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.201
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 2 port0 ipv4 hl 5 tos 0 tot 60 id 35331 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.201
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202
lcore 2 port0 ipv4 hl 5 tos 0 tot 60 id 35332 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.201
lcore 1 port0 ipv4 hl 5 tos 0 tot 52 id 0 ttl 60 prot 6 src 10.112.95.3 dst 10.114.249.202