iqiyi / dpvs

DPVS is a high performance Layer-4 load balancer based on DPDK.

Testing dpvs 1.9.6: why is the real client IP not written for some requests? #942

Closed elegx closed 2 months ago

elegx commented 6 months ago

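I deployed dpvs 1.9.6 and loaded the toa module on the RS (real server), but a small number of requests still cannot get the real client IP. Capturing on the LB with pdump shows that no TOA option was written into those packets. Debug logging is enabled, but I see no TOA-related messages. How should I handle this?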

ywc689 commented 6 months ago

Forwarded packets cannot be captured on the LB; you need to capture on the RS.

elegx commented 6 months ago

This is fullnat mode, and I captured the SYN packets that the LB sends to the RS. Is that not the right way to capture? @ywc689

ywc689 commented 6 months ago

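Sorry, I misread the above. pdump can capture the LB's packets. v1.9.6 removed the TOA information from the SYN packet; it is now carried in the first ACK packet of the connection, so try capturing the ACK packets.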
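
For checking a captured ACK by hand, here is a minimal Python sketch of how the TOA option could be looked up in a raw TCP header. It assumes the commonly used IPv4 TOA layout (option kind 254, length 8, carrying a 2-byte port followed by a 4-byte address); `iter_tcp_options` and `find_toa` are illustrative names, not functions from dpvs or the toa module.

```python
import ipaddress
import struct

TCPOPT_EOL = 0
TCPOPT_NOP = 1
TCPOPT_TOA = 254     # assumption: option kind used for the IPv4 TOA option
TCPOLEN_TOA_V4 = 8   # kind(1) + length(1) + port(2) + IPv4 address(4)


def iter_tcp_options(tcp_header: bytes):
    """Yield (kind, data) for every option found in a raw TCP header."""
    if len(tcp_header) < 20:
        return
    # The data offset (upper 4 bits of byte 12) counts 32-bit words.
    header_len = (tcp_header[12] >> 4) * 4
    options = tcp_header[20:header_len]
    i = 0
    while i < len(options):
        kind = options[i]
        if kind == TCPOPT_EOL:
            break
        if kind == TCPOPT_NOP:
            i += 1
            continue
        if i + 1 >= len(options):
            break  # truncated option
        length = options[i + 1]
        if length < 2 or i + length > len(options):
            break  # malformed option
        yield kind, options[i + 2:i + length]
        i += length


def find_toa(tcp_header: bytes):
    """Return (client_ip, client_port) if an IPv4 TOA option is present, else None."""
    for kind, data in iter_tcp_options(tcp_header):
        if kind == TCPOPT_TOA and len(data) == TCPOLEN_TOA_V4 - 2:
            (port,) = struct.unpack("!H", data[:2])
            return str(ipaddress.IPv4Address(data[2:6])), port
    return None
```

A header whose length is exactly 20 bytes has an empty options area, so `find_toa` returns `None` for it without inspecting anything else.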

elegx commented 6 months ago

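1. I switched to capturing on the RS. For some requests the ACK carries no TOA information, so the real client IP cannot be obtained. [image]

2. For requests where the real IP is obtained, the ACK packet does carry TOA. [image] How can I troubleshoot this further? @ywc689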

ywc689 commented 6 months ago

Are there any error logs such as "TOA add failed"? Which TCP options do the packets without TOA contain?

elegx commented 6 months ago

At the times of the failures I saw no TOA-related error logs, and the ACK packets contain no other TCP options either. The SYN packet looks like this: [image] @ywc689

ywc689 commented 6 months ago

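In v1.9.6 a failed TOA insertion does print a log message. If there is no such log, the TOA insertion was probably never attempted. Please provide the capture data for the first ACK packet of a connection that has no TOA.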
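
One possible way to pick out those packets from a larger capture is sketched below. It is only an illustration over already-parsed packet summaries: the `Pkt` record, its fields, and `first_acks_without_toa` are hypothetical names, and option kind 254 for TOA is an assumption. The idea is to remember each flow that sent a SYN and then inspect the first plain ACK seen in the same direction.

```python
from typing import Iterable, List, NamedTuple, Tuple

TCPOPT_TOA = 254  # assumption: option kind used for the TOA option

FIN, SYN, RST, ACK = 0x01, 0x02, 0x04, 0x10


class Pkt(NamedTuple):
    """Hypothetical per-packet summary, e.g. exported from a capture tool."""
    frame: int
    src: str
    sport: int
    dst: str
    dport: int
    flags: int
    option_kinds: Tuple[int, ...]


def first_acks_without_toa(packets: Iterable[Pkt]) -> List[int]:
    """Return frame numbers of first handshake ACKs that lack a TOA option."""
    awaiting_ack = set()  # flows that sent a SYN and whose first ACK is pending
    missing = []
    for p in packets:
        flow = (p.src, p.sport, p.dst, p.dport)
        if (p.flags & SYN) and not (p.flags & ACK):
            awaiting_ack.add(flow)
        elif (p.flags & ACK) and not (p.flags & (SYN | FIN | RST)) and flow in awaiting_ack:
            # First plain ACK in the SYN's direction: TOA is expected here.
            awaiting_ack.discard(flow)
            if TCPOPT_TOA not in p.option_kinds:
                missing.append(p.frame)
    return missing
```

The SYN-ACK travels in the reverse direction, so its flow tuple never matches and it is skipped; later ACKs on the same flow are ignored because the flow is removed after its first ACK.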

elegx commented 6 months ago

Frame 32855: 56 bytes on wire (448 bits), 56 bytes captured (448 bits)
    Encapsulation type: Linux cooked-mode capture v1 (25)
    Arrival Time: Mar 13, 2024 20:08:29.047723000 China Standard Time
    UTC Arrival Time: Mar 13, 2024 12:08:29.047723000 UTC
    Epoch Arrival Time: 1710331709.047723000
    [Time shift for this packet: 0.000000000 seconds]
    [Time delta from previous captured frame: 0.000356000 seconds]
    [Time delta from previous displayed frame: 0.047121000 seconds]
    [Time since reference or first frame: 9.869123000 seconds]
    Frame Number: 32855
    Frame Length: 56 bytes (448 bits)
    Capture Length: 56 bytes (448 bits)
    [Frame is marked: False]
    [Frame is ignored: False]
    [Protocols in frame: sll:ethertype:ip:tcp]
    [Coloring Rule Name: HTTP]
    [Coloring Rule String: http || tcp.port == 80 || http2]
Linux cooked capture v1
    Packet type: Unicast to us (0)
    Link-layer address type: Ethernet (1)
    Link-layer address length: 6
    Source: 0a:0a:b5:d3:c5:1a (0a:0a:b5:d3:c5:1a)
    Unused: 1de4
    Protocol: IPv4 (0x0800)
Internet Protocol Version 4, Src: 10.5.16.111, Dst: 10.60.196.75
    0100 .... = Version: 4
    .... 0101 = Header Length: 20 bytes (5)
    Differentiated Services Field: 0x04 (DSCP: LE, ECN: Not-ECT)
        0000 01.. = Differentiated Services Codepoint: Lower Effort (1)
        .... ..00 = Explicit Congestion Notification: Not ECN-Capable Transport (0)
    Total Length: 40
    Identification: 0x0000 (0)
    010. .... = Flags: 0x2, Don't fragment
        0... .... = Reserved bit: Not set
        .1.. .... = Don't fragment: Set
        ..0. .... = More fragments: Not set
    ...0 0000 0000 0000 = Fragment Offset: 0
    Time to Live: 42
    Protocol: TCP (6)
    Header Checksum: 0x67d1 [validation disabled]
    [Header checksum status: Unverified]
    Source Address: 10.5.16.111
    Destination Address: 10.60.196.75
Transmission Control Protocol, Src Port: 36217, Dst Port: 80, Seq: 1, Ack: 1, Len: 0
    Source Port: 36217
    Destination Port: 80
    [Stream index: 3645]
    [Conversation completeness: Complete, WITH_DATA (47)]
        ..1. .... = RST: Present
        ...0 .... = FIN: Absent
        .... 1... = Data: Present
        .... .1.. = ACK: Present
        .... ..1. = SYN-ACK: Present
        .... ...1 = SYN: Present
        [Completeness Flags: R·DASS]
    [TCP Segment Len: 0]
    Sequence Number: 1 (relative sequence number)
    Sequence Number (raw): 2555902871
    [Next Sequence Number: 1 (relative sequence number)]
    Acknowledgment Number: 1 (relative ack number)
    Acknowledgment number (raw): 1877499506
    0101 .... = Header Length: 20 bytes (5)
    Flags: 0x010 (ACK)
        000. .... .... = Reserved: Not set
        ...0 .... .... = Accurate ECN: Not set
        .... 0... .... = Congestion Window Reduced: Not set
        .... .0.. .... = ECN-Echo: Not set
        .... ..0. .... = Urgent: Not set
        .... ...1 .... = Acknowledgment: Set
        .... .... 0... = Push: Not set
        .... .... .0.. = Reset: Not set
        .... .... ..0. = Syn: Not set
        .... .... ...0 = Fin: Not set
        [TCP Flags: ·······A····]
    Window: 65535
    [Calculated window size: 2097120]
    [Window size scaling factor: 32]
    Checksum: 0xd6c5 [unverified]
    [Checksum Status: Unverified]
    Urgent Pointer: 0
    [Timestamps]
        [Time since first frame in this TCP stream: 1.241187000 seconds]
        [Time since previous frame in this TCP stream: 0.047121000 seconds]
    [SEQ/ACK analysis]
        [This is an ACK to the segment in frame: 28596]
        [The RTT to ACK the segment was: 1.241175000 seconds]
        [iRTT: 0.047134000 seconds]

@ywc689

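
The length fields in this dump already show that the ACK carries no TCP options at all, so there is no room for TOA in it. A small sketch of that arithmetic, using the values reported by the dissector (the function names are illustrative only):

```python
TCP_BASE_HEADER_LEN = 20  # fixed part of the TCP header, in bytes


def tcp_segment_layout(ip_total_len: int, ip_header_len: int, tcp_header_len: int):
    """Return (options_len, payload_len) in bytes for one TCP segment."""
    options_len = tcp_header_len - TCP_BASE_HEADER_LEN
    payload_len = ip_total_len - ip_header_len - tcp_header_len
    return options_len, payload_len


def scaled_window(window: int, scale_factor: int) -> int:
    """Effective receive window after applying the negotiated scale factor."""
    return window * scale_factor


# Values from the capture: Total Length 40, IP header 20, TCP header 20.
print(tcp_segment_layout(40, 20, 20))  # (0, 0): no options, no payload
# Window 65535 with scaling factor 32, matching the calculated window size.
print(scaled_window(65535, 32))        # 2097120
```

An IPv4 TOA option would need 8 more bytes, i.e. a TCP header length of at least 28 bytes, assuming the usual 8-byte TOA layout.
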
ywc689 commented 6 months ago

Is 10.5.16.111 the local IP used by fullnat?

elegx commented 6 months ago

Yes.

elegx commented 6 months ago

Can you tell which part is causing the problem? @ywc689

ywc689 commented 6 months ago

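I can't tell what the problem is. My own tests also did not reproduce any case where the client IP fails to be passed through. How often does it happen on your side? If convenient, please provide a complete configuration and a pcap capture file.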