sonic-net / sonic-mgmt

Configuration management examples for SONiC

ARP received in the wrong interface in Arp_responder.py #421

Closed: macauleycheng closed this issue 6 years ago.

macauleycheng commented 6 years ago

Description

I don't understand how the packets are received by the PTF docker container. On which interface will the PTF container receive these packets? How can I find out the relationship between the PTF interfaces and the DUT interfaces?

Describe the results you received: In the fast-reload test case, the ARP responder checks the requested IP against the interface the request was received on. They don't match, so the ARP responder doesn't send an ARP reply.

Describe the results you expected: The ARP request should be received on the correct interface.

pavel-shirshov commented 6 years ago

Hi,

Can you please explain your question a bit more? Which packets do you mean? We send packets in two directions: from the T1 to the hosts (from the port channels to the VLAN), and from the hosts to the T1 (from the VLAN to the port channels). Which direction is your question about?

To find the relationship between the PTF docker interfaces and the DUT interfaces, open the topology file https://github.com/Azure/sonic-mgmt/blob/master/ansible/vars/topo_t0.yml. The interface numbers under host_interfaces belong to the DUT's Vlan2 interface as member ports. The interface numbers under VMs belong to the DUT's PortChannel interfaces. Please let me know if you need further explanation.
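A minimal sketch of reading that relationship out of the topology file. It assumes the YAML has already been loaded into a dict (for example with PyYAML's yaml.safe_load), that host_interfaces is a list of PTF port indices, and that each VM entry carries a vlans list of indices; the excerpt and the function name classify_ptf_interfaces are hypothetical, and the ethN naming of PTF interfaces is taken from the debug logs later in this thread.

```python
# Hypothetical excerpt of topo_t0.yml after loading it into a dict.
# The key layout (topology -> host_interfaces / VMs -> vlans) is an assumption.
TOPO_EXCERPT = {
    "topology": {
        "host_interfaces": [0, 1, 2, 3],
        "VMs": {
            "ARISTA01T1": {"vlans": [28]},
            "ARISTA02T1": {"vlans": [29]},
        },
    }
}


def classify_ptf_interfaces(topo):
    """Map each PTF interface name (ethN) to the DUT-side role of its port."""
    topology = topo["topology"]
    roles = {}
    # Indices under host_interfaces are Vlan2 member ports on the DUT.
    for idx in topology.get("host_interfaces", []):
        roles["eth%d" % idx] = "Vlan2 member"
    # Indices under each VM belong to that VM's PortChannel on the DUT.
    for vm_name, vm in topology.get("VMs", {}).items():
        for idx in vm.get("vlans", []):
            roles["eth%d" % idx] = "PortChannel member (%s)" % vm_name
    return roles


roles = classify_ptf_interfaces(TOPO_EXCERPT)
for iface in sorted(roles, key=lambda name: int(name[3:])):
    print(iface, "->", roles[iface])
```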

The ARP responder doesn't reply to every ARP request. The VLAN interface sends the ARP request out of all its member ports to find the port with the requested IP address, and the ARP responder responds from only one interface.

So the DUT can't send just one ARP request, because it doesn't know the correct VLAN member interface at the time it sends the request.
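The responder's rule can be sketched as a lookup keyed by the receiving interface. This is a simplified illustration, not the code of arp_responder.py; the function name arp_reply_mac is hypothetical, and the data layout follows the from_t1.json dump posted later in this thread (interface, then IP, then MAC).

```python
def arp_reply_mac(ip_map, rx_iface, request_ip):
    """Return the MAC to reply with, or None to ignore the request.

    ip_map uses the from_t1.json layout {iface: {ip: mac_hex}}. The DUT floods
    one copy of the request to every VLAN member port; only the copy that
    arrives on the interface owning request_ip gets an answer.
    """
    return ip_map.get(rx_iface, {}).get(request_ip)


# Excerpt of the mapping table posted later in this thread.
IP_MAP = {
    "eth3": {"192.168.0.2": "720600010000"},
    "eth10": {"192.168.0.9": "720600010007"},
    "eth24": {"192.168.0.23": "720600010015"},
}

# One flooded request for 192.168.0.9, seen on three member ports.
for iface in ("eth3", "eth10", "eth24"):
    mac = arp_reply_mac(IP_MAP, iface, "192.168.0.9")
    print(iface, "reply with " + mac if mac else "drop")
```

Only eth10 answers; the copies on eth3 and eth24 are silently ignored, which is why a single request sent out of the wrong port can never be resolved.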

macauleycheng commented 6 years ago

In the fast-reboot test, traffic is sent from the test server to the DUT and from the DUT to the server to make sure the DUT is stable. But in the first table test, only a few TCP packets are ever received in the DUT-to-server direction: of 29 TCP packets with different SIPs, only 3 were received. We found the root cause in arp_responder.py. Below, (1) is my arp_responder.py debug output and (2) is the interface/IP mapping table dumped by arp_responder.py. From (1), we noticed that the ARP requests are always received on the wrong interface. This is the DUT-to-server direction. According to your reply, the DUT tries all interfaces to find the one bound to the requested IP, but the debug output shows that it doesn't try all interfaces. I can't tell whether this is a DUT problem or a PTF docker problem.

  1. ARP debug messages

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth23, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth18, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth17, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth16, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth23, remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth18, remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth17, remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth16, remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.3

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.3

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.5

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.5

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.4

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.4

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.6

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.6

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.11

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.11

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.11

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.11

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.12

macauley drop intf eth16, remote_ip 192.168.0.1 request_ip 192.168.0.12

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.13

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.13

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.19

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.19

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.19

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.19

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.24

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.24

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.24

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.24

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.25

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.25

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.25

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.25

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.29

macauley drop intf eth10, remote_ip 192.168.0.1 request_ip 192.168.0.29

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.30

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.30

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth23, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth19, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth18, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth17, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley drop intf eth16, remote_ip 192.168.0.1 request_ip 192.168.0.9

macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2

macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.2

  2. from_t1.json (interface/IP mapping table): {"eth9": {"192.168.0.8": "720600010006"}, "eth8": {"192.168.0.7": "720600010005"}, "eth7": {"192.168.0.30": "72060001001c", "192.168.0.6": "720600010004"}, "eth6": {"192.168.0.29": "72060001001b", "192.168.0.5": "720600010003"}, "eth5": {"192.168.0.28": "72060001001a", "192.168.0.4": "720600010002"}, "eth4": {"192.168.0.3": "720600010001", "192.168.0.27": "720600010019"}, "eth3": {"192.168.0.2": "720600010000", "192.168.0.26": "720600010018"}, "eth2": {"192.168.0.25": "720600010017"}, "eth1": {"192.168.0.24": "720600010016"}, "eth24": {"192.168.0.23": "720600010015"}, "eth22": {"192.168.0.21": "720600010013"}, "eth23": {"192.168.0.22": "720600010014"}, "eth20": {"192.168.0.19": "720600010011"}, "eth21": {"192.168.0.20": "720600010012"}, "eth19": {"192.168.0.18": "720600010010"}, "eth18": {"192.168.0.17": "72060001000f"}, "eth13": {"192.168.0.12": "72060001000a"}, "eth12": {"192.168.0.11": "720600010009"}, "eth11": {"192.168.0.10": "720600010008"}, "eth10": {"192.168.0.9": "720600010007"}, "eth17": {"192.168.0.16": "72060001000e"}, "eth16": {"192.168.0.15": "72060001000d"}, "eth15": {"192.168.0.14": "72060001000c"}, "eth14": {"192.168.0.13": "72060001000b"}}
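A small helper makes the pattern in this log easier to see. It is a hypothetical sketch, not part of sonic-mgmt: it groups the "drop intf" lines of the debug output by requested IP and looks up the owning interface in the mapping table. In the log above, every "rx" line is followed by a drop on an interface that does not own the requested IP (for example, 192.168.0.9 belongs to eth10 but is only seen on eth24, eth23, eth19, eth18, eth17 and eth16).

```python
import re
from collections import defaultdict

# Matches the "drop" lines printed by the debug build of arp_responder.py above.
DROP_RE = re.compile(r"drop intf (\S+), remote_ip (\S+) request_ip (\S+)")


def drops_by_request_ip(log_text):
    """Return {request_ip: [interfaces the request was dropped on, in order]}."""
    drops = defaultdict(list)
    for line in log_text.splitlines():
        m = DROP_RE.search(line)
        if m:
            iface, _remote_ip, request_ip = m.groups()
            drops[request_ip].append(iface)
    return dict(drops)


def report(log_text, ip_map):
    """Print, per requested IP, the owning interface and where it was dropped."""
    owner = {ip: iface for iface, ips in ip_map.items() for ip in ips}
    for ip, ifaces in drops_by_request_ip(log_text).items():
        print("%s owned by %s, dropped on %s"
              % (ip, owner.get(ip, "?"), ", ".join(ifaces)))


SAMPLE_LOG = """\
macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9
macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.9
macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.9
macauley drop intf eth23, remote_ip 192.168.0.1 request_ip 192.168.0.9
macauley rx remote_ip 192.168.0.1 request_ip 192.168.0.2
macauley drop intf eth24, remote_ip 192.168.0.1 request_ip 192.168.0.2
"""
FROM_T1_EXCERPT = {"eth3": {"192.168.0.2": "720600010000"},
                   "eth10": {"192.168.0.9": "720600010007"}}

report(SAMPLE_LOG, FROM_T1_EXCERPT)
```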

From: pavel-shirshov, Sent: Friday, January 12, 2018 1:16 AM, Subject: Re: [Azure/sonic-mgmt] ARP received in the wrong interface in Arp_responder.py (#421)

pavel-shirshov commented 6 years ago

OK. With a VLAN interface, you can have multiple physical ports connected to one logical VLAN interface. For example, you have a Vlan1000 interface with IP 10.0.0.1/24, and the VLAN member interfaces of Vlan1000 could be Ethernet0, Ethernet4, Ethernet16, and so on. When the DUT wants to find which VLAN member port a customer resides on, it sends an ARP request out of all physical interfaces that are part of Vlan1000. The customer that owns the requested IP address then responds to the ARP request. Our setup emulates this: the ARP responder responds to the ARP request through only one interface. The mapping between IP/MAC and interfaces is defined in the from_t1.json file.
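The flood-and-learn behaviour described above can be sketched as a toy model of the DUT side. It is an illustration under simplifying assumptions, not SONiC code: the function name resolve_on_vlan is hypothetical, the hosts and MACs are made up, and each host is keyed directly by the DUT member port it sits behind (the from_t1.json layout, keyed by PTF interface, is used the same way).

```python
def resolve_on_vlan(vlan_members, ip_map, target_ip):
    """Emulate the DUT resolving target_ip on a VLAN.

    The DUT floods one ARP request out of every member port in vlan_members.
    Only the port whose attached host owns target_ip (per ip_map, laid out
    like from_t1.json) answers. Returns (arp_entry, fdb_entry), or
    (None, None) if no member port answered.
    """
    for port in vlan_members:                  # one request copy per member port
        mac = ip_map.get(port, {}).get(target_ip)
        if mac is not None:                    # the single port that replies
            arp_entry = (target_ip, mac)       # ARP table: IP -> MAC
            fdb_entry = (mac, port)            # FDB: MAC -> port the reply came from
            return arp_entry, fdb_entry
    return None, None


# Made-up hosts behind the Vlan1000 members from the example above.
HOSTS = {
    "Ethernet4": {"10.0.0.5": "72:06:00:00:00:05"},
    "Ethernet16": {"10.0.0.17": "72:06:00:00:00:11"},
}
MEMBERS = ["Ethernet0", "Ethernet4", "Ethernet16"]

print(resolve_on_vlan(MEMBERS, HOSTS, "10.0.0.5"))
# If the request never reaches Ethernet4, the address stays unresolved.
print(resolve_on_vlan(["Ethernet0", "Ethernet16"], HOSTS, "10.0.0.5"))  # -> (None, None)
```

The second call mirrors the symptom reported in this issue: if the flooded requests reach only some member ports, the owner never answers and neither table is populated.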

macauleycheng commented 6 years ago

In my environment, the DUT only ever sends ARP requests out of 4 interfaces to resolve the MAC. Is this a DUT problem? This test case uses only 4 LAG interfaces. Could I change arp_responder.py to work around this: when an ARP request for an IP arrives on the wrong interface, send the ARP reply out of the correct interface?

From: pavel-shirshov, Sent: Thursday, January 18, 2018 5:37 AM, Subject: Re: [Azure/sonic-mgmt] ARP received in the wrong interface in Arp_responder.py (#421)

pavel-shirshov commented 6 years ago

How many interfaces do you have as VLAN members on your DUT? Can you please show the output of the "show vlan brief" command? How do you distinguish the wrong interface from the correct one in your case?

macauleycheng commented 6 years ago

How many interfaces do you have as VLAN members on your DUT?

24

Can you please show the output of the "show vlan brief" command?

root@str-dut-01:/home/admin# show vlan brief
Command: sudo brctl show
bridge name   bridge id           STP enabled   interfaces
Bridge        8000.7072cf9c1ac2   no            Ethernet12 Ethernet16 Ethernet20 Ethernet24 Ethernet28 Ethernet32 Ethernet36 Ethernet4 Ethernet40 Ethernet44 Ethernet48 Ethernet52 Ethernet56 Ethernet60 Ethernet64 Ethernet68 Ethernet72 Ethernet76 Ethernet8 Ethernet80 Ethernet84 Ethernet88 Ethernet92 Ethernet96
docker0       8000.0242ffe0e80f   no

root@str-dut-01:/home/admin#
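When counting VLAN members from this output, note that brctl normally prints one member per line, with continuation lines indented under the first member. A minimal parser sketch, assuming that usual whitespace-aligned layout (the function name parse_brctl_show and the short sample are illustrative, not taken from the DUT):

```python
def parse_brctl_show(text):
    """Return {bridge_name: [member interfaces]} from `brctl show` output.

    Assumes the usual layout: a header line, then one line per bridge
    (name, id, STP flag, optional first interface), followed by indented
    continuation lines that each carry one more interface.
    """
    bridges = {}
    current = None
    for line in text.splitlines()[1:]:         # skip the header line
        if not line.strip():
            continue
        fields = line.split()
        if line[0].isspace():                  # continuation: one more member port
            if current is not None:
                bridges[current].extend(fields)
        else:                                  # new bridge: name, id, STP, [iface]
            current = fields[0]
            bridges[current] = fields[3:]
    return bridges


SAMPLE = (
    "bridge name\tbridge id\t\tSTP enabled\tinterfaces\n"
    "Bridge\t\t8000.7072cf9c1ac2\tno\t\tEthernet12\n"
    "\t\t\t\t\t\t\tEthernet16\n"
    "docker0\t\t8000.0242ffe0e80f\tno\n"
)
print({name: len(ports) for name, ports in parse_brctl_show(SAMPLE).items()})
```

Applied to the full listing above, the Bridge entry would contain the 24 Ethernet members reported in the answer.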

How do you distinguish the wrong interface from the correct one in your case?

First, I dumped the ARP table and found that only some IPs got an ARP reply. Then I dumped the receiving interface in arp_responder.py to check why it doesn't reply to the ARP requests. I found that the ARP requests always come in on an interface that doesn't have the requested IP assigned. I also found that only 4 interfaces ever receive the ARP requests. Please see the debug log in my previous mail.

Command: /usr/sbin/arp -n
Address         HWtype  HWaddress           Flags Mask  Iface
10.250.0.244    ether   0c:c4:7a:68:6f:71   C           eth0
10.250.0.1              (incomplete)                    eth0

root@str-dut-01:/home/admin#

From: pavel-shirshov, Sent: January 27, 2018 9:33 AM, Subject: Re: [Azure/sonic-mgmt] ARP received in the wrong interface in Arp_responder.py (#421)

pavel-shirshov commented 6 years ago

Got it, thank you for the response. Are you sure you have the correct mapping between your DUT ports and the PTF ports? For example, is your DUT's Ethernet0 mapped to eth0 in the PTF container?

Here is how the whole thing works. The fast-reboot.py test sends a generated packet from self.from_t1; see how the packet is generated at https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/files/ptftests/fast-reboot.py#L527. The packet goes to one of the DUT's port channel ports, and the DUT has to route it to one of the VLAN ports. Until the FDB entry (and the ARP entry) is populated, the DUT sends ARP requests to the PTF container, where the ARP responder receives them. The DUT sends the ARP request to all VLAN member ports, but the ARP responder responds from only one port. When the DUT receives the response, it saves the entry in its ARP and FDB tables.
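One way to sanity-check the port mapping is to write down the expected DUT-to-PTF correspondence explicitly. The sketch below assumes a hypothetical convention (4 lanes per port, so DUT port EthernetN is expected on PTF interface eth(N/4)); the function name ptf_iface_for_dut_port is illustrative, and the real wiring is whatever your testbed's cabling and configuration define.

```python
import re


def ptf_iface_for_dut_port(dut_port, lanes_per_port=4):
    """Guess the PTF interface wired to a DUT port.

    Hypothetical convention: DUT port EthernetN maps to PTF interface
    eth<N // lanes_per_port>. Check this against your testbed's wiring.
    """
    m = re.fullmatch(r"Ethernet(\d+)", dut_port)
    if m is None:
        raise ValueError("unexpected port name: %r" % dut_port)
    return "eth%d" % (int(m.group(1)) // lanes_per_port)


# VLAN members from the "show vlan brief" output above: Ethernet4 .. Ethernet96.
members = ["Ethernet%d" % n for n in range(4, 97, 4)]
print(sorted({ptf_iface_for_dut_port(p) for p in members},
             key=lambda name: int(name[3:])))
```

Under this assumed convention, the 24 VLAN members map to eth1 through eth24, the same interface names that appear in the from_t1.json dump. Whether the physical cabling actually follows it is exactly what the question above asks you to verify.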

pavel-shirshov commented 6 years ago

So, to answer your initial question: the ARP responder receives multiple ARP requests, one per member interface of the VLAN, but it responds to only one of them. You can find the interface and MAC address for a request in the from_t1.json file: use the IP address from the ARP request as the key, and the interface and MAC address are the data.
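Because from_t1.json is keyed by interface first, looking up by IP is easiest after inverting it. A minimal sketch (build_ip_index is a hypothetical helper; the 12-hex-digit MAC format is taken from the dump posted earlier in this thread):

```python
import json


def build_ip_index(from_t1):
    """Invert the from_t1.json layout {iface: {ip: mac}} into {ip: (iface, mac)}."""
    index = {}
    for iface, ips in from_t1.items():
        for ip, mac_hex in ips.items():
            # The dump stores MACs as 12 hex digits without separators.
            mac = ":".join(mac_hex[i:i + 2] for i in range(0, 12, 2))
            index[ip] = (iface, mac)
    return index


# Excerpt of the from_t1.json dump posted earlier in this thread.
FROM_T1_JSON = ('{"eth10": {"192.168.0.9": "720600010007"}, '
                '"eth3": {"192.168.0.2": "720600010000", '
                '"192.168.0.26": "720600010018"}}')

index = build_ip_index(json.loads(FROM_T1_JSON))
print(index["192.168.0.9"])  # ('eth10', '72:06:00:01:00:07')
```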

pavel-shirshov commented 6 years ago

If not all ARP requests reach the ARP responder and it doesn't generate ARP responses, some ARP requests were probably dropped before Python could read them. In that case you need to increase the socket receive buffer. Try adding the following line to your ARP responder, right after https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/test/files/helpers/arp_responder.py#L44:

self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 10000000)

You also need to increase the maximum value net.core.rmem_max on the host system where your PTF docker container runs:

sysctl net.core.rmem_max=10000000

Try it; it will probably help.
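A sketch of checking both settings from Python, assuming Linux. Per socket(7), the kernel caps an SO_RCVBUF request at net.core.rmem_max and reports back double the value it stores, which is why the setsockopt call only helps once rmem_max has been raised. A UDP socket stands in for the responder's raw socket here, because opening a raw packet socket needs root; the helper names are hypothetical.

```python
import socket

REQUESTED_RCVBUF = 10000000  # value suggested in this thread


def read_rmem_max(path="/proc/sys/net/core/rmem_max"):
    """Return net.core.rmem_max on Linux, or None if it cannot be read."""
    try:
        with open(path) as f:
            return int(f.read().strip())
    except (OSError, ValueError):
        return None


def enlarge_rcvbuf(sock, size=REQUESTED_RCVBUF):
    """Request a larger receive buffer and return what the kernel granted."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size)
    return sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)


# A UDP socket stands in for the ARP responder's raw socket.
with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
    granted = enlarge_rcvbuf(s)
    print("rmem_max:", read_rmem_max(), "granted SO_RCVBUF:", granted)
```

If the granted value stays far below the request, rmem_max on the host is still the limiting factor.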

macauleycheng commented 6 years ago

Thanks, the test case passes now.