telstra / open-kilda

OpenKilda is an open-source OpenFlow controller initially designed for use in a global network with high control-plane latency and a heavy emphasis on latency-centric data path optimisation.
Apache License 2.0

Kilda AIO deployment: traffgen 'tg88' of switch 'ofsw8' is not working correctly for LLDP, ARP requests #5663

Closed izadorozhna closed 1 month ago

izadorozhna commented 1 month ago

Steps to reproduce:

  1. Deploy virtual AIO Kilda with basic default settings.
  2. Note that topology.yaml defines two traffgens for ofsw8:
    - name: tg8
      iface: eth0
      control_endpoint: http://0.0.0.0:4081
      switch: ofsw8
      switch_port: 9
      status: active
    - name: tg88
      iface: eth0
      control_endpoint: http://0.0.0.0:4082
      switch: ofsw8
      switch_port: 10
      status: active
  3. Verify that both traffgens of switch 8 are active and handle LLDP and ARP requests correctly. You can use the test "System properly detects devices if feature is 'off' on switch level and 'on' on flow level" with tg88 selected:
    given: "A switch with devices feature turned off"
        // Select tg88 to reproduce the issue
        def tg = topology.activeTraffGens.shuffled().find { it.name.toString().endsWith("88")}
        def sw = tg.switchConnected
  4. Run the test from step 3 to reproduce the failure.
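
Before running the test, it can help to confirm which switches have more than one traffgen attached. The following is a minimal sketch only: the entries are copied by hand from step 2, and the map keys `switchName` and `switchPort` are hypothetical names chosen for illustration, not the fields used by the real topology parser.

```groovy
// Traffgen entries copied by hand from step 2 (a real check would read topology.yaml).
// The keys switchName / switchPort are illustrative, not the actual parser fields.
def traffGens = [
    [name: 'tg8',  switchName: 'ofsw8', switchPort: 9],
    [name: 'tg88', switchName: 'ofsw8', switchPort: 10],
]

// Group traffgens by the switch they are attached to.
def bySwitch = traffGens.groupBy { it.switchName }

// Keep only the switches with more than one traffgen.
def multiTgSwitches = bySwitch.findAll { sw, tgs -> tgs.size() > 1 }

multiTgSwitches.each { sw, tgs ->
    println "${sw}: ${tgs*.name.join(', ')}"
}
```

With the two entries above, the only switch reported is ofsw8, with tg8 and tg88.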

Expected result:

Both traffgens of switch 8 work correctly, including sending LLDP and ARP traffic.

Actual result:

When the lab is deployed with two traffgens for switch 8, as shown in step 2, tg88 does not work for LLDP and ARP traffic. When the test selects tg88, the traffic is sent through API requests (e.g. an HTTP PUT request to <address>:8288/api/1/traffgen/tg88/address/fbc7ecb8-121b-11ef-86ee-0242ac110002/arp), but the devices connected to the flow are not detected. It seems that only the first traffgen in the list of ofsw8-related traffgens is deployed and configured correctly for ARP and LLDP devices.

Workaround:

Created PR #5664 as a workaround to skip tg88 in this test case.

izadorozhna commented 1 month ago

UPD: as the issue history shows, the initial description stated that tg88 was not deployed. However, other tests show that tg88 is deployed and sends traffic. So it looks like tg88 is not configured correctly for LLDP and ARP traffic, and the issue description was changed accordingly.

izadorozhna commented 1 month ago

Btw, when LLDP or ARP traffic from tg88 is sent to the switch, it is detected successfully. But when it is sent to the flow, it is not detected among the flow's devices.

yuliiamir commented 1 month ago

This issue can be closed, as the root cause of the missing LLDP/ARP has been found. Tg88 works properly, but the created flow had an incorrect port number (the port of the first tg on the 08_ switch). Due to the flow_tg_port selection logic, the first tg_port on the switch is selected, so the flow was created with port = tg8_port, hence the absence of LLDP/ARP.
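
The mismatch described above can be illustrated with a small sketch. It is not the actual test-framework code: the `TraffGen` class and the closures `firstTgPortOnSwitch` and `portOfSelectedTg` are hypothetical names, and the ports are taken from the topology.yaml excerpt in the issue description.

```groovy
// Hypothetical, simplified model; these are not the real OpenKilda test classes.
class TraffGen {
    String name
    String switchName
    int switchPort
}

// Values taken from the topology.yaml excerpt in the issue description.
def traffGens = [
    new TraffGen(name: 'tg8',  switchName: 'ofsw8', switchPort: 9),
    new TraffGen(name: 'tg88', switchName: 'ofsw8', switchPort: 10),
]

// Selection as described in the comment: the first traffgen port found on the switch.
def firstTgPortOnSwitch = { String sw ->
    traffGens.find { it.switchName == sw }.switchPort
}

// Alternative selection: the port of the traffgen the test actually picked.
def portOfSelectedTg = { TraffGen tg -> tg.switchPort }

def selected = traffGens.find { it.name == 'tg88' }
def flowPortActual = firstTgPortOnSwitch(selected.switchName)
def flowPortIntended = portOfSelectedTg(selected)

println "flow port via first tg on switch: ${flowPortActual}"
println "flow port via selected tg: ${flowPortIntended}"
```

In this sketch the first lookup yields port 9 (tg8) while tg88 sends its LLDP/ARP packets into port 10, so a flow built on the first value never sees that traffic. Deriving the flow endpoint from the selected traffgen itself would avoid the mismatch when a switch has several traffgens.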