telstra / open-kilda

OpenKilda is an open-source OpenFlow controller initially designed for use in a global network with high control-plane latency and a heavy emphasis on latency-centric data path optimisation.
Apache License 2.0

Error “Couldn’t find non overlapping protected path” when main and protected paths are swapped, the original main path ISLs are broken, and all uninvolved ISLs lack sufficient bandwidth #5653

izadorozhna closed this issue 1 month ago

izadorozhna commented 1 month ago

Steps to reproduce:

  1. Go to the test spec "Flow swaps to protected path when main path gets broken, becomes DEGRADED if protected path is unable to reroute(no bw)"
  2. Change the code to select switches 2 and 3 as the pair:
         given: "Two switches with 2 diverse paths at least"
         //def switchPair = switchPairs.all().withAtLeastNNonOverlappingPaths(2).random()
         //https://github.com/telstra/open-kilda/issues/5608
    -        def switchesWhere5608IsReproducible = topology.activeSwitches.findAll {it.dpId.toString().endsWith("08")
    -        ||it.dpId.toString().endsWith("09")}
    +        def switches_2_and_3 = topology.activeSwitches.findAll {it.dpId.toString().endsWith("02")
    +                ||it.dpId.toString().endsWith("03")}
         def switchPair = switchPairs.all()
    -                .excludeSwitches(switchesWhere5608IsReproducible)
    +                .includeSwitch(switches_2_and_3[0])
    +                .includeSwitch(switches_2_and_3[1])
                 .withAtLeastNNonOverlappingPaths(2).random()
  3. Run the test so that it executes with switches 2 and 3.
  4. Observe the test failure.

Expected result: The test initially expects the flow to become degraded because there is not enough bandwidth to find a new protected path (the original main path ISL is broken, and the other ISLs lack sufficient bandwidth). However, we are not sure that this result is actually expected in this corner case. This needs to be discussed.

Actual result: When switches 2 and 3 are selected, the main path has only one ISL (2<-->3), and the original protected path also has only one ISL (2<-->3). The test then selects the other ISLs (those not involved in either path) and decreases the bandwidth on them. Next, the test selects the ISL of the original main path and breaks it. As a result, as shown in the picture, the original main path is broken, and all other ISLs (not involved in the main or protected path) lack sufficient bandwidth.

image
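The situation described above can be reproduced with a toy model. This is only a sketch to illustrate the reasoning, not OpenKilda's path computer: the `Isl` class, the `find_path` helper, the ISL names, switch 4, and the bandwidth figures are all hypothetical, and the real computation is cost-based and more elaborate. The model only assumes that a path may use ISLs that are up, have enough available bandwidth, and are not already used by the main path.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Isl:
    """Hypothetical, simplified ISL between switches a and b."""
    name: str
    a: int
    b: int
    available_bw: int
    up: bool = True


def find_path(isls, src, dst, required_bw, excluded=frozenset()):
    """Breadth-first search over usable ISLs; returns ISL names or None."""
    usable = [
        isl for isl in isls
        if isl.up and isl.available_bw >= required_bw and isl.name not in excluded
    ]
    queue = deque([(src, [])])
    visited = {src}
    while queue:
        node, path = queue.popleft()
        if node == dst:
            return path
        for isl in usable:
            if node in (isl.a, isl.b):
                nxt = isl.b if node == isl.a else isl.a
                if nxt not in visited:
                    visited.add(nxt)
                    queue.append((nxt, path + [isl.name]))
    return None


# Assumed topology: two parallel ISLs between switches 2 and 3,
# plus a detour through a made-up switch 4.
isls = [
    Isl("2-3#a", 2, 3, 1000, up=False),  # original main path ISL, now broken
    Isl("2-3#b", 2, 3, 1000),            # original protected path, now the main path
    Isl("2-4", 2, 4, 100),               # uninvolved ISL, bandwidth decreased by the test
    Isl("4-3", 4, 3, 100),               # uninvolved ISL, bandwidth decreased by the test
]

# The flow needs 500 (as in "maximum-bandwidth"); it runs from switch 3 to switch 2.
main = find_path(isls, 3, 2, required_bw=500)
protected = find_path(isls, 3, 2, required_bw=500, excluded=frozenset(main))
```

In this model `main` resolves to the surviving ISL and `protected` is `None`: the only ISL that is up and has enough bandwidth is already taken by the main path, so no non-overlapping candidate remains.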

The test expects the flow to be in Degraded status because a new protected path cannot be found: none of the other ISLs has enough bandwidth, and the original main path ISL is broken. However, the test gets the "Couldn't find non overlapping protected path" error:

{
    "flowid": "06May180347_583_flatmushrooms2003",
    "source": {
        "switch-id": "00:00:00:00:00:00:00:03",
        "port-id": 11,
        "vlan-id": 843,
        "inner-vlan-id": 0,
        "detect-connected-devices": {
            "lldp": false,
            "arp": false
        }
    },
    "destination": {
        "switch-id": "00:00:00:00:00:00:00:02",
        "port-id": 10,
        "vlan-id": 3998,
        "inner-vlan-id": 0,
        "detect-connected-devices": {
            "lldp": false,
            "arp": false
        }
    },
    "maximum-bandwidth": 500,
    "ignore_bandwidth": false,
    "periodic-pings": false,
    "allocate_protected_path": true,
    "description": "autotest flow: The world is grown so bad, that wrens make prey where eagles dare not perch.",
    "created": "2024-05-06T16:03:48.245Z",
    "last-updated": "2024-05-06T16:04:06.394Z",
    "status": "Degraded",
    "status-details": {
        "main-path": "Up",
        "protected-path": "Down"
    },
    "status_info": "Couldn't find non overlapping protected path",
    "diverse_with": [],
    "pinned": false,
    "encapsulation-type": "transit_vlan",
    "path-computation-strategy": "cost_and_available_bandwidth",
    "forward-latency": 2373,
    "reverse-latency": 1109,
    "latency-last-modified-time": "2024-05-06T16:45:01.032Z"
}
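For illustration, the relevant fields of this response can be checked as follows. This is a minimal sketch, not the actual test code: the `protected_path_failure` helper is hypothetical, and the payload is a trimmed copy of the response above that keeps only the status-related fields.

```python
import json

# Trimmed copy of the flow response above (status-related fields only).
response_text = """
{
    "flowid": "06May180347_583_flatmushrooms2003",
    "allocate_protected_path": true,
    "status": "Degraded",
    "status-details": {
        "main-path": "Up",
        "protected-path": "Down"
    },
    "status_info": "Couldn't find non overlapping protected path"
}
"""


def protected_path_failure(flow):
    """Return status_info when the flow is Degraded with the main path Up
    and the protected path Down; otherwise return None."""
    details = flow.get("status-details", {})
    if (
        flow.get("status") == "Degraded"
        and details.get("main-path") == "Up"
        and details.get("protected-path") == "Down"
    ):
        return flow.get("status_info")
    return None


flow = json.loads(response_text)
reason = protected_path_failure(flow)
```

Here `reason` holds the `status_info` string, which is the value the test compares against its expected message.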

Also, I am attaching the flow history JSON file captured after the main ISL is broken and the "Couldn't find non overlapping protected path" error is received: 06May180347_583_flatmushrooms2003.json

Questions to discuss:

izadorozhna commented 1 month ago

Quick update after a discussion with @dmitrii-beliakov:

izadorozhna commented 1 month ago

Closing this issue because, in the case described here, the "Couldn’t find non overlapping protected path" message is expected. However, another issue was found during debugging: #5655.

The test is fixed in PR #5656: the expected message is changed to "Couldn’t find non overlapping protected path".