telstra / open-kilda

OpenKilda is an open-source OpenFlow controller initially designed for use in a global network with high control-plane latency and a heavy emphasis on latency-centric data path optimisation.
Apache License 2.0

The reason why a flow is not in the UP state is missing from flowHistory #4049

Closed andriidovhan closed 1 year ago

andriidovhan commented 3 years ago
{
     "timestamp": 1612880987,
     "action": "Flow reroute completed",
     "details": "Flow reroute completed with status DEGRADED  and error null"
}
format("Flow reroute completed with status %s  and error %s", stateMachine.getNewFlowStatus(),
                            stateMachine.getErrorReason())

NOTE: There are also two spaces between the flow status and the word "and".
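
One possible way to address both points (the literal "null" and the doubled space) is sketched below. This is only an illustration: `RerouteHistoryMessage` and `describe` are hypothetical names, not part of OpenKilda, and the sketch assumes the status and error reason are available as plain strings.

```java
public class RerouteHistoryMessage {

    // Hypothetical helper, not OpenKilda code: builds the history "details" text.
    // The error part is appended only when an error reason is actually present,
    // so the message never contains the literal string "null".
    static String describe(String newFlowStatus, String errorReason) {
        String base = String.format("Flow reroute completed with status %s", newFlowStatus);
        if (errorReason == null || errorReason.trim().isEmpty()) {
            return base;
        }
        // Single space on both sides of "and".
        return String.format("%s and error %s", base, errorReason);
    }

    public static void main(String[] args) {
        System.out.println(describe("DEGRADED", null));
        System.out.println(describe("DEGRADED", "Not enough bandwidth or no path found."));
    }
}
```

This only changes the wording. It does not explain why the flow ended up DEGRADED when no error reason was recorded, which is the actual subject of this issue.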

dmitrii-beliakov commented 1 year ago

I created a flow in a test topology between switches 1 and 2 with max latency 24 and max_latency strategy. Then, when rerouting this flow, there is a record in the history:

"details": "Flow reroute completed with status DEGRADED  and error Not enough bandwidth or no path found. Can't find a path from Node(switchId=00:00:00:00:00:00:00:01, pop=null, diversityGroupUseCounter=0) to Node(switchId=00:00:00:00:00:00:00:02, pop=null, diversityGroupUseCounter=0). Reasons: Latency limit: Requested path must have latency 24ms or lower, but best path has latency 27ms, There is no non-overlapped protected path, Failed to find path with requested bandwidth=1000000",

So the error is missing only in some cases; the history is able to save the error. (The formatting could be improved.)
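
For reference, the latency part of the reason quoted above can be reproduced with a check like the following. This is a minimal sketch, not the OpenKilda path computation: `LatencyLimitCheck` and `latencyFailure` are hypothetical names, and the only things taken from this thread are the values 24 and 27 and the wording of the message.

```java
import java.util.Optional;

public class LatencyLimitCheck {

    // Hypothetical helper, not OpenKilda code: returns a failure reason when the
    // best available path exceeds the requested max latency, or empty otherwise.
    static Optional<String> latencyFailure(long maxLatencyMs, long bestPathLatencyMs) {
        if (bestPathLatencyMs <= maxLatencyMs) {
            return Optional.empty();
        }
        return Optional.of(String.format(
                "Latency limit: Requested path must have latency %dms or lower, "
                        + "but best path has latency %dms",
                maxLatencyMs, bestPathLatencyMs));
    }

    public static void main(String[] args) {
        // Values from the history record above: max latency 24, best path 27.
        System.out.println(latencyFailure(24, 27).orElse("no latency violation"));
    }
}
```

Returning an empty `Optional` when there is no violation keeps "no reason" distinct from a reason string, which is the distinction that gets lost when a missing reason is formatted as "null".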

dmitrii-beliakov commented 1 year ago

I tried to follow the steps from the test case in MaxLatencySpec.groovy that includes this error message, but I get a different result: either I cannot get to this point because the path cannot be calculated, or the error message is present. There is probably some other parameter in the environment that makes this issue possible. However, I was not able to find it and reproduce the issue.

dmitrii-beliakov commented 1 year ago

I found the steps to reproduce this issue. I created a flow between switches 1 and 2 in a standard test topology, using the MAX_LATENCY strategy, with max_latency and max_latency_tier2 chosen such that (latency of the direct link between 1 and 2) < max_latency < (sum of latencies of the path between 1 and 2 through another switch), and max_latency_tier2 is high enough for any path. After the flow was created and in the UP state, I made sure it used the direct link between 1 and 2. Then I took down port 1 on switch 1, disabling the ISL between 1 and 2, and executed a reroute for this flow. The flow history then shows the following action:

                "action": "Flow reroute completed",
                "details": "Flow reroute completed with status DEGRADED  and error null",