Closed carmine closed 6 years ago
@MVP.FUTURE
Scenario: Port Failover followed by failure followed by recovery
This scenario checks that failover and recovery happens orderly and that
failures do not break things apart. Also it checks whether alternate route
is picked after complet failure.
Given a clean flow topology
And a clean controller
And a multi-path topology
And topology contains 16 links
And a flow ffr is successfully created
And flow ffr in UP state
And flow ffr path is shortest
And traffic flows through ffr flow
When a switch 00:00:00:00:00:00:00:03 port 1 is disabled
And flow ffr in UP state
And flow ffr path is alternate
Then traffic flows through ffr flow
When a switch 00:00:00:00:00:00:00:05 port 1 is disabled
And flow ffr in DOWN state
And flow ffr path is alternate
Then traffic does not flow through ffr flow
When a switch 00:00:00:00:00:00:00:03 port 1 is enabled
>>>FAILED And flow ffr in UP state
And flow ffr path is shortest
Then traffic flows through ffr flow
I don't know what is wrong so my estimate 5d
I think this issue may takes up to 5 days because we don't know the reason why this error is occurred
I don't dig in this direction yet... so my very rude estimation is 5d.
This is a missing feature. Currently the only way to restore a flow is to bring up a port in the last path installed i.e. in p1 of s5. To make this test pass the controller needs to keep track of which port belongs to which path and on a port up event check if any path is affected. If a path is affected and if a flow with such path is down then a set of rules for the path should be created and installed or confirmed. Also this will pave way for flow path optimization feature in the future. I was thinking about this feature and I believe that it would be better to query some database for this data rather then mess with all the mappings needed. Simple queries should reduce future bugs' severity and make code cleaner and more maintainable overall. However this requires adding an extra service (a DB) and amending the rest of the code to use it. So I estimate this to be at least a 5d long exercise. If we choose some ad hoc solution then it will take about 5d as well, but in this case there is more chance that it will actually take longer to finish it and it will have greater potential to introduce new bugs.
I found a code
affectedFlows = flowCache.dumpFlows().stream()
.filter(flow -> FlowState.DOWN.equals(flow.getLeft().getState()))
.collect(Collectors.toSet());
emitRerouteCommands(affectedFlows, tuple, "ISL", FlowOperation.UPDATE);
But I don't see reroute command in logs, so will try to find what is wrong tomorrow.
affectedFlows size is 0
affectedFlows size is 0 because the flow is on UP state.
We've adjusted one of the ATs to @MVP.Future wrt path recomputation. This task includes: