Closed: italovalcy closed this issue 11 months ago.
@italovalcy, thanks for reporting this and for your analysis.
We have a cleanup that runs when a ".*.connection.lost" KytosEvent is received, and we're safely keeping a reference to the switch. However, what probably happened is that the cleanup task was scheduled in the event loop but hadn't executed yet when the new handshake happened, which led to this case. So yes, it'd be safer to do a cleanup right before setting the connection as established. Up to that point the switch isn't "is_connected()" yet (and the handshake event hasn't been published yet), so no flow stats request can have been sent on the new connection, and clearing there should be safe.
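Here is a minimal sketch of that ordering. It assumes a hypothetical `Switch` object that tracks pending flow stats requests by xid. The names `pending_stats`, `cleanup_on_connection_lost` and `on_handshake_completed` are illustrative only and are not the actual Kytos/of_core API. The sketch simulates the race: the connection.lost cleanup is scheduled as an asyncio task but has not run yet when the new handshake completes. The handshake path therefore clears stale requests itself before marking the switch as connected.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class Switch:
    """Hypothetical stand-in for a switch object (not the real Kytos class)."""
    dpid: str
    established: bool = False
    # Pending flow stats requests keyed by OpenFlow xid (hypothetical layout).
    pending_stats: dict = field(default_factory=dict)

    def is_connected(self) -> bool:
        return self.established


async def cleanup_on_connection_lost(switch: Switch, xids: list) -> None:
    """Scheduled when '.*.connection.lost' arrives; may run later than expected."""
    for xid in xids:
        switch.pending_stats.pop(xid, None)


def on_handshake_completed(switch: Switch) -> None:
    """Clear stale requests right before marking the connection established."""
    # is_connected() is still False here, so no flow stats request can have
    # been sent on the new connection: everything still pending is stale.
    switch.pending_stats.clear()
    switch.established = True


async def simulate() -> dict:
    sw = Switch(dpid="00:00:00:00:00:00:00:01", established=True)
    sw.pending_stats[42] = "flow stats request sent before the disconnect"

    # Connection drops: the cleanup task is scheduled but has not run yet.
    sw.established = False
    stale_xids = list(sw.pending_stats)
    task = asyncio.create_task(cleanup_on_connection_lost(sw, stale_xids))

    # The new handshake finishes before the event loop runs the cleanup task.
    on_handshake_completed(sw)
    pending_after_handshake = dict(sw.pending_stats)

    await task
    return pending_after_handshake


if __name__ == "__main__":
    print(asyncio.run(simulate()))  # prints {}
```

The clear happens synchronously before the connection is marked established. Correctness therefore no longer depends on when the event loop gets around to the scheduled cleanup task.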
I'll push a fix for this soon. Thanks.
Hi,
When a switch disconnects in the middle of a FlowStats request, we see "Overlapping stats request" messages logged for a few cycles before of_core discards the old request and sends a new one.
This can delay the Flow Consistency routine and the Kytos Stats reports, among others. Another impact we are seeing is the end-to-end tests occasionally failing in the following test case:
Looking into the Kytos logs, we can see why this is happening:
In summary: the switch disconnects while a flow stats request is still awaiting a reply. After that, of_core keeps waiting for the reply to the old request until it eventually clears that request and sends a new one. Switches seem to discard flow stats requests that were sent before they reconnect to the controller, so I believe it is safe to clear pending flow stats requests after a switch reconnects.
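To illustrate the behavior described above, here is a minimal sketch of a per-switch flow stats poller. It assumes the poller allows a single outstanding request and gives up on it after a fixed number of polling cycles. `FlowStatsPoller`, `timeout_cycles=3` and `on_reconnect` are hypothetical and do not reflect of_core's actual implementation or timeout. Suppose the reply to a request is lost across the reconnection and nothing clears it. The poller then reports "overlapping" cycles until the timeout expires. If it clears on reconnect, the next cycle sends a fresh request immediately.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FlowStatsPoller:
    """Hypothetical sketch of per-switch flow stats polling (not of_core's code)."""
    timeout_cycles: int = 3  # assumed number of cycles to wait for a reply
    pending_xid: Optional[int] = None
    waited_cycles: int = 0
    next_xid: int = 1

    def poll(self) -> str:
        """Called once per polling cycle; reports what happened this cycle."""
        if self.pending_xid is not None:
            self.waited_cycles += 1
            if self.waited_cycles < self.timeout_cycles:
                return "overlapping"  # still waiting for the old reply
            # Timeout reached: drop the old request and send a new one below.
        self.pending_xid = self.next_xid
        self.next_xid += 1
        self.waited_cycles = 0
        return "sent"

    def on_reply(self, xid: int) -> None:
        """A matching reply clears the outstanding request."""
        if xid == self.pending_xid:
            self.pending_xid = None
            self.waited_cycles = 0

    def on_reconnect(self) -> None:
        """Proposed fix: the switch drops old requests, so forget ours too."""
        self.pending_xid = None
        self.waited_cycles = 0


if __name__ == "__main__":
    stale = FlowStatsPoller()
    stale.poll()  # request sent; the switch then reconnects and never replies
    print([stale.poll() for _ in range(3)])  # ['overlapping', 'overlapping', 'sent']

    fixed = FlowStatsPoller()
    fixed.poll()
    fixed.on_reconnect()  # clear the pending request when the switch reconnects
    print(fixed.poll())  # sent
```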
CCing @viniarck to hear his opinion on this as well.