kuujo closed this issue 2 years ago.
@kuujo Is this still valid?
I encountered this issue too. After adding lots of log messages to onos-config for debugging, I noticed that even though onos-config is still connected to the device, the device's state is updated to UNREACHABLE for some reason.
2020-10-09T03:11:53.820Z INFO main onos-topo/onos-topo.go:54 Starting onos-topo
2020-10-09T03:11:53.821Z INFO manager manager/manager.go:28 Creating Manager
2020-10-09T03:11:53.821Z INFO manager manager/manager.go:39 Starting Manager
2020-10-09T03:12:08.386Z INFO northbound northbound/server.go:119 Loading certs: /etc/onos/certs/tls.crt /etc/onos/certs/tls.key
2020-10-09T03:12:08.386Z INFO main onos-topo/onos-topo.go:96 Started NBI on [::]:5150
2020-10-09T03:12:08.386Z INFO northbound northbound/server.go:164 Starting RPC server on address: [::]:5150
2020-10-09T03:14:05.359Z INFO northbound/device device/service.go:169 Updated Device id:"devicesim-1" revision:86 address:"devicesim-1-device-simulator:11161" target:"devicesim-1" version:"1.0.0" timeout:<seconds:5 > credentials:<> tls:<plain:true > type:"Devicesim" attributes:<key:"onos-config.mastership.master" value:"onos-config-5c45c87d74-fx44x" > attributes:<key:"onos-config.mastership.term" value:"1" > protocols:<protocol:GNMI connectivityState:REACHABLE channelState:CONNECTED serviceState:AVAILABLE >
2020-10-09T03:14:05.394Z INFO northbound/device device/service.go:169 Updated Device id:"devicesim-1" revision:90 address:"devicesim-1-device-simulator:11161" target:"devicesim-1" version:"1.0.0" timeout:<seconds:5 > credentials:<> tls:<plain:true > type:"Devicesim" attributes:<key:"onos-config.mastership.master" value:"onos-config-5c45c87d74-fx44x" > attributes:<key:"onos-config.mastership.term" value:"1" > protocols:<protocol:GNMI connectivityState:UNREACHABLE channelState:DISCONNECTED serviceState:UNAVAILABLE >
2020-10-09T03:22:20.807Z INFO northbound/device device/service.go:169 Updated Device id:"devicesim-2" revision:222 address:"devicesim-2-device-simulator:11161" target:"devicesim-2" version:"1.0.0" timeout:<seconds:5 > credentials:<> tls:<plain:true > type:"Devicesim" attributes:<key:"onos-config.mastership.master" value:"onos-config-5c45c87d74-fx44x" > attributes:<key:"onos-config.mastership.term" value:"1" > protocols:<protocol:GNMI connectivityState:REACHABLE channelState:CONNECTED serviceState:AVAILABLE >
After that, if you add new devices they connect and the second device's state is correct, but the controllers no longer respond to network changes, which stay stuck in the Pending state.
Describe the bug
I've encountered a couple of cases where DeviceChanges appear to be stuck in a retry loop in the device controllers. It's unclear what causes this. I suspect it may be a race that occurs when a device add event is propagated to onos-config and, because of the ordering of events, the device change controller cannot connect to the device.
To Reproduce
To reproduce, simply run a gNMI set test repeatedly until a set request hangs.
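The suspected race comes down to event ordering. One common way to make a consumer tolerant of reordered updates is to compare the revision number each device update carries (like revision:86 and revision:90 in the logs above) and drop any update that is not newer than the one already applied. The sketch below assumes that revisions increase monotonically per device; DeviceEvent, DeviceCache, and Apply are hypothetical names, not onos-topo or onos-config APIs.

```go
package main

import "fmt"

// DeviceEvent is a hypothetical, simplified stand-in for a topo device
// update; only the fields needed for ordering are modeled.
type DeviceEvent struct {
	ID        string
	Revision  uint64
	Reachable bool
}

// DeviceCache keeps the newest applied state per device (hypothetical helper).
type DeviceCache struct {
	latest map[string]DeviceEvent
}

func NewDeviceCache() *DeviceCache {
	return &DeviceCache{latest: make(map[string]DeviceEvent)}
}

// Apply stores ev and returns true if it is newer than the stored state.
// It returns false, leaving the cache unchanged, if ev is stale
// (an older or duplicate revision delivered late).
func (c *DeviceCache) Apply(ev DeviceEvent) bool {
	if cur, ok := c.latest[ev.ID]; ok && ev.Revision <= cur.Revision {
		return false
	}
	c.latest[ev.ID] = ev
	return true
}

func main() {
	c := NewDeviceCache()
	// Revision 90 is delivered before revision 86.
	fmt.Println(c.Apply(DeviceEvent{ID: "devicesim-1", Revision: 90, Reachable: false}))
	// The late revision 86 is ignored, so it cannot overwrite newer state.
	fmt.Println(c.Apply(DeviceEvent{ID: "devicesim-1", Revision: 86, Reachable: true}))
	fmt.Println(c.latest["devicesim-1"].Reachable)
}
```

This only guards against stale events. It would not by itself explain the UNREACHABLE update reported above, because revision 90 there is genuinely newer than revision 86.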
Expected behavior
Regardless of the ordering of topo events and set requests, events should ultimately be reordered and the set request should eventually complete. If the device is indeed disconnected for some reason, the DeviceChange controller should conserve CPU and stop retrying updates until the device reconnects.
Additional context
Will update this issue once the bug is reproduced.