onosproject / onos-config

Configuration subsystem for ONOS (µONOS Architecture)

DeviceChange stuck in retry loop #900

Closed kuujo closed 2 years ago

kuujo commented 4 years ago

Describe the bug: I’ve encountered a couple of cases where DeviceChanges appear to be stuck in a retry loop in the device controllers. The cause is unclear. I suspect a race that occurs when a device add event is propagated to onos-config and, because of the ordering of events, the device change controller cannot connect to the device.

To Reproduce: Run a gNMI set test repeatedly until a set request hangs.

Expected behavior: Regardless of the ordering of topo events and set requests, events should ultimately be reordered and the set request should eventually complete. If the device is indeed disconnected for some reason, the DeviceChange controller should conserve CPU and stop retrying updates until the device reconnects.
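To make the expected behavior concrete, here is a minimal sketch in Go of one way a controller could park changes for a disconnected device instead of retrying them, and release them when the device connects. This is not the onos-config implementation: `retryGate`, `ShouldRetry`, and `SetConnected` are hypothetical names invented for this illustration, and the real controllers may be structured quite differently.

```go
package main

import (
	"fmt"
	"sync"
)

// retryGate is a hypothetical helper (not part of onos-config). It parks
// change IDs while their device is disconnected so they are not retried.
type retryGate struct {
	mu        sync.Mutex
	connected map[string]bool     // device ID -> last known connection state
	parked    map[string][]string // device ID -> change IDs awaiting reconnect
}

func newRetryGate() *retryGate {
	return &retryGate{
		connected: map[string]bool{},
		parked:    map[string][]string{},
	}
}

// ShouldRetry reports whether a change may be attempted now. If the device
// is not connected (or not yet known), the change is parked once and false
// is returned, so the caller can stop retrying.
func (g *retryGate) ShouldRetry(deviceID, changeID string) bool {
	g.mu.Lock()
	defer g.mu.Unlock()
	if g.connected[deviceID] {
		return true
	}
	for _, id := range g.parked[deviceID] {
		if id == changeID {
			return false // already parked
		}
	}
	g.parked[deviceID] = append(g.parked[deviceID], changeID)
	return false
}

// SetConnected records the device state. On a connect it returns the parked
// change IDs, in arrival order, so the caller can requeue them.
func (g *retryGate) SetConnected(deviceID string, up bool) []string {
	g.mu.Lock()
	defer g.mu.Unlock()
	g.connected[deviceID] = up
	if !up {
		return nil
	}
	released := g.parked[deviceID]
	delete(g.parked, deviceID)
	return released
}

func main() {
	g := newRetryGate()
	// A set request arrives before the device add event: park it.
	fmt.Println(g.ShouldRetry("devicesim-1", "change-1"))
	// The device connects: the parked change is released for requeueing.
	fmt.Println(g.SetConnected("devicesim-1", true))
	// Later attempts go through immediately.
	fmt.Println(g.ShouldRetry("devicesim-1", "change-1"))
}
```

With a gate like this, the ordering of the set request and the device event would not matter: a change that arrives first simply waits, and no CPU is spent retrying while the device is down.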

Additional context: Will update this issue once the bug is reproduced.

Andrea-Campanella commented 4 years ago

@kuujo is this still valid?

adibrastegarnia commented 4 years ago

I ran into this issue. After adding a lot of log messages to onos-config to debug it, I noticed that even though onos-config is still connected to the device, for some reason the state of the device is updated to UNREACHABLE.

2020-10-09T03:11:53.820Z    INFO    main    onos-topo/onos-topo.go:54   Starting onos-topo
2020-10-09T03:11:53.821Z    INFO    manager manager/manager.go:28   Creating Manager
2020-10-09T03:11:53.821Z    INFO    manager manager/manager.go:39   Starting Manager
2020-10-09T03:12:08.386Z    INFO    northbound  northbound/server.go:119    Loading certs: /etc/onos/certs/tls.crt /etc/onos/certs/tls.key
2020-10-09T03:12:08.386Z    INFO    main    onos-topo/onos-topo.go:96   Started NBI on [::]:5150
2020-10-09T03:12:08.386Z    INFO    northbound  northbound/server.go:164    Starting RPC server on address: [::]:5150
2020-10-09T03:14:05.359Z    INFO    northbound/device   device/service.go:169   Updated Device id:"devicesim-1" revision:86 address:"devicesim-1-device-simulator:11161" target:"devicesim-1" version:"1.0.0" timeout:<seconds:5 > credentials:<> tls:<plain:true > type:"Devicesim" attributes:<key:"onos-config.mastership.master" value:"onos-config-5c45c87d74-fx44x" > attributes:<key:"onos-config.mastership.term" value:"1" > protocols:<protocol:GNMI connectivityState:REACHABLE channelState:CONNECTED serviceState:AVAILABLE >
2020-10-09T03:14:05.394Z    INFO    northbound/device   device/service.go:169   Updated Device id:"devicesim-1" revision:90 address:"devicesim-1-device-simulator:11161" target:"devicesim-1" version:"1.0.0" timeout:<seconds:5 > credentials:<> tls:<plain:true > type:"Devicesim" attributes:<key:"onos-config.mastership.master" value:"onos-config-5c45c87d74-fx44x" > attributes:<key:"onos-config.mastership.term" value:"1" > protocols:<protocol:GNMI connectivityState:UNREACHABLE channelState:DISCONNECTED serviceState:UNAVAILABLE >
2020-10-09T03:22:20.807Z    INFO    northbound/device   device/service.go:169   Updated Device id:"devicesim-2" revision:222 address:"devicesim-2-device-simulator:11161" target:"devicesim-2" version:"1.0.0" timeout:<seconds:5 > credentials:<> tls:<plain:true > type:"Devicesim" attributes:<key:"onos-config.mastership.master" value:"onos-config-5c45c87d74-fx44x" > attributes:<key:"onos-config.mastership.term" value:"1" > protocols:<protocol:GNMI connectivityState:REACHABLE channelState:CONNECTED serviceState:AVAILABLE >

After that, if you add new devices, they connect and the state of the second device is correct, but the controllers no longer respond to network changes, and the changes get stuck in the Pending state.