asmodehn closed 8 years ago
Only one small thing: the existing "flips" from the origin gateway are still duplicated in Redis, but from what I could observe this does not seem to have any harmful effect.
The behavior when dropping that gateway after a long absence (timeout 300) does not seem to be correct. This is what usually happens:
[WARN] [WallTime: 1463450726.786114] Hub Watcher: gateway gocart204 has been unavailable for 300.0 seconds! Removing from hub.
[INFO] [WallTime: 1463450121.782268] ConnectionCacheProxy: started inside node /concert/gateway
[INFO] [WallTime: 1463450121.782619] : with list topic at /concert/connection_cache/list
[INFO] [WallTime: 1463450121.783008] : and diff topic at /concert/connection_cache/diff
[INFO] [WallTime: 1463450122.798927] Gateway : discovered hub directly [http://localhost:6380]
[INFO] [WallTime: 1463450122.955459] Gateway : found existing mismatched public key on the hub, requesting resend for all flip-ins.
[INFO] [WallTime: 1463450122.956123] Gateway : registering on the hub [GoCart Concert]
[INFO] [WallTime: 1463450178.009834] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/name][/rocon/robot_identity]
[INFO] [WallTime: 1463450178.011920] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/battery][/rocon/battery_interface]
[INFO] [WallTime: 1463450178.013956] Gateway : received a flip request [gocart204][subscriber][/concert/clients/gocart204/battery][/celeros]
[INFO] [WallTime: 1463450178.024015] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/battery][/celeros]
[INFO] [WallTime: 1463450178.026059] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/platform_info][/rocon/app_manager]
[INFO] [WallTime: 1463450178.027692] Gateway : received a flip request [gocart204][publisher][/concert/clients/battery][/rocon/battery_interface]
[INFO] [WallTime: 1463450178.029199] Gateway : received a flip request [gocart204][subscriber][/concert/clients/heartbeat/echo][/gopher/heartbeat]
[INFO] [WallTime: 1463450178.037123] Gateway : received a flip request [gocart204][publisher][/concert/clients/heartbeat/beat][/gopher/heartbeat]
[INFO] [WallTime: 1463450178.039016] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/ssid][/rocon/robot_identity]
[INFO] [WallTime: 1463450178.040587] Gateway : received a flip request [gocart204][publisher][/concert/clients/battery][/celeros]
[INFO] [WallTime: 1463450178.042090] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/behaviour_report][/rocon/behaviours_report]
[INFO] [WallTime: 1463450178.043713] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/ip][/rocon/robot_identity]
[INFO] [WallTime: 1463450178.045208] Gateway : received a flip request [gocart204][subscriber][/concert/clients/battery][/celeros]
[INFO] [WallTime: 1463450727.469623] Gateway : unflipping received flip [gocart204][subscriber][/concert/clients/gocart204/battery][/celeros]
Traceback (most recent call last):
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/scripts/gateway.py", line 22, in <module>
    gateway.spin()
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway_node.py", line 81, in spin
    self._gateway.spin()
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway.py", line 106, in spin
    self.update_flipped_in_interface(registrations, remote_gateway_hub_index)
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway.py", line 413, in update_flipped_in_interface
    self.master.unregister(local_registration)
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/master_api.py", line 227, in unregister
    node_master, registration.connection.xmlrpc_uri, registration.connection.rule.name)
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/master_api.py", line 302, in _unregister_subscriber
    xmlrpcapi(xmlrpc_uri).publisherUpdate('/master', name, [])
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 797, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 778, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 113] No route to host
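The traceback shows the `publisherUpdate` XML-RPC call in `_unregister_subscriber` raising an uncaught `socket.error` once the remote host is unreachable, which takes down the whole gateway spin loop. One way to tolerate this could look like the sketch below. It is not the fix that went into the PR: `safe_publisher_update` is a hypothetical helper, and the standard library `ServerProxy` stands in for the `xmlrpcapi` helper seen in the traceback.

```python
import socket

try:
    import xmlrpclib  # Python 2, as in the traceback above
except ImportError:
    import xmlrpc.client as xmlrpclib  # Python 3


def safe_publisher_update(xmlrpc_uri, topic_name, publishers=None):
    """Hypothetical helper: tell a subscriber node its publisher list changed.

    Returns True if the call went through, False if the node could not be
    reached (for example, the remote machine dropped off the network).
    """
    try:
        proxy = xmlrpclib.ServerProxy(xmlrpc_uri)
        # Same call as in the traceback: an empty list means "no publishers left".
        proxy.publisherUpdate('/master', topic_name, publishers or [])
        return True
    except (socket.error, xmlrpclib.Error):
        # Unreachable or misbehaving node: nothing left to unregister from,
        # so report the failure instead of crashing the caller.
        return False
```

A caller could log a warning when this returns False and carry on removing the local registration, since the remote subscriber is gone anyway.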
More fixes have gone in, most notably some redesign to make sure the gateway rewrites its data into Redis if the hub eventually decides to drop it.
Squashed in #340.
@stonier Please check it and merge if you are happy with it. It seems to work fine for me after a day of testing, dropping the network and bringing it back up in a VM. Existing tests are fine too.
We might want to reduce the watcher "gone" timeout. I am not sure what the point is of keeping a flip that we already know has been disconnected.
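To make the timeout discussion concrete, here is a minimal sketch of a watcher with a single "gone" state and a configurable timeout. It is a simplified illustration, not the `rocon_hub` implementation: the `GatewayWatcher` class, its method names, and the injectable `clock` are all assumptions made for the example.

```python
import time


class GatewayWatcher(object):
    """Hypothetical, simplified stand-in for the hub watcher."""

    def __init__(self, gone_timeout=300.0, clock=time.time):
        self.gone_timeout = gone_timeout  # seconds of silence before removal
        self._clock = clock               # injectable so tests can fake time
        self._last_seen = {}              # gateway name -> last contact time

    def seen(self, gateway_name):
        """Record that a gateway was reachable just now."""
        self._last_seen[gateway_name] = self._clock()

    def prune(self):
        """Drop every gateway silent for at least gone_timeout; return their names."""
        now = self._clock()
        gone = sorted(name for name, stamp in self._last_seen.items()
                      if now - stamp >= self.gone_timeout)
        for name in gone:
            del self._last_seen[name]
        return gone
```

With a shorter `gone_timeout`, flips from a disconnected gateway would be cleaned up sooner, at the cost of dropping gateways during brief network interruptions.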
Changelog:
[rocon_gateway] not removing flips on shutdown to have similar behavior
[rocon_gateway] deleting old flips when flipping again, in case.
[rocon_gateway] not unregistering gateway on shutdown to have similar behavior if error
[rocon_hub] simplify gateway unavailable/dead to have only one state "gone"
[rocon_hub_client] small fix to trigger update and exit quickly on shutdown