robotics-in-concert / rocon_multimaster

Key components for ros multimaster systems

fixing flip_ins not removed if gateway disconnects #339

Closed asmodehn closed 8 years ago

asmodehn commented 8 years ago

@stonier Please check it and merge if you are happy with it. It seems to work fine for me after a day of testing in a VM, dropping the network and bringing it back up. The existing tests pass too.

We might want to reduce the watcher "gone" timeout. I am not sure what the point is of keeping a flip that we already know has been disconnected.

Changelog:

- [rocon_gateway] not removing flips on shutdown to have similar behavior
- [rocon_gateway] deleting old flips when flipping again, in case.
- [rocon_gateway] not unregistering gateway on shutdown to have similar behavior if error
- [rocon_hub] simplify gateway unavailable/dead to have only one state "gone"
- [rocon_hub_client] small fix to trigger update and exit quickly on shutdown

asmodehn commented 8 years ago

Only one small thing: the existing "flips" from the origin gateway are still duplicated in Redis, but from what I could observe this does not seem to have any harmful effect.

asmodehn commented 8 years ago

The behavior when the hub drops that gateway after it has been absent for a long time (timeout 300) doesn't seem to be correct. This usually happens:

[WARN] [WallTime: 1463450726.786114] Hub Watcher: gateway gocart204 has been unavailable for 300.0 seconds! Removing from hub.
[INFO] [WallTime: 1463450121.782268] ConnectionCacheProxy: started inside node /concert/gateway
[INFO] [WallTime: 1463450121.782619]                     : with list topic at /concert/connection_cache/list
[INFO] [WallTime: 1463450121.783008]                     : and diff topic at /concert/connection_cache/diff
[INFO] [WallTime: 1463450122.798927] Gateway : discovered hub directly [http://localhost:6380]
[INFO] [WallTime: 1463450122.955459] Gateway : found existing mismatched public key on the hub, requesting resend for all flip-ins.
[INFO] [WallTime: 1463450122.956123] Gateway : registering on the hub [GoCart Concert]
[INFO] [WallTime: 1463450178.009834] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/name][/rocon/robot_identity]
[INFO] [WallTime: 1463450178.011920] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/battery][/rocon/battery_interface]
[INFO] [WallTime: 1463450178.013956] Gateway : received a flip request [gocart204][subscriber][/concert/clients/gocart204/battery][/celeros]
[INFO] [WallTime: 1463450178.024015] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/battery][/celeros]
[INFO] [WallTime: 1463450178.026059] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/platform_info][/rocon/app_manager]
[INFO] [WallTime: 1463450178.027692] Gateway : received a flip request [gocart204][publisher][/concert/clients/battery][/rocon/battery_interface]
[INFO] [WallTime: 1463450178.029199] Gateway : received a flip request [gocart204][subscriber][/concert/clients/heartbeat/echo][/gopher/heartbeat]
[INFO] [WallTime: 1463450178.037123] Gateway : received a flip request [gocart204][publisher][/concert/clients/heartbeat/beat][/gopher/heartbeat]
[INFO] [WallTime: 1463450178.039016] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/ssid][/rocon/robot_identity]
[INFO] [WallTime: 1463450178.040587] Gateway : received a flip request [gocart204][publisher][/concert/clients/battery][/celeros]
[INFO] [WallTime: 1463450178.042090] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/behaviour_report][/rocon/behaviours_report]
[INFO] [WallTime: 1463450178.043713] Gateway : received a flip request [gocart204][publisher][/concert/clients/gocart204/ip][/rocon/robot_identity]
[INFO] [WallTime: 1463450178.045208] Gateway : received a flip request [gocart204][subscriber][/concert/clients/battery][/celeros]
[INFO] [WallTime: 1463450727.469623] Gateway : unflipping received flip [gocart204][subscriber][/concert/clients/gocart204/battery][/celeros]
Traceback (most recent call last):
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/scripts/gateway.py", line 22, in <module>
    gateway.spin()
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway_node.py", line 81, in spin
    self._gateway.spin()
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway.py", line 106, in spin
    self.update_flipped_in_interface(registrations, remote_gateway_hub_index)
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway.py", line 413, in update_flipped_in_interface
    self.master.unregister(local_registration)
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/master_api.py", line 227, in unregister
    node_master, registration.connection.xmlrpc_uri, registration.connection.rule.name)
  File "/opt/groot/bootstrap_ws/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/master_api.py", line 302, in _unregister_subscriber
    xmlrpcapi(xmlrpc_uri).publisherUpdate('/master', name, [])
  File "/usr/lib/python2.7/xmlrpclib.py", line 1233, in __call__
    return self.__send(self.__name, args)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1587, in __request
    verbose=self.__verbose
  File "/usr/lib/python2.7/xmlrpclib.py", line 1273, in request
    return self.single_request(host, handler, request_body, verbose)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1301, in single_request
    self.send_content(h, request_body)
  File "/usr/lib/python2.7/xmlrpclib.py", line 1448, in send_content
    connection.endheaders(request_body)
  File "/usr/lib/python2.7/httplib.py", line 975, in endheaders
    self._send_output(message_body)
  File "/usr/lib/python2.7/httplib.py", line 835, in _send_output
    self.send(msg)
  File "/usr/lib/python2.7/httplib.py", line 797, in send
    self.connect()
  File "/usr/lib/python2.7/httplib.py", line 778, in connect
    self.timeout, self.source_address)
  File "/usr/lib/python2.7/socket.py", line 571, in create_connection
    raise err
socket.error: [Errno 113] No route to host
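
The traceback above ends in an uncaught `socket.error` raised while sending `publisherUpdate` to a node that is no longer reachable. One possible way to keep the gateway loop alive is to treat an unreachable peer as already gone. The following is only a sketch under assumptions: `notify_subscriber_safely` and its `timeout` parameter are hypothetical names, `xmlrpc.client.ServerProxy` stands in for the `xmlrpcapi` helper seen in the traceback, and it is written for Python 3 (the traceback shows Python 2.7's `xmlrpclib`).

```python
import socket
import xmlrpc.client  # Python 3 module; the traceback above uses Python 2's xmlrpclib


def notify_subscriber_safely(xmlrpc_uri, topic_name, timeout=2.0):
    """Send an empty publisherUpdate to a subscriber node, tolerating a dead peer.

    Hypothetical helper, not part of rocon_gateway.
    Returns True if the call went through, False if the node could not be reached.
    """
    previous_timeout = socket.getdefaulttimeout()
    # Bound the connection attempt so a vanished host cannot block the caller for long.
    socket.setdefaulttimeout(timeout)
    try:
        proxy = xmlrpc.client.ServerProxy(xmlrpc_uri)
        proxy.publisherUpdate('/master', topic_name, [])
        return True
    except (socket.error, xmlrpc.client.Error) as exc:
        # socket.error covers "No route to host", refused connections and timeouts;
        # the remote node is gone, so there is nothing left to notify.
        print("could not reach %s: %s" % (xmlrpc_uri, exc))
        return False
    finally:
        # Restore whatever default timeout was in effect before the call.
        socket.setdefaulttimeout(previous_timeout)
```

A caller could continue with its local cleanup regardless of the return value, since a `False` result only means the remote node could not be told about the change.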

asmodehn commented 8 years ago

More fixes have gone in, most notably some redesign to make sure the gateway rewrites its data into Redis if the hub eventually decides to drop it.
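
As a rough illustration of that idea (not the actual code from this PR), the check could look something like the sketch below. Everything here is an assumption: a plain dict stands in for the Redis-backed hub, and `ensure_registered`, `gateway_name` and `local_state` are hypothetical names.

```python
def ensure_registered(hub_store, gateway_name, local_state):
    """Rewrite this gateway's data if the hub no longer holds an entry for it.

    hub_store is a dict standing in for the hub's Redis store (assumption).
    Returns True if the data had to be rewritten, False if it was still present.
    """
    if gateway_name in hub_store:
        return False
    # The hub dropped us (e.g. after its "gone" timeout), so push a fresh copy
    # of what the gateway knows locally.
    hub_store[gateway_name] = dict(local_state)
    return True


hub = {}
state = {"flips": ["/concert/clients/gocart204/battery"]}
ensure_registered(hub, "gocart204", state)  # first call registers
del hub["gocart204"]                        # hub decides to drop the gateway
ensure_registered(hub, "gocart204", state)  # next check rewrites the data
```

Calling such a check on every pass of the gateway's update loop would let the gateway recover on its own once connectivity returns.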

asmodehn commented 8 years ago

goes with https://github.com/robotics-in-concert/rocon_concert/pull/318

stonier commented 8 years ago

Squashed in #340.