robotics-in-concert / rocon_multimaster

Key components for ros multimaster systems
12 stars 19 forks source link

Redis connection error #310

Closed dwlee closed 8 years ago

dwlee commented 9 years ago

Redis connection error is happened.

Concert Client log (young; waiterbot)

[ WARN] [1435192119.657421240]: Zeroconf: timed out resolving service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[INFO] [WallTime: 1435192119.885259] Gateway : remove lost service[192.168.10.32:6380] from discovered hub list
[ INFO] [1435192207.469856680]: Zeroconf: resolved service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[INFO] [WallTime: 1435192207.650906] Gateway : blacklisting hub [already connected to this hub]
[ WARN] [1435192333.783076832]: Zeroconf: timed out resolving service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[INFO] [WallTime: 1435192334.027558] Gateway : remove lost service[192.168.10.32:6380] from discovered hub list
[ INFO] [1435192419.447994075]: Zeroconf: resolved service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[INFO] [WallTime: 1435192419.776832] Gateway : blacklisting hub [already connected to this hub]
[ WARN] [1435194146.559359327]: Zeroconf: timed out resolving service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[INFO] [WallTime: 1435194146.776735] Gateway : remove lost service[192.168.10.32:6380] from discovered hub list
[ INFO] [1435194232.827060066]: Zeroconf: resolved service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[INFO] [WallTime: 1435194233.427395] Gateway : blacklisting hub [already connected to this hub]
[WARN] [WallTime: 1435195565.747158] Gatew Hub pinger : Error connecting to 192.168.10.32:6380. timed out.
[INFO] [WallTime: 1435195565.748457] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435195565.749432] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435195570.015383] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435195571.282427] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435195571.287700] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435195572.462106] Gateway : blacklisting hub [already connected to this hub]
[WARN] [WallTime: 1435196050.133609] Gatew Hub pinger : Error connecting to 192.168.10.32:6380. timed out.
[INFO] [WallTime: 1435196050.134973] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435196050.135876] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196053.617961] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435196053.900504] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435196053.906138] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435196054.060570] Gateway : blacklisting hub [already connected to this hub]
[WARN] [WallTime: 1435196389.602805] Gatew Hub pinger : Error connecting to 192.168.10.32:6380. timed out.
[INFO] [WallTime: 1435196389.603755] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435196389.604464] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196403.480824] Gateway : unresolvable hub [couldn't connect to the redis server]
[ WARN] [1435196430.391864193]: Zeroconf: timed out resolving service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
[ INFO] [1435196515.225672529]: Zeroconf: resolved service [xavier][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.32:6380]
Exception in thread Thread-4:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_hub_client/src/rocon_hub_client/hub_discovery.py", line 97, in run
    self._process_new_services(new_services, unresolvable_hub)
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_hub_client/src/rocon_hub_client/hub_discovery.py", line 150, in _process_new_services
    result, reason = self.discovery_update_hook(ip, port)
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway_node.py", line 148, in _register_gateway
    existing_advertisements
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/hub_manager.py", line 220, in connect_to_hub
    gateway_ip,
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway_hub.py", line 117, in register_gateway
    if not self._redis_server.sadd(self._redis_keys['gatewaylist'], self._redis_keys['gateway']):
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/client.py", line 881, in sadd
    return self.execute_command('SADD', name, *values)
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/client.py", line 350, in execute_command
    return self.parse_response(connection, command_name, **options)
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/client.py", line 356, in parse_response
    response = connection.read_response()
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/connection.py", line 305, in read_response
    response = self._parser.read_response()
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/connection.py", line 102, in read_response
    response = self.read()
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/connection.py", line 91, in read
    (e.args,))
ConnectionError: Error while reading from socket: ('timed out',)

Concert client(bach; turtlebot)

Exception in thread Thread-5:
Traceback (most recent call last):
  File "/usr/lib/python2.7/threading.py", line 810, in __bootstrap_inner
    self.run()
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_hub_client/src/rocon_hub_client/hub_discovery.py", line 97, in run
    self._process_new_services(new_services, unresolvable_hub)
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_hub_client/src/rocon_hub_client/hub_discovery.py", line 150, in _process_new_services
    result, reason = self.discovery_update_hook(ip, port)
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway_node.py", line 148, in _register_gateway
    existing_advertisements
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/hub_manager.py", line 224, in connect_to_hub
    new_hub.advertise(advertisement)
  File "/home/yujin/ros/turtlebot/src/rocon_multimaster/rocon_gateway/src/rocon_gateway/gateway_hub.py", line 510, in advertise
    self._redis_server.sadd(key, msg_str)
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/client.py", line 881, in sadd
    return self.execute_command('SADD', name, *values)
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/client.py", line 349, in execute_command
    connection.send_command(*args)
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/connection.py", line 300, in send_command
    self.send_packed_command(self.pack_command(*args))
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/connection.py", line 282, in send_packed_command
    self.connect()
  File "/home/yujin/ros/turtlebot/src/rocon_tools/rocon_python_redis/src/rocon_python_redis/connection.py", line 229, in connect
    raise ConnectionError(self._error_message(e))
ConnectionError: Error connecting to 192.168.10.32:6380. timed out.

Concert Client(chopin; pc)

[INFO] [WallTime: 1435187107.157067] Gateway : blacklisting hub [hub/ip not in non-empty whitelist [dwlee, 192.168.10.102:6380][['xavier']]]
[INFO] [WallTime: 1435187107.158156] Gateway : remove lost service[192.168.10.24:6380] from discovered hub list
[ INFO] [1435187110.587417447]: Zeroconf: resolved service [dwlee][_ros-multimaster-hub._tcp][local][3][ipv4][192.168.10.24:6380]
[WARN] [WallTime: 1435187110.683614] Gateway : failed to register gateway with the hub [192.168.10.24:6380][15][hub/ip not in non-empty whitelist [dwlee, 192.168.10.24:6380][['xavier']]]
[INFO] [WallTime: 1435187110.685358] Gateway : blacklisting hub [hub/ip not in non-empty whitelist [dwlee, 192.168.10.24:6380][['xavier']]]
[INFO] [WallTime: 1435187110.686388] Gateway : remove lost service[192.168.10.102:6380] from discovered hub list
[ INFO] [1435187137.742036318]: Zeroconf: service was removed [dwlee][_ros-multimaster-hub._tcp][local][3][ipv4]
[INFO] [WallTime: 1435187137.763311] Gateway : remove lost service[192.168.10.24:6380] from discovered hub list
[WARN] [WallTime: 1435196389.259733] Gatew Hub pinger : Error connecting to 192.168.10.32:6380. timed out.
[INFO] [WallTime: 1435196389.261071] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435196389.262046] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196402.683590] Gateway : unresolvable hub [couldn't connect to the redis server]
[INFO] [WallTime: 1435196431.718232] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435196431.830791] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435196431.834532] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435196432.033415] Gateway : blacklisting hub [already connected to this hub]
[WARN] [WallTime: 1435196471.790789] Gatew Hub pinger : Error connecting to 192.168.10.32:6380. timed out.
[INFO] [WallTime: 1435196471.792317] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435196471.794638] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196482.533163] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435196495.448320] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435196495.452265] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435196505.293857] Gateway : blacklisting hub [already connected to this hub]
[WARN] [WallTime: 1435196545.818922] Gatew Hub pinger : Error while reading from socket: ('timed out',)
[ERROR] [WallTime: 1435196545.819957] Gateway : unable to publish network statistics [no connection to the hub]
[INFO] [WallTime: 1435196545.821764] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435196545.823733] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196564.642464] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435196566.167872] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435196566.171217] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435196567.504839] Gateway : blacklisting hub [already connected to this hub]
[WARN] [WallTime: 1435196589.399249] Gatew Hub pinger : Error while reading from socket: ('timed out',)
[INFO] [WallTime: 1435196589.400527] Gateway : lost connection with hub, attempting to disconnect...
[ERROR] [WallTime: 1435196596.248499] Gateway : unable to publish network statistics [no connection to the hub]
[INFO] [WallTime: 1435196596.252145] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196599.469401] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435196600.169634] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435196600.172778] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435196600.407624] Gateway : blacklisting hub [already connected to this hub]
[INFO] [WallTime: 1435196606.489090] Gateway : sending flip request [xavier99510489b69a4e9eaa10f10de1dbefa3][service][/chopine6f23e6adf68488a8e0721e0ef587bc9/stop_rapp][/app_manager,http://chopin:51101/]
[INFO] [WallTime: 1435196606.834883] Gateway : sending flip request [xavier99510489b69a4e9eaa10f10de1dbefa3][service][/chopine6f23e6adf68488a8e0721e0ef587bc9/start_rapp][/app_manager,http://chopin:51101/]
[WARN] [WallTime: 1435196639.581754] Gatew Hub pinger : Error while reading from socket: ('timed out',)
[INFO] [WallTime: 1435196639.583212] Gateway : lost connection with hub, attempting to disconnect...
[INFO] [WallTime: 1435196639.584411] Gateway : disengaged connection with the hub [xavier][192.168.10.32:6380]
[INFO] [WallTime: 1435196655.299333] Gateway : unresolvable hub [couldn't connect to the redis server]
[INFO] [WallTime: 1435196697.107773] Gateway : Found existing mismatched public key on the hub. Requesting resend for all flip-ins.
[INFO] [WallTime: 1435196697.820603] Gateway : registering on the hub [xavier]
[INFO] [WallTime: 1435196697.823536] Gateway : discovered hub via zeroconf [192.168.10.32:6380]
[INFO] [WallTime: 1435196698.081516] Gateway : blacklisting hub [already connected to this hub]

Master log

[WARN] [WallTime: 1435196450.049409] Hub Watcher: gateway young9bf3cced758e4fda846e5165e2eb23b3 has been unavailable for 60 seconds! Marking as unavailable.
[WARN] [WallTime: 1435196450.051876] Hub Watcher: gateway bach5a4bcc63437246e29a9689428b6de019 has been unavailable for 60 seconds! Marking as unavailable.
[INFO] [WallTime: 1435196450.397543] Conductor : concert client transition [available->missing][young]
[INFO] [WallTime: 1435196450.397890] Conductor : concert client transition [available->missing][bach]
[ INFO] [1435196450.415063628]: Remove subscriber of : motion_checker
[ INFO] [1435196450.415117729]: Remove subscriber of : motion_checker
[WARN] [WallTime: 1435196465.057531] Hub Watcher: gateway chopine6f23e6adf68488a8e0721e0ef587bc9 has been unavailable for 60 seconds! Marking as unavailable.
[INFO] [WallTime: 1435196465.710704] Conductor : concert client transition [available->missing][chopin]
[ INFO] [1435196465.815097308]: Remove subscriber of : motion_checker
[INFO] [WallTime: 1435196469.792318] Conductor : concert client transition [missing->available][bach]
[ INFO] [1435196469.815034297]: Create Subscriber for : bach    Topic : /bach/robot_pose
[INFO] [WallTime: 1435196470.819498] Conductor : concert client transition [missing->available][chopin]
[INFO] [WallTime: 1435196470.819821] Conductor : concert client transition [available->missing][bach]
[ INFO] [1435196470.915000982]: Create Subscriber for : chopin  Topic : /chopin/robot_pose
[ INFO] [1435196470.917068982]: Remove subscriber of : motion_checker
[INFO] [WallTime: 1435196475.923957] Conductor : concert client transition [missing->available][bach]
[ INFO] [1435196476.015034578]: Create Subscriber for : bach    Topic : /bach/robot_pose
[WARN] [WallTime: 1435196605.157661] Hub Watcher: gateway bach5a4bcc63437246e29a9689428b6de019 has been unavailable for 60 seconds! Marking as unavailable.
[INFO] [WallTime: 1435196605.473594] Conductor : concert client transition [available->missing][bach]
[ INFO] [1435196605.515029687]: Remove subscriber of : motion_checker
[INFO] [WallTime: 1435196610.541054] Conductor : concert client transition [missing->available][bach]
[ INFO] [1435196610.615069806]: Create Subscriber for : bach    Topic : /bach/robot_pose
[WARN] [WallTime: 1435196670.206136] Hub Watcher: gateway bach5a4bcc63437246e29a9689428b6de019 has been unavailable for 60 seconds! Marking as unavailable.
[INFO] [WallTime: 1435196671.184860] Conductor : concert client transition [available->missing][bach]
[ INFO] [1435196671.215203477]: Remove subscriber of : motion_checker
[WARN] [WallTime: 1435196715.236841] Hub Watcher: gateway chopine6f23e6adf68488a8e0721e0ef587bc9 has been unavailable for 60 seconds! Marking as unavailable.
[INFO] [WallTime: 1435196716.127172] Conductor : concert client transition [available->missing][chopin]
[ INFO] [1435196716.215070066]: Remove subscriber of : motion_checker
[INFO] [WallTime: 1435196731.534619] Conductor : concert client transition [missing->available][chopin]
stonier commented 8 years ago

Should be fixed by Jihoon's improved gateway registration code (same line and he has the connection error exception handler).