robotics-in-concert / rocon_multimaster

Key components for ros multimaster systems

More robust gateway flip shutdown/connection error handling #340

Closed asmodehn closed 8 years ago

asmodehn commented 8 years ago

[rocon_gateway] deleting old flips when flipping again, in case.
[rocon_gateway] not unregistering gateway on shutdown to have similar behavior if error
[rocon_hub] simplify gateway unavailable/dead to have only one state "gone"
[rocon_hub_client] small fix to trigger update and exit quickly on shutdown

adding exception handler for registering/unregistering services/topics
fixing launcher to have only one timeout for gateway "gone"
cosmetics

removing external_shutdown special case. simpler shutdown behavior.

improving unavailable -> gone transition. not trying to talk over network when gateway disappears (connection might be down).

now gateway always checks its hubs for valid data and can refresh it if needed.

HubConnectionFailedError exception fix.

generous timeout to find connection_cache

fixed behavior when hub disappears. cosmetics.
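For readers skimming the commit list, the unavailable -> gone handling can be pictured roughly as below. This is only a sketch, not the rocon_hub code: the `GatewayState` enum, the `classify_gateway` function and both thresholds are hypothetical names and values chosen for illustration. The point it shows is that the decision is made from local bookkeeping (time since the gateway was last seen), without talking over a network that might be down.

```python
import enum


class GatewayState(enum.Enum):
    # Hypothetical state names; "gone" mirrors the single merged state
    # mentioned in the commit messages.
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"
    GONE = "gone"


def classify_gateway(seconds_since_last_seen,
                     unavailable_after=5.0,   # assumed threshold, seconds
                     gone_after=30.0):        # assumed threshold, seconds
    """Classify a gateway from local timing data only (no network calls)."""
    if seconds_since_last_seen >= gone_after:
        return GatewayState.GONE
    if seconds_since_last_seen >= unavailable_after:
        return GatewayState.UNAVAILABLE
    return GatewayState.AVAILABLE
```

With a single timeout for "gone", a launcher only needs to wait for that one state rather than distinguishing several failure modes.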

stonier commented 8 years ago

This is a squash of pull request #339. More history there.

stonier commented 8 years ago

Goes with https://github.com/robotics-in-concert/rocon_concert/pull/318

asmodehn commented 8 years ago

How to test:

Simulation on one host (simple - mostly valid if no special shutdown code) :

1.a. Concert stop

1.b. Robot stop

Simulation on multiple hosts (VM or Hardware - more tricky, but more valid than 1.) :

2.a Network outage

=> Notice gateway connection is reestablished via :

Multiple robots : repeat 1.b and 2.a for each robot.

More : find all kinds of funny ways to break and reestablish the network connection (as verified by ping). The gateway connection should always be reestablished automatically, even if the network is only back for a short time.
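When breaking the network by hand, it can help to log exactly when ping sees the link go down and come back, so that gateway reconnection can be compared against it. The helper below is a rough sketch and not part of rocon: `ping_once` and `link_transitions` are hypothetical names, and the `ping` flags assume a Linux (iputils) `ping`.

```python
import subprocess


def ping_once(host, timeout_s=1):
    """Return True if one ICMP echo to `host` succeeds.

    Assumes Linux iputils flags: -c (count) and -W (reply timeout, seconds).
    """
    result = subprocess.run(
        ["ping", "-c", "1", "-W", str(timeout_s), host],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def link_transitions(samples):
    """Return (index, "up" | "down") for each change in a boolean sequence.

    The first sample is always reported, as the initial link state.
    """
    transitions = []
    previous = None
    for index, reachable in enumerate(samples):
        if reachable != previous:
            transitions.append((index, "up" if reachable else "down"))
            previous = reachable
    return transitions
```

For example, collecting `ping_once(robot_host)` once per second into a list and passing it to `link_transitions` gives the outage windows; each "up" entry is a point after which the gateway connection is expected to come back on its own.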

stonier commented 8 years ago

Ran simple robot bringup/crash tests, plus tests where the robot goes out of range for a variable length of time and then comes back in.

These all worked fine. The extreme test will be the elevator. @asmodehn says he has done some testing there and it worked ok, but let's keep an eye on it, as that is where it can get inside the loop rates and possibly break things.