Open kmoe opened 8 years ago
Do we know what happens in that case? If the heartbeat response returns an error, it should be fairly easy to set a timeout to reconnect. If it just hangs (or never returns), we might need a watchdog.
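Something like the following might cover both cases. It's only a rough sketch in Python with `asyncio`, not the app's actual code: `send_heartbeat` and `reconnect` are hypothetical async callables standing in for whatever the app really uses, and the `interval`, `timeout` and `max_beats` parameters are made up for illustration. The `asyncio.wait_for` timeout plays the role of the watchdog for a heartbeat that hangs, and the `except` branch handles a heartbeat that returns an error.

```python
import asyncio


async def run_heartbeat(send_heartbeat, reconnect, interval=1.0, timeout=5.0, max_beats=None):
    """Send heartbeats in a loop, reconnecting on an error or a hang.

    send_heartbeat and reconnect are hypothetical async callables (assumptions,
    not part of the original app). max_beats exists only so the loop can be
    stopped in a test; leave it as None to run indefinitely.
    Returns the number of reconnects performed.
    """
    reconnects = 0
    beats = 0
    while max_beats is None or beats < max_beats:
        beats += 1
        try:
            # Watchdog: cancel the heartbeat if it does not finish within `timeout` seconds.
            await asyncio.wait_for(send_heartbeat(), timeout)
        except (asyncio.TimeoutError, ConnectionError):
            # Either the heartbeat hung (timeout) or it failed outright (error):
            # in both cases, try to re-establish the connection.
            reconnects += 1
            await reconnect()
        await asyncio.sleep(interval)
    return reconnects
```

If `reconnect` itself raises, the exception propagates out of the loop, which matches the "fail hard" behaviour; a retry with backoff could be added there if that turns out to be too aggressive.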
This will take a bit more monitoring and digging through logs so I'm going to leave this here, but making the app fail hard on error seems to have made it sufficiently stable for now.
When the servers go down (which is pretty often), the heartbeat fails and doesn't recover. We need to reconnect when this happens as otherwise I have to reboot the app.