Network errors are logged periodically. They appear to originate in the teleport module, which does not handle network disruption gracefully. The errors vary and include connection and socket timeouts. The application appears to recover on its own, possibly delaying the next communication by a few seconds.
The errors are non-fatal and appear to have minimal impact. No message loss has been demonstrated, and the workflow tolerates occasional loss because notifications and alerts are repeated periodically. A watchdog could be added to confirm that the service is active, for example by having the service exchange messages with a bot instance acting as a fake session. It is not evident whether this would improve resilience.
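The watchdog idea could be sketched roughly as follows. This is a minimal illustration in Python, not the application's actual code: the `Watchdog` class, the `probe` callable (standing in for "send a message to the fake bot session and report whether it succeeded"), and the `on_stale` callback are all hypothetical names, and the interval and failure threshold are placeholders.

```python
import threading
from typing import Callable, Optional


class Watchdog:
    """Runs a probe periodically and reports repeated consecutive failures."""

    def __init__(self, probe: Callable[[], bool], interval: float,
                 max_failures: int, on_stale: Callable[[int], None]) -> None:
        self.probe = probe                # hypothetical: round trip to the fake session
        self.interval = interval          # seconds between probes
        self.max_failures = max_failures  # consecutive failures before reporting
        self.on_stale = on_stale          # hypothetical: alert, restart, etc.
        self.failures = 0
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None

    def check_once(self) -> bool:
        """Run the probe once and update the consecutive-failure count."""
        try:
            ok = bool(self.probe())
        except OSError:
            # OSError covers connection errors and socket timeouts.
            ok = False
        if ok:
            self.failures = 0
        else:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.on_stale(self.failures)
        return ok

    def _run(self) -> None:
        # Event.wait returns False on timeout, True once stop() is called.
        while not self._stop.wait(self.interval):
            self.check_once()

    def start(self) -> None:
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def stop(self) -> None:
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
```

Counting consecutive failures rather than reacting to the first one matches the observation that the application usually recovers by itself after a short delay. Whether such a probe would improve resilience remains an open question.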
The network errors are raised asynchronously inside the module, which makes them difficult to catch with the application's own error handling.
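One possible way to route such errors into the application's handling is sketched below in Python. It assumes the module does its network work in background threads and lets the exceptions propagate out of them; neither is confirmed here. If the module catches and logs the errors itself, this hook would never fire. The function `install_thread_excepthook` and the `handler` callback are hypothetical names; `threading.excepthook` is the standard-library hook (Python 3.8+).

```python
import threading
from typing import Callable


def install_thread_excepthook(handler: Callable[[BaseException, str], None]):
    """Send uncaught network errors from any thread to `handler`.

    Returns the previous hook so it can be restored later.
    """
    previous = threading.excepthook

    def hook(args: threading.ExceptHookArgs) -> None:
        # OSError covers connection errors and socket timeouts.
        if issubclass(args.exc_type, OSError):
            name = args.thread.name if args.thread is not None else "unknown"
            handler(args.exc_value, name)
        else:
            # Leave every other exception to the previous behaviour.
            previous(args)

    threading.excepthook = hook
    return previous
```

A caller would install the hook once at startup and pass a handler that feeds the application's existing error reporting. Only exceptions that escape a thread's `run()` reach the hook.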