If a configured device is unreachable for any reason, the error handling is poor: the program just throws an exception and dies.
This should be handled gracefully, ideally by logging an error message, reporting a status metric for every configured device (so Prometheus/Alertmanager can notice the problem), and omitting the metrics of the unreachable device, instead of "dropping dead".
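
To illustrate the requested behaviour, here is a minimal sketch in plain Python. It is not taken from the existing code: `collect`, `read_device`, `fake_reader`, the `device_up` metric name, and the device names are all hypothetical, and the output lines are only an approximation of the Prometheus text format. A real implementation would presumably go through whatever metrics library the project already uses.

```python
import logging
from typing import Callable, Dict, List

logger = logging.getLogger("exporter")

# Hypothetical type: a reader takes a device name and returns its metrics,
# raising an exception when the device cannot be reached.
Reader = Callable[[str], Dict[str, float]]


def collect(devices: List[str], read_device: Reader) -> List[str]:
    """Return metric lines for all devices, tolerating unreachable ones."""
    lines: List[str] = []
    for name in devices:
        try:
            metrics = read_device(name)
        except Exception as exc:
            # Broad on purpose: one bad device must not abort the whole scrape.
            logger.error("device %s unreachable: %s", name, exc)
            lines.append(f'device_up{{device="{name}"}} 0')
            continue  # do not deliver metrics for this device
        lines.append(f'device_up{{device="{name}"}} 1')
        for key, value in sorted(metrics.items()):
            lines.append(f'{key}{{device="{name}"}} {value}')
    return lines


def fake_reader(name: str) -> Dict[str, float]:
    """Stand-in for real device access; 'switch-b' simulates an outage."""
    if name == "switch-b":
        raise ConnectionError("timed out")
    return {"temperature_celsius": 21.5}


if __name__ == "__main__":
    for line in collect(["switch-a", "switch-b"], fake_reader):
        print(line)
```

With this shape, every configured device always yields a `device_up` sample, so an alert rule could fire on a value of 0, while the failure itself only shows up as a log entry rather than a crash.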