To keep processes from ever being blocked by a frozen syslog sink or stats receiver, both use UDP.
The underlying libraries pass the hostnames through to the socket.sendto method, which performs a DNS lookup on EVERY CALL.
This was throttling the Queuer class to under 200 stories/second (a problem when ingesting archives).
The TEMPORARY HACK was to resolve the hostnames on startup.
This is relatively safe for stats, where the destination hostname is "tarbell" and a proxy process directs the packets to the statsd process in the grafana-graphite-statsd container there. It is unsafe for logging, where the syslog-sink host is a container in the stack and its IP address could change WHEN the container is restarted.
The fix is to subclass logging.handlers.SysLogHandler (and StatsdClient) to keep a one-entry cache of the last hostname passed, the resolved IP address, and the timestamp of the resolution. If the hostname matches the cached one and the entry is less than a minute old, use the cached address; otherwise re-resolve the IP address and update the cache.
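
A minimal sketch of the caching idea for the logging side, not the actual implementation. The names HostCache and CachingSysLogHandler, the ttl parameter, and the injectable resolve/clock arguments are all hypothetical. It assumes IPv4 resolution via socket.gethostbyname, and it relies on a CPython implementation detail: SysLogHandler.emit sends UDP datagrams to self.address.

```python
import logging.handlers
import socket
import time


class HostCache:
    """One-entry cache: last hostname, its resolved IP, and when it was resolved."""

    def __init__(self, ttl=60.0, resolve=socket.gethostbyname, clock=time.monotonic):
        self.ttl = ttl            # seconds before a cached entry is considered stale
        self._resolve = resolve   # injectable for testing (assumption: IPv4 lookup)
        self._clock = clock       # injectable for testing
        self._host = None
        self._ip = None
        self._stamp = 0.0

    def lookup(self, host):
        now = self._clock()
        # Re-resolve if the hostname changed or the entry is older than ttl.
        if host != self._host or now - self._stamp > self.ttl:
            self._ip = self._resolve(host)
            self._host = host
            self._stamp = now
        return self._ip


class CachingSysLogHandler(logging.handlers.SysLogHandler):
    """Hypothetical subclass: swaps the hostname for a cached IP before each send."""

    def __init__(self, address, ttl=60.0, **kwargs):
        super().__init__(address=address, **kwargs)
        self._target = address          # original (hostname, port) pair
        self._cache = HostCache(ttl=ttl)

    def emit(self, record):
        host, port = self._target
        try:
            # Assumption: the parent emit() calls sendto(msg, self.address) for UDP.
            self.address = (self._cache.lookup(host), port)
        except OSError:
            pass  # resolution failed; keep whatever address was used last
        super().emit(record)
```

With a one-minute ttl, a restarted syslog-sink container would be picked up within about a minute, while the per-message cost drops to a dictionary-free comparison and a clock read. The same HostCache could back a StatsdClient subclass, though where to hook it in depends on that client's internals.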