Open robgil opened 6 years ago
Logstash actively closes idle connections. This is handled via Netty (on the application layer) in Logstash. The EOF in beats is not due to a TCP timeout or TCP keepalive, but is triggered by Logstash itself. In order to make use of TCP keepalive, the Logstash plugin must be updated to not close idle connections. The protocol itself needs some overhaul, including support for proper shutdown and keep-alive on the application layer. Still, on Logstash/Beats we do not want to keep using resources for idle connections.
Ref: https://github.com/elastic/beats/issues/7590 https://github.com/logstash-plugins/logstash-input-beats/issues/74
Request to add full keepalive configurations for logstash-input-beats.
Load Balancing and Connection Pooling Considerations
client_inactivity_timeout can cause EOF errors on the beats side. This happens because logstash closes the connection while beats thinks it's still open. Having keepalive here will maintain connections between load balancers and logstash nodes. It will also help beats clients when they connect directly to logstash and not through a load balancer. Many load balancers have connection pooling functionality that benefits from a persistent pool of open connections, which can only be achieved with keepalive.
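For reference, this is roughly what enabling TCP keepalive on a socket looks like. It is a minimal Python sketch, not the beats or logstash implementation. The function name `enable_tcp_keepalive` and the default values are illustrative assumptions. The tuning options (`TCP_KEEPIDLE`, `TCP_KEEPINTVL`, `TCP_KEEPCNT`) are platform dependent, so the sketch only applies the ones the local `socket` module exposes. Note that TCP keepalive probes carry no application data, so by themselves they would not stop logstash from closing a connection on client_inactivity_timeout.

```python
import socket


def enable_tcp_keepalive(sock, idle_s=30, interval_s=10, probes=3):
    """Enable TCP keepalive on sock and tune it where the platform allows.

    Returns a dict of the options that were actually applied.
    All default values are illustrative assumptions, not recommendations.
    """
    # SO_KEEPALIVE is the portable on/off switch.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    applied = {"SO_KEEPALIVE": 1}

    # The tuning knobs are not available on every platform, so look them up
    # by name and skip any that this Python build does not expose.
    tuning = (
        ("TCP_KEEPIDLE", idle_s),       # idle seconds before the first probe
        ("TCP_KEEPINTVL", interval_s),  # seconds between probes
        ("TCP_KEEPCNT", probes),        # failed probes before the connection drops
    )
    for name, value in tuning:
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
            applied[name] = value
    return applied
```

The same options would need to be set on both the client-facing and the server-facing sockets, since each TCP connection is probed independently.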
Use Case: ELB
In this use case, beats would connect to the ELB (Elastic Load Balancer), which would distribute requests across logstash instances. In this design, ELB -> Logstash would benefit greatly from keepalive, as would beats -> ELB. Keepalive will not traverse ELBs, which act as a proxy, which is why we need it on both sides (client and server).
Use Case: beats direct to logstash
For intermittent logs, such as logs that generate traffic only every 60s or more, keepalive will maintain the connection and prevent load balancers or proxies from closing it prematurely. These intermittent logs are the ones most impacted by the timeouts.
Use Case: slow beats traffic
There's a separate request on the beats side to add an idle timeout, so that instead of requiring keepalive, beats would close an idle connection and leave it closed until there is data to be sent. When there is data to be sent (after a long idle period), a new connection would be established.
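The idle-timeout behaviour described above could look something like the following Python sketch. It is not beats code. `IdleClosingSender`, the `connect` factory, and the injectable `clock` are hypothetical names introduced for illustration only.

```python
import time


class IdleClosingSender:
    """Closes its connection after a period of silence and reconnects on demand."""

    def __init__(self, connect, idle_timeout_s, clock=time.monotonic):
        # connect is a hypothetical factory that returns an object
        # exposing send(payload) and close().
        self._connect = connect
        self._idle_timeout_s = idle_timeout_s
        self._clock = clock
        self._conn = None
        self._last_used = None

    def close_if_idle(self):
        """Close the connection if it has been idle too long.

        Meant to be called periodically. Returns True if a connection was closed.
        """
        if self._conn is None:
            return False
        if self._clock() - self._last_used >= self._idle_timeout_s:
            self._conn.close()
            self._conn = None
            return True
        return False

    def send(self, payload):
        """Send payload, opening a new connection first if none is open."""
        self.close_if_idle()
        if self._conn is None:
            self._conn = self._connect()
        self._conn.send(payload)
        self._last_used = self._clock()
```

With this approach the client is the side that closes the connection, so it never writes into a connection that the server or a load balancer has already dropped, provided its idle timeout is the shortest one in the path.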
Timeouts will need to be coordinated between the client, LBs, and servers in all cases.
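One way to sanity-check that coordination is sketched below in Python. It assumes the client should be the first party to give up on an idle connection, so every hop behind it needs a strictly longer idle timeout. That ordering is an assumption for illustration, and the function name and the example values are hypothetical.

```python
def find_conflicting_hops(client_idle_s, hop_idle_timeouts_s):
    """Return the names of hops that may close an idle connection before the client does.

    client_idle_s: seconds the client leaves a connection idle before closing it.
    hop_idle_timeouts_s: mapping of hop name to that hop's idle timeout in seconds.
    """
    return [
        name
        for name, timeout_s in hop_idle_timeouts_s.items()
        if timeout_s <= client_idle_s
    ]


# Illustrative values only; real timeouts depend on the deployment.
hops = {"load_balancer": 60, "logstash client_inactivity_timeout": 60}
print(find_conflicting_hops(30, hops))  # no hop closes before the client
print(find_conflicting_hops(90, hops))  # both hops would close first
```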