TCP connections building up in CLOSE_WAIT state

Logstash information:

docker container docker.elastic.co/logstash/logstash:8.11.4

Description of the problem including expected versus actual behavior: I have messages getting sent to the logstash tcp input plugin from python code running on 5,000+ workers. The connection to the logstash input is through an azure load balancer. It seems like logstash isn't recognizing when these connections get disconnected, because I can easily build up 100,000 connections in the CLOSE_WAIT state on the logstash server in ~30 min. Eventually this seems to essentially stall logstash (it stops making output connections to elasticsearch). Restarting the docker container instantly cleans up all the connections, and after restart sometimes it takes a few hours to get into this bad state where most connections seem to get left in the CLOST_WAIT state.

Steps to reproduce:

start logstash docker container with the configuration below
run the below code (every time you run it, another connection is left in the CLOSE_WAIT state on the logstash side)

This may only happen when many clients are sending messages to the same logstash TCP input at the same time.

input {
    tcp {
        type => "pipelinelog"
        port => 5040
        codec => json
        tcp_keep_alive => true
    }
    udp {
        type => "pipelinelog"
        port => 5040
        codec => json
    }
...
output {
  # stdout { codec => rubydebug }

  elasticsearch {
    hosts => "https://<hostname>:9243"
    user => "<user>"
    password => "<password>"
    index => "pipelinelog-%{name}-%{+yyyy.MM.dd}"
    action => "create"
    compression_level => 0
    resurrect_delay => 1
    retry_initial_interval => 1
    retry_max_interval => 1
    retry_on_conflict => 0
    timeout => 10
    silence_errors_in_log => ["version_conflict_engine_exception"]
    manage_template => false
    ilm_enabled => false
  }
}

import json
import logging
from logging.handlers import SocketHandler

class LogstashFormatter(logging.Formatter):
  def __init__(self):
    logging.Formatter.__init__(self, None, '%a %b %d %H:%M:%S %Z %Y')

  def format(self, record):
    return json.dumps({"msg": record.getMessage()})

class TcpLogstashHandler(SocketHandler):
  def __init__(self, host, port, formatter):
    super().__init__(host, port)
    self.formatter = formatter

class LogstashHandler(TcpLogstashHandler):
  def __init__(self, host, port, name, formatter):
    super().__init__(host, port, formatter)
    self.name = name
    self.setFormatter(formatter)

if __name__ == "__main__":
  log = logging.Logger("test")
  ls_formatter = LogstashFormatter()
  ls_handler = LogstashHandler("<hostname>", 5040, "pipeline", ls_formatter)
  log.addHandler(ls_handler)
  s_handler = logging.StreamHandler()
  log.addHandler(s_handler)

  log.info("testing 1, 2, 3...")

Looking at a tcpdump of the traffic I see the client sends FIN then logstash sends back an ACK messages but no FIN (connection leak?)

$ sudo tcpdump -nnvvSX -i enp1s0 port 5040
tcpdump: listening on enp1s0, link-type EN10MB (Ethernet), capture size 262144 bytes
20:15:17.580753 IP (tos 0x0, ttl 64, id 16538, offset 0, flags [DF], proto TCP (6), length 60)
    192.168.44.1.51966 > 10.2.2.254.5040: Flags [S], cksum 0xf9d7 (incorrect -> 0xfb7b), seq 4241861493, win 64240, options [mss 1460,sackOK,TS val 1451391597 ecr 0,nop,wscale 7], length 0
    0x0000:  4500 003c 409a 4000 4006 0079 c0a8 2c01  E..<@.@.@..y..,.
    0x0010:  0a02 02fe cafe 13b0 fcd5 ab75 0000 0000  ...........u....
    0x0020:  a002 faf0 f9d7 0000 0204 05b4 0402 080a  ................
    0x0030:  5682 7a6d 0000 0000 0103 0307            V.zm........
20:15:17.608895 IP (tos 0x0, ttl 63, id 0, offset 0, flags [DF], proto TCP (6), length 60)
    10.2.2.254.5040 > 192.168.44.1.51966: Flags [S.], cksum 0xd3c7 (correct), seq 3935284167, ack 4241861494, win 65160, options [mss 1360,sackOK,TS val 3100169550 ecr 1451391597,nop,wscale 7], length 0
    0x0000:  4500 003c 0000 4000 3f06 4213 0a02 02fe  E..<..@.?.B.....
    0x0010:  c0a8 2c01 13b0 cafe ea8f abc7 fcd5 ab76  ..,............v
    0x0020:  a012 fe88 d3c7 0000 0204 0550 0402 080a  ...........P....
    0x0030:  b8c8 d54e 5682 7a6d 0103 0307            ...NV.zm....
20:15:17.608924 IP (tos 0x0, ttl 64, id 16539, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.44.1.51966 > 10.2.2.254.5040: Flags [.], cksum 0xf9cf (incorrect -> 0xfea6), seq 4241861494, ack 3935284168, win 502, options [nop,nop,TS val 1451391625 ecr 3100169550], length 0
    0x0000:  4500 0034 409b 4000 4006 0080 c0a8 2c01  E..4@.@.@.....,.
    0x0010:  0a02 02fe cafe 13b0 fcd5 ab76 ea8f abc8  ...........v....
    0x0020:  8010 01f6 f9cf 0000 0101 080a 5682 7a89  ............V.z.
    0x0030:  b8c8 d54e                                ...N
20:15:17.608961 IP (tos 0x0, ttl 64, id 16540, offset 0, flags [DF], proto TCP (6), length 599)
    192.168.44.1.51966 > 10.2.2.254.5040: Flags [P.], cksum 0xfbf2 (incorrect -> 0x6807), seq 4241861494:4241862041, ack 3935284168, win 502, options [nop,nop,TS val 1451391626 ecr 3100169550], length 547
    0x0000:  4500 0257 409c 4000 4006 fe5b c0a8 2c01  E..W@.@.@..[..,.
    0x0010:  0a02 02fe cafe 13b0 fcd5 ab76 ea8f abc8  ...........v....
    0x0020:  8018 01f6 fbf2 0000 0101 080a 5682 7a8a  ............V.z.
    0x0030:  b8c8 d54e 0000 021f 7d71 0028 5804 0000  ...N....}q.(X...
    0x0040:  006e 616d 6571 0158 0400 0000 7465 7374  .nameq.X....test
    0x0050:  7102 5803 0000 006d 7367 7103 5812 0000  q.X....msgq.X...
    0x0060:  0074 6573 7469 6e67 2031 2c20 322c 2033  .testing.1,.2,.3
    0x0070:  2e2e 2e71 0458 0400 0000 6172 6773 7105  ...q.X....argsq.
    0x0080:  4e58 0900 0000 6c65 7665 6c6e 616d 6571  NX....levelnameq
    0x0090:  0658 0400 0000 494e 464f 7107 5807 0000  .X....INFOq.X...
    0x00a0:  006c 6576 656c 6e6f 7108 4b14 5808 0000  .levelnoq.K.X...
    0x00b0:  0070 6174 686e 616d 6571 0958 2a00 0000  .pathnameq.X*...
    0x00c0:  2f68 6f6d 652f 6261 746d 616e 2f78 782f  /home/batman/xx/
    0x00d0:  6368 6666 722f 6c69 622f 6c6f 6773 7461  chffr/lib/logsta
    0x00e0:  7368 5f74 6573 742e 7079 710a 5808 0000  sh_test.pyq.X...
    0x00f0:  0066 696c 656e 616d 6571 0b58 1000 0000  .filenameq.X....
    0x0100:  6c6f 6773 7461 7368 5f74 6573 742e 7079  logstash_test.py
    0x0110:  710c 5806 0000 006d 6f64 756c 6571 0d58  q.X....moduleq.X
    0x0120:  0d00 0000 6c6f 6773 7461 7368 5f74 6573  ....logstash_tes
    0x0130:  7471 0e58 0800 0000 6578 635f 696e 666f  tq.X....exc_info
    0x0140:  710f 4e58 0800 0000 6578 635f 7465 7874  q.NX....exc_text
    0x0150:  7110 4e58 0a00 0000 7374 6163 6b5f 696e  q.NX....stack_in
    0x0160:  666f 7111 4e58 0600 0000 6c69 6e65 6e6f  foq.NX....lineno
    0x0170:  7112 4b1f 5808 0000 0066 756e 634e 616d  q.K.X....funcNam
    0x0180:  6571 1358 0800 0000 3c6d 6f64 756c 653e  eq.X....<module>
    0x0190:  7114 5807 0000 0063 7265 6174 6564 7115  q.X....createdq.
    0x01a0:  4741 d971 69b5 650f d158 0500 0000 6d73  GA.qi.e..X....ms
    0x01b0:  6563 7371 1647 4082 1800 0000 0000 580f  ecsq.G@.......X.
    0x01c0:  0000 0072 656c 6174 6976 6543 7265 6174  ...relativeCreat
    0x01d0:  6564 7117 4740 1a96 2700 0000 0058 0600  edq.G@..'....X..
    0x01e0:  0000 7468 7265 6164 7118 4c31 3430 3239  ..threadq.L14029
    0x01f0:  3232 3938 3031 3234 3830 4c0a 580a 0000  2298012480L.X...
    0x0200:  0074 6872 6561 644e 616d 6571 1958 0a00  .threadNameq.X..
    0x0210:  0000 4d61 696e 5468 7265 6164 711a 580b  ..MainThreadq.X.
    0x0220:  0000 0070 726f 6365 7373 4e61 6d65 711b  ...processNameq.
    0x0230:  580b 0000 004d 6169 6e50 726f 6365 7373  X....MainProcess
    0x0240:  711c 5807 0000 0070 726f 6365 7373 711d  q.X....processq.
    0x0250:  4a47 022a 0075 2e                        JG.*.u.
20:15:17.609069 IP (tos 0x0, ttl 64, id 16541, offset 0, flags [DF], proto TCP (6), length 52)
    192.168.44.1.51966 > 10.2.2.254.5040: Flags [F.], cksum 0xf9cf (incorrect -> 0xfc81), seq 4241862041, ack 3935284168, win 502, options [nop,nop,TS val 1451391626 ecr 3100169550], length 0
    0x0000:  4500 0034 409d 4000 4006 007e c0a8 2c01  E..4@.@.@..~..,.
    0x0010:  0a02 02fe cafe 13b0 fcd5 ad99 ea8f abc8  ................
    0x0020:  8011 01f6 f9cf 0000 0101 080a 5682 7a8a  ............V.z.
    0x0030:  b8c8 d54e                                ...N
20:15:17.637028 IP (tos 0x0, ttl 63, id 41548, offset 0, flags [DF], proto TCP (6), length 52)
    10.2.2.254.5040 > 192.168.44.1.51966: Flags [.], cksum 0xfc62 (correct), seq 3935284168, ack 4241862041, win 505, options [nop,nop,TS val 3100169579 ecr 1451391626], length 0
    0x0000:  4500 0034 a24c 4000 3f06 9fce 0a02 02fe  E..4.L@.?.......
    0x0010:  c0a8 2c01 13b0 cafe ea8f abc8 fcd5 ad99  ..,.............
    0x0020:  8010 01f9 fc62 0000 0101 080a b8c8 d56b  .....b.........k
    0x0030:  5682 7a8a                                V.z.
20:15:17.678673 IP (tos 0x0, ttl 63, id 41549, offset 0, flags [DF], proto TCP (6), length 52)
    10.2.2.254.5040 > 192.168.44.1.51966: Flags [.], cksum 0xfc38 (correct), seq 3935284168, ack 4241862042, win 505, options [nop,nop,TS val 3100169620 ecr 1451391626], length 0
    0x0000:  4500 0034 a24d 4000 3f06 9fcd 0a02 02fe  E..4.M@.?.......
    0x0010:  c0a8 2c01 13b0 cafe ea8f abc8 fcd5 ad9a  ..,.............
    0x0020:  8010 01f9 fc38 0000 0101 080a b8c8 d594  .....8..........
    0x0030:  5682 7a8a                                V.z.

I ran into the same issue here. I see there are about 278 close_wait connection there for ever. --logstash-input-syslog-3.6.0

I had over 100,000 connections in the CLOSE_WAIT state when I had issues, probably due to clients connecting for very short periods of time. I believe in the end this improved the situation to the point that I now rarely have this issue:

sudo sysctl net.ipv4.tcp_keepalive_intvl=60
sudo sysctl net.ipv4.tcp_keepalive_probes=9
sudo sysctl net.ipv4.tcp_keepalive_time=60

These are the default on my docker container: net.ipv4.tcp_keepalive_intvl = 75 net.ipv4.tcp_keepalive_probes = 9 net.ipv4.tcp_keepalive_time = 7200

The only option that makes considerable would be "tcp_keepalive_time", however, I keep watching the connection and found it keeps close_wait for ever intead of 7200 seconds.

I am still in tring to produce the same issue via script and will change net.ipv4.tcp_keepalive_time to shorter to see if it helps.

Thank you.

I don't think you understand how those parameters work, 7200 seconds is how long it waits to send the first keepalive after the connection becomes idle. My settings lower the total time after idle to 10 min. https://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html

For the problem I had, my best guess as to what was happening is the following:

client connects to logstash (server), server side has keep alive enabled
client disconnects abnormally, server side keep alive figures this out and puts connection in CLOSE_WAIT state
at this point it is the job of the TCP server inside logstash (netty) to cleanup the connection (connection can stay in this state forever if it never cleans up) and lowering the keepalive parameters above made the cleanup happen faster

I also found that if I turn off tcp_keep_alive for each logstash input configured, the connections would stay ESTABLISHED forever, which is what caused me to start playing with the keepalive parameters. It kind of seems to me like there is a problem with netty, or how logstash uses netty.

@gregjhogan Thank you for your patience.

I read the link again and I think we are talking kinda same. I was wondering why the "CLOSE_WAIT" line is kept "for ever" even there is "net.ipv4.tcp_keepalive_time" by default.

Since you statement that logstash will do clean. So I can setup a clear step to show that makes no sense. First, I put parameters below to sysctl and make sure it takes effect and restart logstash (it's in one virutal machine as well, shorter value to see result more efftive):

net.ipv4.tcp_keepalive_time=10
net.ipv4.tcp_keepalive_intvl=4
net.ipv4.tcp_keepalive_probes=3

Now I boot another virutal machine, from there I can send request to logstash, code is below(python3):

import asyncio

async def create_close_wait(host, port, connection_id):
    future = asyncio.open_connection(host, port)
    print(f"Connection {connection_id} is opening")
    reader, writer = None, None
    while True:
        try:
            reader, writer = await asyncio.wait_for(future, 5)
            break
        except asyncio.TimeoutError:
            continue

    print(f"Connection {connection_id} opened")
    writer.write(b'-' * 1500)
    await writer.drain()
    print('let wait 4ever')
    while True:
        await asyncio.sleep(1)

async def main():
    host = "192.168.13.150"  # the remote logstash ip
    port = 514  # Logstash syslog port
    count = 100  # how many conn to create

    tasks = []
    for i in range(count):
        tasks.append(create_close_wait(host, port, i + 1))
    await asyncio.gather(*tasks)

if __name__ == "__main__":
    asyncio.run(main())

After script launch, just disconnect the virutal nic, so that you can check in logstash it keeps connection ESTABLISHED for ever. To make more connection you can kill script then connect nic back then launch script (do exactly same step to make sure there is no rst/fin to logstash) to make connection increase huge:

cut more ....
tcp6       0      0 192.168.13.150:514      192.168.13.134:41494    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41312    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41506    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39718    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41294    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:40580    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39626    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39730    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39688    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39682    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41300    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41220    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39638    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:39622    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41186    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41298    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41460    ESTABLISHED 21950/java      
tcp6       0      0 192.168.13.150:514      192.168.13.134:41464    ESTABLISHED 21950/java      
cut more ....

and I wait for more than 10 minutes and make sure I am waiting longer then what net.ipv4.tcp_keepalive_xx settled.

Are we on same page?

Thanks again.

I mentioned above that I saw similar behavior (ESTABLISHED forever) if I turn off tcp_keep_alive for each logstash input TCP port configured to listen on, so I am guessing you didn't turn that on (it is off by default).

https://www.elastic.co/guide/en/logstash/current/plugins-inputs-tcp.html#plugins-inputs-tcp-tcp_keep_alive

Yes, you are right.

Let me add that option to see the result.

Thank you, @gregjhogan

with short keepalive values, the close_wait seems go aways. But see more than 100+ threads on syslog like below:

30814 root 20 0 7299604 1.265g 21216 S 0.0 8.2 0:00.00 input|syslog|tc
30954 root 20 0 7299604 1.265g 21216 S 0.0 8.2 0:00.00 input|syslog|tc
31062 root 20 0 7299604 1.265g 21216 S 0.0 8.2 0:00.00 input|syslog|tc
31484 root 20 0 7299604 1.265g 21216 S 0.0 8.2 0:00.00 input|syslog|tc
32761 root 20 0 7299604 1.265g 21216 S 0.0 8.2 0:00.00 input|syslog|tc
 1639 root 20 0 7299604 1.265g 21216 S 0.0 8.2 0:00.00 input|syslog|tc

Via wireshark the number of syslog(udp packet) is less than 20 pps.

It produces 429 for http plugin(I don't understand as well :-( ) and for there more than hours (No time to wait how long it will keeps -- restarted and see it recovered)

logstash-plugins / logstash-input-tcp

TCP connections building up in CLOSE_WAIT state #218