I checked this after the new fleet layout was deployed, and it appears that this issue is mostly affecting the libp2p host:
> ansible nimbus.pyrmont -o -a 'sudo netstat -pnt | grep FIN_WAIT2 | wc -l'
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 72
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 5
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 29
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 44
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 49
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4089
The number doesn't change much over a short span like 10 minutes - it goes up by one, down by one, not much movement.
According to RFC-793:
FIN-WAIT-1 - represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.
FIN-WAIT-2 - represents waiting for a connection termination request from the remote TCP.
https://tools.ietf.org/html/rfc793
And according to the diagram on page 22, the FIN_WAIT_2 state is reached after FIN_WAIT_1 once the first FIN packet has been sent one way but the second and final one is never received successfully.
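To make the failure mode concrete, here is a minimal local sketch of that transition (illustrative Python, assuming a Linux host with the ss tool used throughout this thread; it is not nimbus code): the side that sends the first FIN is left in FIN-WAIT-2 for as long as the peer never sends its own FIN back.

```python
import socket
import subprocess
import time

# The active closer sends its FIN via shutdown(); the peer's kernel ACKs it,
# but the peer application never closes, so no FIN ever comes back and the
# active side stays in FIN-WAIT-2.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

cli = socket.socket()
cli.connect(("127.0.0.1", port))
conn, _ = srv.accept()

cli.shutdown(socket.SHUT_WR)   # our FIN: FIN-WAIT-1 -> (peer ACKs) -> FIN-WAIT-2
time.sleep(0.5)                # `conn` is never closed, so its FIN never arrives

out = subprocess.run(["ss", "-Hnt", "state", "fin-wait-2"],
                     capture_output=True, text=True).stdout
print([line for line in out.splitlines() if f":{port}" in line])
```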
This issue is also described by IBM PI-73495:
When a secure connection is closed, there is a possibility that a TCB (connection structure) with FIN-WAIT-2 status will not be released and will stay around forever until TCPIP is restarted. The problem occurs when TCPIP has already sent a FIN packet to close out the connection, but it never receives the FIN packet back from the peer, which results in the connection never being fully closed.
If many sockets which were connected to a specific remote application end up stuck in this state, it usually indicates that the remote application either always dies unexpectedly when in the CLOSE_WAIT state or just fails to perform an active close after the passive close. The timeout for sockets in the FIN-WAIT-2 state is defined with the parameter tcp_fin_timeout. You should set it to a value high enough so that if the remote end-point is going to perform an active close, it will have time to do it. On the other hand, sockets in this state do use some memory (even though not much) and this could lead to a memory overflow if too many sockets are stuck in this state for too long.
https://benohead.com/blog/2013/07/21/tcp-about-fin_wait_2-time_wait-and-close_wait/
tcp_fin_timeout (integer; default: 60; since Linux 2.2) - This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
The kernel timeout only applies if the connection is orphaned. If the connection is still attached to a socket, the program that owns that socket is responsible for timing out the shutdown of the connection. Likely it has called shutdown and is waiting for the connection to shut down cleanly.
So it's likely that the code on the libp2p branch calls shutdown on the socket but not close.
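For reference, the difference in plain BSD-socket terms looks roughly like this (an illustrative Python sketch, not the actual nimbus/chronos code path; the peer address is a placeholder):

```python
import socket

sock = socket.socket()
sock.connect(("example.org", 80))   # placeholder peer, only for the sketch

# Half-close: our FIN is sent, but the process keeps the file descriptor.
# If the peer never sends its FIN, the connection sits in FIN-WAIT-2
# indefinitely - the kernel's tcp_fin_timeout does not apply while a
# process still owns the socket.
sock.shutdown(socket.SHUT_WR)

# Full close: the fd is released and the connection becomes "orphaned".
# The kernel then owns the teardown and reaps a lingering FIN-WAIT-2 after
# net.ipv4.tcp_fin_timeout seconds (60 by default).
sock.close()
```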
The issue seems to affect only the libp2p port 9100:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo ss -H -4 state FIN-WAIT-2 | awk '{a[$5]++}END{for(x in a){printf "%5d - %s\n", a[x], x}}'
4109 - 172.17.2.2:9100
2 - 127.0.0.1:9100
Interesting article about a similar issue by CloudFlare: https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/
It appears that the issue mostly affects libp2p, but also unstable, while stable and testing are currently not affected.
Although, to be fair, if we compare the times since the last restart, they do roughly match up with the number of FIN-WAIT-2 connections:
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l; echo -n " - "; docker ps --format="{{.Status}}"' | sort
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 55 minutes
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 minutes
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 26 hours
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 hours
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 days
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 days
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 days
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 4 minutes
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 109\n - Up 2 hours
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 106\n - Up 2 hours
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 91\n - Up About an hour
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 126\n - Up 2 hours
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 147\n - Up 2 hours
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4146\n - Up 3 days
But not exactly, because the testing hosts have been up for a while - same as libp2p - yet no FIN-WAIT-2 connections are counted there.
There is an issue that suggests this might be related to Docker: https://github.com/docker/for-linux/issues/335
But truth be told I've never seen this issue anywhere else we use Docker, which is everywhere.
Interesting, we don't time out the shutdown, but I'm also not sure whether chronos exposes an explicit close, or whether cancelling the closeWait() call would result in the socket being closed. @cheatfate could you provide some insights into this?
If you search the chronos repository for the shutdown keyword, you will find only one place where it is used - in a test, to simulate a particular error condition. So all sockets are closed using close, not shutdown.
https://github.com/status-im/nim-chronos/blob/master/tests/teststream.nim#L647-L649
@dryajov when you perform close, you launch a two-step closing procedure: in the first step we notify pending readers and writers about the closure, and in the second step we actually remove the socket from the system queue and close it. This second step gets scheduled using callSoon, so right now it is impossible to cancel it.
https://github.com/status-im/nim-chronos/blob/master/chronos/asyncloop.nim#L625-L659
If you cancel closeWait(), you will cancel only the waiting task, so you will not be notified when the second step of the closure procedure has finished.
https://github.com/status-im/nim-chronos/blob/master/chronos/transports/stream.nim#L2288-L2291
As you can see here, closeWait() returns a Future[void] which is produced by join(), so you are actually cancelling join() and not the closure procedure itself.
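To make the two-step procedure easier to picture, here is a rough asyncio analogy of the pattern described above (an illustration only, not the chronos implementation; the Transport class and close_wait name are made up for the sketch):

```python
import asyncio

class Transport:
    """Illustrative analogue of the two-step close described above."""

    def __init__(self, sock):
        self._sock = sock
        self._closed = asyncio.Event()

    def close(self):
        # Step 1 would notify pending readers/writers here (omitted).
        # Step 2, the actual close, is scheduled for a later loop iteration,
        # so it cannot be cancelled once close() has been called.
        asyncio.get_running_loop().call_soon(self._finish_close)

    def _finish_close(self):
        self._sock.close()          # release the fd back to the OS
        self._closed.set()

    async def close_wait(self):
        self.close()
        # Cancelling this coroutine cancels only the wait on the event
        # (the "join"), not the scheduled _finish_close() step.
        await self._closed.wait()
```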
In general this is of course a problem, but the chronos test suite performs socket-leak checks on every run. Before starting the heavy sockets test it records the FD number (https://github.com/status-im/nim-chronos/blob/master/tests/teststream.nim#L1203) and when all tests have completed it records the FD number again (https://github.com/status-im/nim-chronos/blob/master/tests/teststream.nim#L1291). So there are no hidden socket leaks inside chronos itself.
@cheatfate this isn't about leaking sockets on our end; it's about us performing a multi-step close of the socket, which in BSD sockets is initiated with shutdown. It seems that chronos assumes shutdown and there is no analogue to BSD close. Thus the connection is left in an awaiting state (i.e. half-closed), because the other end never finishes the close procedure and we have no way of forcibly closing the connection as it is.
@dryajov I think you misunderstand. Please search the chronos code base for the keyword "shutdown" to confirm that chronos does not use the shutdown() syscall for closing sockets; chronos only uses the close() syscall: https://man7.org/linux/man-pages/man2/close.2.html
OK, I see that now.
So, it seems like this might be an issue elsewhere, since the FIN_WAIT2 state only happens when the initiator of the sequence calls shutdown, which chronos doesn't do - it always calls close.
Some suspects might be either the firewall dropping/blocking packets or our Docker setup. We'll keep investigating on our end, but it's worth checking our firewall rules/logs for dropped FIN and ACK packets, as well as our Docker logs.
It might also be faulty third party nodes deployed in pyrmont. Can we verify that our mainnet nodes aren't exhibiting the same behaviour?
Mainnet does not show this behavior:
> ansible nimbus.mainnet -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l'
stable-small-01.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
stable-small-02.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
At least not right now.
As far as I can tell this mostly affects the libp2p and unstable branches:
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l' | sort -h
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 15
stable-small-02.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 380
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 580
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1146
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1035
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 849
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 689
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 657
The unstable branch is up from ~100 connections stuck in FIN-WAIT-2 to ~500-1000, and libp2p is down because the container image was updated 14 hours ago.
As before, it seems unstable and libp2p are most affected:
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 21
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 5
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 167
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 175
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 238
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 264
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 302
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 164
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 2178
Actually, it doesn't seem like a restart clears them up properly:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % d
CONTAINER ID NAMES IMAGE CREATED STATUS
02abe550d64f beacon-node-pyrmont-libp2p-small statusteam/nimbus_beacon_node:libp2p-small 45 hours ago Up 44 hours
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % d restart beacon-node-pyrmont-libp2p-small
beacon-node-pyrmont-libp2p-small
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2 | wc -l
2176
So I rebooted the host, and now it's back to 0:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2 | wc -l
0
We need to check the state of the sockets inside the Docker container itself; that will give us an idea of what/where this is originating.
It doesn't seem to happen within the container itself:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo netstat -pnt | grep FIN_WAIT2 | wc -l
1215
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % d exec -it beacon-node-pyrmont-libp2p-small bash
root@e6076df0ff1c:/# netstat -pnt | grep FIN_WAIT2 | wc -l
0
OK, that means that this is definitely not libp2p. Maybe we are hitting the docker bug?
But why would it happen only on the libp2p and unstable hosts in such volume, while hardly at all on others?
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l' | sort -h
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 26
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 7
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 486
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 570
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 99
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 131
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 96
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 702
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1222
The pattern appears to be pretty clear in terms of branches.
And if I check other fleets like infra-hq and infra-misc, I see none of those FIN-WAIT-2 socket states. Literally zero.
So it seems like it's some combination of how Docker handles mapped ports and the libp2p implementation.
I did find a few hosts from our status-go fleet that have just a few sockets in the FIN-WAIT-2 state:
mail-01.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 13
mail-02.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 14
mail-03.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 14
node-02.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 7
node-03.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 1
node-04.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 10
node-05.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 2
node-06.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 2
node-08.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 3
mail-01.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 17
mail-02.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 20
mail-03.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 20
node-01.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 2
node-02.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-03.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 7
node-04.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-06.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-07.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-08.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 2
But that's devp2p, not libp2p.
I agree that we can't completely rule out libp2p. But we can isolate Docker a bit further - how about we run unstable without Docker at all on one of the problematic machines and see what the result is. If FIN_WAIT_2 is gone, I would say that this is probably the same as, or similar to, the bug reported here: https://github.com/docker/for-linux/issues/335.
It's worth a try. I'll take a look tomorrow.
Okay, I created a Systemd service definition that makes use of the same resources as the docker container:
[Unit]
Description="Nimbus Beacon Node"
Documentation=https://github.com/status-im/nimbus-eth2
Requires=network-online.target
After=network-online.target
[Service]
User=dockremap
ExecStart=/usr/local/bin/nimbus_beacon_node \
--network=pyrmont \
--data-dir=/docker/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0 \
--web3-url=ws://10.4.0.165:8546 \
--nat=extip:18.195.225.101 \
--log-level=DEBUG \
--tcp-port=9100 \
--udp-port=9100 \
--netkey-file=/docker/beacon-node-pyrmont-libp2p-small/data/netkey \
--insecure-netkey-password=true \
--rpc \
--rpc-address=0.0.0.0 \
--rpc-port=11000 \
--metrics \
--metrics-address=0.0.0.0 \
--metrics-port=9300
Restart=on-failure
WorkingDirectory=/docker/beacon-node-pyrmont-libp2p-small/data
PermissionsStartOnly=True
LimitNOFILE=1048576
[Install]
WantedBy=multi-user.target
And I copied the nimbus_beacon_node and nimbus_signing_process binaries directly from the container onto the host.
But it exploded with:
Traceback (most recent call last, using override)
/data/beacon-node-builds/libp2p-small/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(354) main
/data/beacon-node-builds/libp2p-small/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(347) NimMain
/data/beacon-node-builds/libp2p-small/repo/beacon_chain/nimbus_beacon_node.nim(1578) main
/data/beacon-node-builds/libp2p-small/repo/beacon_chain/nimbus_beacon_node.nim(1100) start
/data/beacon-node-builds/libp2p-small/repo/vendor/nim-chronicles/chronicles.nim(329) run
/data/beacon-node-builds/libp2p-small/repo/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(407) reportUnhandledError
/data/beacon-node-builds/libp2p-small/repo/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(358) reportUnhandledErrorAux
Error: unhandled exception: /data/beacon-node-builds/libp2p-small/repo/beacon_chain/validator_duties.nim(620, 12) `node.config.gossipSlashingProtection == GossipSlashingProtectionMode.dontcheck or
(node.processor[].gossipSlashingProtection.probeEpoch <
node.processor[].gossipSlashingProtection.broadcastStartEpoch)` [AssertionError]
,{"hcode":50,"key":50,"val":541088},{"hcode":51,"key":51,"val":541088},{"hcode":52,"key":52,"val":541088},{"hcode":46,"key":46,"val":541088},{"hcode":54,"key":54,"val":541088},{"hcode":55,"key":55,"val":541088},{"hcode":56,"key":56,"val":541088},{"hcode":57,"key":57,"val":541088},{"hcode":53,"key":53,"val":541088},{"hcode":59,"key":59,"val":541088},{"hcode":60,"key":60,"val":541088},{"hcode":61,"key":61,"val":541088},{"hcode":62,"key":62,"val":541088},{"hcode":63,"key":63,"val":541088},{"hcode":58,"key":58,"val":541088},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0}],"counter":64},"initialStabilitySubnets":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63],"wallEpoch":16907}
{"lvl":"DBG","ts":"2021-02-01 15:29:55.284+00:00","msg":"Attestation subnets","topics":"beacnde","tid":11495,"file":"nimbus_beacon_node.nim:517","expiringSubnets":[],"subnets":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63],"newSubnets":[],"wallSlot":541049,"wallEpoch":16907,"num_stability_subnets":2000,"expiring_stability_subnets":[],"new_stability_subnets":[],"subscribedSubnets":[],"unsubscribedSubnets":[]}
The dockremap user definitely has access to the slashing protection DB:
root@unstable-libp2p-small-01:~# ls -l /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.*
-rw-r--r-- 1 dockremap dockremap 608833536 Feb 1 15:28 /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.sqlite3
-rw-r--r-- 1 dockremap dockremap 32768 Feb 1 15:32 /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.sqlite3-shm
-rw-r--r-- 1 dockremap dockremap 32992 Feb 1 15:32 /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.sqlite3-wal
It appears the libp2p branch is outdated: https://github.com/status-im/nimbus-eth2/commits/nim-libp2p-auto-bump
But that's unrelated since I've also built the binary directly on the host instead of copying it from the container and it works:
NOT 2021-02-01 17:23:25.356+00:00 Attestation sent topics="beacval" tid=32613 file=validator_duties.nim:220 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000, data: (slot: 541616, index: 27, beacon_block_root: \"8720de59\", source: (epoch: 16924, root: \"aa2f6c57\"), target: (epoch: 16925, root: \"8f940a0c\")), signature: \"a5dc459e\")" validator=b9d94838 delay=2s356ms701us683ns indexInCommittee=58
As suggested by @dryajov, I've rebased nim-libp2p-auto-bump on unstable and rebuilt it. It's running so far; let's see what happens overnight.
Working fine so far:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~/soft/nimbus-eth2 % sudo systemctl status beacon-node-pyrmont-libp2p-small
● beacon-node-pyrmont-libp2p-small.service - "Nimbus Beacon Node"
Loaded: loaded (/etc/systemd/system/beacon-node-pyrmont-libp2p-small.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2021-02-01 18:16:23 UTC; 59min ago
Docs: https://github.com/status-im/nimbus-eth2
Main PID: 40025 (nimbus_beacon_n)
Tasks: 2 (limit: 4652)
Memory: 1.7G
CGroup: /system.slice/beacon-node-pyrmont-libp2p-small.service
└─40025 /usr/local/bin/nimbus_beacon_node --network=pyrmont --data-dir=/docker/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0 ...
The libp2p host shows just one such connection:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2 | wc -l
1
Also, it looks like testing is showing high numbers now:
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l' | sort -h
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 234
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4473
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1604
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 2554
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4630
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4805
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 92
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 86
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 83
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1
Although my question would be: Why do we consider this an issue?
The libp2p host shows just one such connection:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
1
Well, if this persists it might indicate the same issue, but if it goes away we should be fine.
Judging by the IP, it seems like that one was a leftover from Docker:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
tcp 0 0 172.20.1.210:35160 18.193.2.74:9100
I've rebuilt the image with nim-libp2p-auto-bump rebased on unstable. I'm going to reboot the host to clear any remaining FIN-WAIT-2 connections and see what happens then.
Although my question would be: Why do we consider this an issue?
Mostly because the port pair is never released back to the OS, and the socket is also leaked at this point. After a while this becomes an issue, both because of FD exhaustion and the memory associated with it, but more importantly because the port pair cannot be used anymore.
But there is a timeout on that which should eventually clear those out; it is apparently set to 60s, so why aren't they being cleared up by the OS? The timeout is controlled by net.ipv4.tcp_fin_timeout (as documented here: https://access.redhat.com/solutions/41776).
Yeah, I checked net.ipv4.tcp_fin_timeout earlier, and the value is no different from the default on our hosts:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % cat /proc/sys/net/ipv4/tcp_fin_timeout
60
So they should be expiring.
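One way to sanity-check that (a sketch; per the man-page excerpt earlier, tcp_fin_timeout only reaps orphaned connections) is to list the FIN-WAIT-2 sockets together with their owning processes. Entries that still show an owner, for example docker-proxy, are not orphaned, so the 60-second timeout never applies to them:

```python
import subprocess

# Assumption: run as root so `ss -p` can resolve socket owners.
out = subprocess.run(["ss", "-Hntp", "state", "fin-wait-2"],
                     capture_output=True, text=True).stdout
lines = [l for l in out.splitlines() if l.strip()]
attached = [l for l in lines if "users:" in l]      # still owned by a process
orphaned = [l for l in lines if "users:" not in l]  # subject to tcp_fin_timeout
print(f"attached: {len(attached)}, orphaned: {len(orphaned)}")
for l in attached[:10]:
    print(l)
```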
The systemd service has been running for 16 hours and I see one FIN-WAIT-2 connection:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
tcp 0 0 127.0.0.1:44190 127.0.0.1:9100
If we compare https://github.com/status-im/infra-nimbus/issues/35#issuecomment-771665354 and https://github.com/status-im/infra-nimbus/issues/35#issuecomment-772515524 we can see that in one case the IPs shown are Docker related, and in the other they are both localhost.
Interesting, what's the localhost port used for - is it still NBC? Otherwise, it seems like docker is the main culprit so far?
Nimbus listens on all addresses, including localhost:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo netstat -lpnt | grep nimbus
tcp 0 0 0.0.0.0:11000 0.0.0.0:* LISTEN 3902/nimbus_beacon_
tcp 15 0 0.0.0.0:9100 0.0.0.0:* LISTEN 3902/nimbus_beacon_
tcp 0 0 0.0.0.0:9300 0.0.0.0:* LISTEN 3902/nimbus_beacon_
OK, seems like it is NBC. Can we check the command line nbc was run with? This could be either the metrics port or the RPC port.
It's the libp2p port, config was already posted in https://github.com/status-im/infra-nimbus/issues/35#issuecomment-770942634.
Btw, you do have SSH access to this host: https://github.com/status-im/infra-nimbus/blob/b7226818d3977712db42dd6fa3d660f822a09f03/ansible/group_vars/all.yml#L35
Ah, but it's gone now:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ %
And the service was not restarted.
I might have to break out tcpdump for this.
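If it comes to that, a capture limited to FIN segments on the libp2p port should show whether our FIN goes out and whether the peer's FIN ever comes back (a sketch; port 9100 is the libp2p port from the service config above):

```python
import subprocess

# Capture only segments with the FIN flag set on port 9100.
subprocess.run([
    "sudo", "tcpdump", "-ni", "any",
    "tcp port 9100 and (tcp[tcpflags] & tcp-fin) != 0",
])
```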
It seems like the count doesn't rise when Nimbus runs as a systemd service. So it's clearly caused by a combination of how Docker maps ports and how libp2p works:
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 31 hours
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1662\n - Up 23 hours
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 13077\n - Up 9 days
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 485\n - Up 3 days
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 433\n - Up 3 days
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 7113\n - Up 3 days
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 6556\n - Up 3 days
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 6609\n - Up 4 days
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 6888\n - Up 4 days
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 hours
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1\n - Up 3 hours
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4\n - Up 3 hours
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 140\n - Up 3 hours
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n -
I think this has more to do with Docker than with systemd and libp2p. One more test to confirm this would be to run a previous version of Docker, one that has been confirmed not to show this behaviour. The linked Docker issue has a few listed, iirc.
I don't buy that explanation because as I said, no other services using Docker show this behavior.
We're seeing a lot of connections stuck in the FIN_WAIT2 state on Pyrmont fleet hosts.