I checked this after the new fleet layout was deployed, and it appears that this issue is mostly affecting the libp2p host:
> ansible nimbus.pyrmont -o -a 'sudo netstat -pnt | grep FIN_WAIT2 | wc -l'
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 72
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 5
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 29
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 44
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 49
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4089
The number doesn't change much over a short span like 10 minutes - it goes up by one, down by one, not much movement.
According to RFC-793:
FIN-WAIT-1 - represents waiting for a connection termination request from the remote TCP, or an acknowledgment of the connection termination request previously sent.
FIN-WAIT-2 - represents waiting for a connection termination request from the remote TCP.
https://tools.ietf.org/html/rfc793
And according to the diagram on page 22, the FIN_WAIT_2 state is reached after FIN_WAIT_1 once the first FIN packet has been sent one way but the second and final one is never received successfully.
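To make the failure mode concrete, here is a minimal local sketch of that transition (illustrative Python, assuming a Linux host with the ss tool used throughout this thread; it is not nimbus code): the side that sends the first FIN is left in FIN-WAIT-2 for as long as the peer never sends its own FIN back.

```python
import socket
import subprocess
import time

# The active closer sends its FIN via shutdown(); the peer's kernel ACKs it,
# but the peer application never closes, so no FIN ever comes back and the
# active side stays in FIN-WAIT-2.
srv = socket.socket()
srv.bind(("127.0.0.1", 0))
srv.listen(1)
port = srv.getsockname()[1]

cli = socket.socket()
cli.connect(("127.0.0.1", port))
conn, _ = srv.accept()

cli.shutdown(socket.SHUT_WR)   # our FIN: FIN-WAIT-1 -> (peer ACKs) -> FIN-WAIT-2
time.sleep(0.5)                # `conn` is never closed, so its FIN never arrives

out = subprocess.run(["ss", "-Hnt", "state", "fin-wait-2"],
                     capture_output=True, text=True).stdout
print([line for line in out.splitlines() if f":{port}" in line])
```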
This issue is also described by IBM PI-73495:
When a secure connection is closed, there is a possibility that a TCB (connection structure) with FIN-WAIT-2 status will not be released and will stay around forever until TCPIP is restarted. The problem occurs when TCPIP has already sent a FIN packet to close out the connection, but it never receives the FIN packet back from the peer, which results in the connection never being fully closed.
If many sockets which were connected to a specific remote application end up stuck in this state, it usually indicates that the remote application either always dies unexpectedly when in the CLOSE_WAIT state or just fails to perform an active close after the passive close. The timeout for sockets in the FIN-WAIT-2 state is defined with the parameter tcp_fin_timeout. You should set it to a value high enough so that if the remote end-point is going to perform an active close, it will have time to do it. On the other hand, sockets in this state do use some memory (even though not much) and this could lead to a memory overflow if too many sockets are stuck in this state for too long.
https://benohead.com/blog/2013/07/21/tcp-about-fin_wait_2-time_wait-and-close_wait/
tcp_fin_timeout (integer; default: 60; since Linux 2.2) - This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.
The kernel timeout only applies if the connection is orphaned. If the connection is still attached to a socket, the program that owns that socket is responsible for timing out the shutdown of the connection. Likely it has called shutdown and is waiting for the connection to shut down cleanly.
So it's likely that the code on the libp2p branch calls shutdown on the socket but not close.
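For reference, the difference in plain BSD-socket terms looks roughly like this (an illustrative Python sketch, not the actual nimbus/chronos code path; the peer address is a placeholder):

```python
import socket

sock = socket.socket()
sock.connect(("example.org", 80))   # placeholder peer, only for the sketch

# Half-close: our FIN is sent, but the process keeps the file descriptor.
# If the peer never sends its FIN, the connection sits in FIN-WAIT-2
# indefinitely - the kernel's tcp_fin_timeout does not apply while a
# process still owns the socket.
sock.shutdown(socket.SHUT_WR)

# Full close: the fd is released and the connection becomes "orphaned".
# The kernel then owns the teardown and reaps a lingering FIN-WAIT-2 after
# net.ipv4.tcp_fin_timeout seconds (60 by default).
sock.close()
```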
The issue seems to affect only the libp2p port 9100:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo ss -H -4 state FIN-WAIT-2 | awk '{a[$5]++}END{for(x in a){printf "%5d - %s\n", a[x], x}}'
4109 - 172.17.2.2:9100
2 - 127.0.0.1:9100
Interesting article about a similar issue by CloudFlare: https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/
It appears that the issue mostly affects libp2p, but also unstable, while stable and testing are currently not affected.
Although, to be fair, if we compare the times since the last restart, they do roughly match up with the number of FIN-WAIT-2 connections:
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l; echo -n " - "; docker ps --format="{{.Status}}"' | sort
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 55 minutes
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 minutes
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 26 hours
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 hours
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 days
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 days
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 days
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 4 minutes
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 109\n - Up 2 hours
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 106\n - Up 2 hours
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 91\n - Up About an hour
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 126\n - Up 2 hours
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 147\n - Up 2 hours
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4146\n - Up 3 days
But not exactly, because the testing hosts have been up for a while - same as libp2p - yet no FIN-WAIT-2 connections are counted there.
There is an issue that suggests this might be related to Docker: https://github.com/docker/for-linux/issues/335
But truth be told I've never seen this issue anywhere else we use Docker, which is everywhere.
Interesting, we don't time out the shutdown, but I'm also not sure whether chronos exposes an explicit close, or whether cancelling the closeWait() call would result in the socket being closed. @cheatfate could you provide some insights into this?
If you search the chronos repository for the shutdown keyword, you will find only one place where it is used - in a test, to simulate a particular error condition. So all sockets are closed using close, not shutdown.
https://github.com/status-im/nim-chronos/blob/master/tests/teststream.nim#L647-L649
@dryajov when you perform close, you launch a two-step closing procedure: in the first step we notify pending readers and writers about the closure, and in the second step we actually remove the socket from the system queue and close it. This second step gets scheduled using callSoon, so right now it is impossible to cancel it.
https://github.com/status-im/nim-chronos/blob/master/chronos/asyncloop.nim#L625-L659
If you cancel closeWait(), you will cancel only the waiting task, so you will not be notified when the second step of the closure procedure has finished.
https://github.com/status-im/nim-chronos/blob/master/chronos/transports/stream.nim#L2288-L2291
As you can see here, closeWait() returns a Future[void] which is produced by join(), so you are actually cancelling join() and not the closure procedure itself.
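To make the two-step procedure easier to picture, here is a rough asyncio analogy of the pattern described above (an illustration only, not the chronos implementation; the Transport class and close_wait name are made up for the sketch):

```python
import asyncio

class Transport:
    """Illustrative analogue of the two-step close described above."""

    def __init__(self, sock):
        self._sock = sock
        self._closed = asyncio.Event()

    def close(self):
        # Step 1 would notify pending readers/writers here (omitted).
        # Step 2, the actual close, is scheduled for a later loop iteration,
        # so it cannot be cancelled once close() has been called.
        asyncio.get_running_loop().call_soon(self._finish_close)

    def _finish_close(self):
        self._sock.close()          # release the fd back to the OS
        self._closed.set()

    async def close_wait(self):
        self.close()
        # Cancelling this coroutine cancels only the wait on the event
        # (the "join"), not the scheduled _finish_close() step.
        await self._closed.wait()
```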
In general this is of course a problem, but the chronos test suite performs socket-leak checks on every run. Before starting the heavy sockets test it records the FD number (https://github.com/status-im/nim-chronos/blob/master/tests/teststream.nim#L1203) and when all tests have completed it records the FD number again (https://github.com/status-im/nim-chronos/blob/master/tests/teststream.nim#L1291). So there are no hidden socket leaks inside chronos itself.
@cheatfate this isn't about leaking sockets on our end; it's about us performing a multi-step close of the socket, which in BSD sockets is initiated with shutdown. It seems that chronos assumes shutdown and there is no analogue to BSD close. Thus the connection is left in an awaiting state (i.e. half-closed), because the other end never finishes the close procedure and we have no way of forcibly closing the connection as it is.
@dryajov I think you misunderstand. Please search the chronos code base for the keyword "shutdown" to confirm that chronos does not use the shutdown() syscall for closing sockets; chronos only uses the close() syscall: https://man7.org/linux/man-pages/man2/close.2.html
OK, I see that now.
So, it seems like this might be an issue elsewhere, since the FIN_WAIT2 state only happens when the initiator of the sequence calls shutdown, which chronos doesn't do - it always calls close.
Some suspects might be either the firewall dropping/blocking packets or our Docker setup. We'll keep investigating on our end, but it's worth checking our firewall rules/logs for dropped FIN and ACK packets, as well as our Docker logs.
It might also be faulty third party nodes deployed in pyrmont. Can we verify that our mainnet nodes aren't exhibiting the same behaviour?
Mainnet does not show this behavior:
> ansible nimbus.mainnet -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l'
stable-small-01.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
stable-small-02.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
At least not right now.
As far as I can tell this mostly affects the libp2p and unstable branches:
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l' | sort -h
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 15
stable-small-02.aws-eu-central-1a.nimbus.mainnet | CHANGED | rc=0 | (stdout) 0
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 380
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 580
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1146
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1035
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 849
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 689
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 657
The unstable branch is up from ~100 connections stuck in FIN-WAIT-2 to ~500-1000, and libp2p is down because the container image was updated 14 hours ago.
As before, it seems unstable and libp2p are most affected:
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 21
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 5
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 167
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 175
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 238
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 264
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 302
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 164
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 2178
Actually, it doesn't seem like a restart clears them up properly:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % d
CONTAINER ID NAMES IMAGE CREATED STATUS
02abe550d64f beacon-node-pyrmont-libp2p-small statusteam/nimbus_beacon_node:libp2p-small 45 hours ago Up 44 hours
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % d restart beacon-node-pyrmont-libp2p-small
beacon-node-pyrmont-libp2p-small
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2 | wc -l
2176
So I rebooted the host, and now it's back to 0:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2 | wc -l
0
We need to check the state of the sockets inside the Docker container itself; that will give us an idea of what/where this is originating.
It doesn't seem to happen within the container itself:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo netstat -pnt | grep FIN_WAIT2 | wc -l
1215
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % d exec -it beacon-node-pyrmont-libp2p-small bash
root@e6076df0ff1c:/# netstat -pnt | grep FIN_WAIT2 | wc -l
0
OK, that means that this is definitely not libp2p. Maybe we are hitting the docker bug?
But why would it happen only on the libp2p and unstable hosts in such volume, while hardly at all on others?
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l' | sort -h
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 26
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 7
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 486
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 570
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 99
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 131
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 96
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 702
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1222
The pattern appears to be pretty clear in terms of branches.
And if I check other fleets like infra-hq and infra-misc, I see none of those FIN-WAIT-2 socket states. Literally zero.
So it seems like it's some combination of how Docker handles mapped ports and the libp2p implementation.
I did find a few hosts from our status-go fleet that have just a few sockets in the FIN-WAIT-2 state:
mail-01.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 13
mail-02.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 14
mail-03.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 14
node-02.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 7
node-03.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 1
node-04.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 10
node-05.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 2
node-06.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 2
node-08.ac-cn-hongkong-c.eth.prod | CHANGED | rc=0 | (stdout) 3
mail-01.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 17
mail-02.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 20
mail-03.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 20
node-01.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 2
node-02.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-03.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 7
node-04.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-06.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-07.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 1
node-08.do-ams3.eth.prod | CHANGED | rc=0 | (stdout) 2
But that's devp2p, not libp2p.
I agree that we can't completely rule out libp2p. But we can isolate Docker a bit further - how about we run unstable without Docker at all on one of the problematic machines and see what the result is. If FIN_WAIT_2 is gone, I would say that this is probably the same as, or similar to, the bug reported here: https://github.com/docker/for-linux/issues/335.
It's worth a try. I'll take a look tomorrow.
Okay, I created a Systemd service definition that makes use of the same resources as the docker container:
[Unit]
Description="Nimbus Beacon Node"
Documentation=https://github.com/status-im/nimbus-eth2
Requires=network-online.target
After=network-online.target
[Service]
User=dockremap
ExecStart=/usr/local/bin/nimbus_beacon_node \
--network=pyrmont \
--data-dir=/docker/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0 \
--web3-url=ws://10.4.0.165:8546 \
--nat=extip:18.195.225.101 \
--log-level=DEBUG \
--tcp-port=9100 \
--udp-port=9100 \
--netkey-file=/docker/beacon-node-pyrmont-libp2p-small/data/netkey \
--insecure-netkey-password=true \
--rpc \
--rpc-address=0.0.0.0 \
--rpc-port=11000 \
--metrics \
--metrics-address=0.0.0.0 \
--metrics-port=9300
Restart=on-failure
WorkingDirectory=/docker/beacon-node-pyrmont-libp2p-small/data
PermissionsStartOnly=True
LimitNOFILE=1048576
[Install]
WantedBy=multi-user.target
And I copied the nimbus_beacon_node and nimbus_signing_process binaries directly from the container onto the host.
But it exploded with:
Traceback (most recent call last, using override)
/data/beacon-node-builds/libp2p-small/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(354) main
/data/beacon-node-builds/libp2p-small/repo/vendor/nim-libp2p/libp2p/stream/bufferstream.nim(347) NimMain
/data/beacon-node-builds/libp2p-small/repo/beacon_chain/nimbus_beacon_node.nim(1578) main
/data/beacon-node-builds/libp2p-small/repo/beacon_chain/nimbus_beacon_node.nim(1100) start
/data/beacon-node-builds/libp2p-small/repo/vendor/nim-chronicles/chronicles.nim(329) run
/data/beacon-node-builds/libp2p-small/repo/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(407) reportUnhandledError
/data/beacon-node-builds/libp2p-small/repo/vendor/nimbus-build-system/vendor/Nim/lib/system/excpt.nim(358) reportUnhandledErrorAux
Error: unhandled exception: /data/beacon-node-builds/libp2p-small/repo/beacon_chain/validator_duties.nim(620, 12) `node.config.gossipSlashingProtection == GossipSlashingProtectionMode.dontcheck or
(node.processor[].gossipSlashingProtection.probeEpoch <
node.processor[].gossipSlashingProtection.broadcastStartEpoch)` [AssertionError]
,{"hcode":50,"key":50,"val":541088},{"hcode":51,"key":51,"val":541088},{"hcode":52,"key":52,"val":541088},{"hcode":46,"key":46,"val":541088},{"hcode":54,"key":54,"val":541088},{"hcode":55,"key":55,"val":541088},{"hcode":56,"key":56,"val":541088},{"hcode":57,"key":57,"val":541088},{"hcode":53,"key":53,"val":541088},{"hcode":59,"key":59,"val":541088},{"hcode":60,"key":60,"val":541088},{"hcode":61,"key":61,"val":541088},{"hcode":62,"key":62,"val":541088},{"hcode":63,"key":63,"val":541088},{"hcode":58,"key":58,"val":541088},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0},{"hcode":0,"key":0,"val":0}],"counter":64},"initialStabilitySubnets":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63],"wallEpoch":16907}
{"lvl":"DBG","ts":"2021-02-01 15:29:55.284+00:00","msg":"Attestation subnets","topics":"beacnde","tid":11495,"file":"nimbus_beacon_node.nim:517","expiringSubnets":[],"subnets":[0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63],"newSubnets":[],"wallSlot":541049,"wallEpoch":16907,"num_stability_subnets":2000,"expiring_stability_subnets":[],"new_stability_subnets":[],"subscribedSubnets":[],"unsubscribedSubnets":[]}
The dockremap user definitely has access to the slashing protection DB:
root@unstable-libp2p-small-01:~# ls -l /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.*
-rw-r--r-- 1 dockremap dockremap 608833536 Feb 1 15:28 /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.sqlite3
-rw-r--r-- 1 dockremap dockremap 32768 Feb 1 15:32 /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.sqlite3-shm
-rw-r--r-- 1 dockremap dockremap 32992 Feb 1 15:32 /data/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0/validators/slashing_protection.sqlite3-wal
It appears the libp2p branch is outdated: https://github.com/status-im/nimbus-eth2/commits/nim-libp2p-auto-bump
But that's unrelated since I've also built the binary directly on the host instead of copying it from the container and it works:
NOT 2021-02-01 17:23:25.356+00:00 Attestation sent topics="beacval" tid=32613 file=validator_duties.nim:220 attestation="(aggregation_bits: 0b0000000000000000000000000000000000000000000000000000000000000000000000010000000000000000000000000000000000000000000000000000000000, data: (slot: 541616, index: 27, beacon_block_root: \"8720de59\", source: (epoch: 16924, root: \"aa2f6c57\"), target: (epoch: 16925, root: \"8f940a0c\")), signature: \"a5dc459e\")" validator=b9d94838 delay=2s356ms701us683ns indexInCommittee=58
As suggested by @dryajov, I've rebased nim-libp2p-auto-bump on unstable and rebuilt it. It's running so far; let's see what happens overnight.
Working fine so far:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~/soft/nimbus-eth2 % sudo systemctl status beacon-node-pyrmont-libp2p-small
● beacon-node-pyrmont-libp2p-small.service - "Nimbus Beacon Node"
Loaded: loaded (/etc/systemd/system/beacon-node-pyrmont-libp2p-small.service; disabled; vendor preset: enabled)
Active: active (running) since Mon 2021-02-01 18:16:23 UTC; 59min ago
Docs: https://github.com/status-im/nimbus-eth2
Main PID: 40025 (nimbus_beacon_n)
Tasks: 2 (limit: 4652)
Memory: 1.7G
CGroup: /system.slice/beacon-node-pyrmont-libp2p-small.service
└─40025 /usr/local/bin/nimbus_beacon_node --network=pyrmont --data-dir=/docker/beacon-node-pyrmont-libp2p-small/data/shared_pyrmont_0 ...
The libp2p host shows just one such connection:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2 | wc -l
1
Also, it looks like testing is showing high numbers now:
> ansible nimbus.pyrmont -o -a 'ss -H -4 state FIN-WAIT-2 | wc -l' | sort -h
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 234
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4473
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1604
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 2554
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4630
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4805
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 92
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 86
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 83
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1
Although my question would be: Why do we consider this an issue?
The libp2p host shows just one such connection:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
1
Well, if this persists it might indicate the same issue, but if it goes away we should be fine.
Judging by the IP, it seems like that one was a leftover from Docker:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
tcp 0 0 172.20.1.210:35160 18.193.2.74:9100
I've rebuilt the image with nim-libp2p-auto-bump rebased on unstable. I'm going to reboot the host to clear any remaining FIN-WAIT-2 connections and see what happens then.
Although my question would be: Why do we consider this an issue?
Mostly because the port pair is never released back to the OS, and the socket is also leaked at this point. After a while this becomes an issue, both because of FD exhaustion and the memory associated with it, but more importantly because the port pair cannot be used anymore.
But there is a timeout on that which should eventually clear those out; it is apparently set to 60s, so why aren't they being cleared up by the OS? The timeout is controlled by net.ipv4.tcp_fin_timeout (as documented here: https://access.redhat.com/solutions/41776).
Yeah, I checked net.ipv4.tcp_fin_timeout earlier, and the value is no different from the default on our hosts:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % cat /proc/sys/net/ipv4/tcp_fin_timeout
60
So they should be expiring.
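One way to sanity-check that (a sketch; per the man-page excerpt earlier, tcp_fin_timeout only reaps orphaned connections) is to list the FIN-WAIT-2 sockets together with their owning processes. Entries that still show an owner, for example docker-proxy, are not orphaned, so the 60-second timeout never applies to them:

```python
import subprocess

# Assumption: run as root so `ss -p` can resolve socket owners.
out = subprocess.run(["ss", "-Hntp", "state", "fin-wait-2"],
                     capture_output=True, text=True).stdout
lines = [l for l in out.splitlines() if l.strip()]
attached = [l for l in lines if "users:" in l]      # still owned by a process
orphaned = [l for l in lines if "users:" not in l]  # subject to tcp_fin_timeout
print(f"attached: {len(attached)}, orphaned: {len(orphaned)}")
for l in attached[:10]:
    print(l)
```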
The systemd service has been running for 16 hours and I see one FIN-WAIT-2 connection:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
tcp 0 0 127.0.0.1:44190 127.0.0.1:9100
If we compare https://github.com/status-im/infra-nimbus/issues/35#issuecomment-771665354 and https://github.com/status-im/infra-nimbus/issues/35#issuecomment-772515524 we can see that in one case the IPs shown are Docker related, and in the other they are both localhost.
Interesting, what's the localhost port used for - is it still NBC? Otherwise, it seems like docker is the main culprit so far?
Nimbus listens on all addresses, including localhost:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % sudo netstat -lpnt | grep nimbus
tcp 0 0 0.0.0.0:11000 0.0.0.0:* LISTEN 3902/nimbus_beacon_
tcp 15 0 0.0.0.0:9100 0.0.0.0:* LISTEN 3902/nimbus_beacon_
tcp 0 0 0.0.0.0:9300 0.0.0.0:* LISTEN 3902/nimbus_beacon_
OK, seems like it is NBC. Can we check the command line nbc was run with? This could be either the metrics port or the RPC port.
It's the libp2p port, config was already posted in https://github.com/status-im/infra-nimbus/issues/35#issuecomment-770942634.
Btw, you do have SSH access to this host: https://github.com/status-im/infra-nimbus/blob/b7226818d3977712db42dd6fa3d660f822a09f03/ansible/group_vars/all.yml#L35
Ah, but it's gone now:
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ % ss -H -4 state FIN-WAIT-2
admin@unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont:~ %
And the service was not restarted.
I might have to break out tcpdump for this.
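If it comes to that, a capture limited to FIN segments on the libp2p port should show whether our FIN goes out and whether the peer's FIN ever comes back (a sketch; port 9100 is the libp2p port from the service config above):

```python
import subprocess

# Capture only segments with the FIN flag set on port 9100.
subprocess.run([
    "sudo", "tcpdump", "-ni", "any",
    "tcp port 9100 and (tcp[tcpflags] & tcp-fin) != 0",
])
```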
It seems like the count doesn't rise when Nimbus runs as a systemd service. So it's clearly caused by a combination of how Docker maps ports and how libp2p works:
stable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 31 hours
stable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1662\n - Up 23 hours
testing-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 13077\n - Up 9 days
testing-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 485\n - Up 3 days
testing-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 433\n - Up 3 days
testing-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 7113\n - Up 3 days
testing-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 6556\n - Up 3 days
unstable-large-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 6609\n - Up 4 days
unstable-large-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 6888\n - Up 4 days
unstable-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n - Up 3 hours
unstable-small-02.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 1\n - Up 3 hours
unstable-small-03.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 4\n - Up 3 hours
unstable-small-04.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 140\n - Up 3 hours
unstable-libp2p-small-01.aws-eu-central-1a.nimbus.pyrmont | CHANGED | rc=0 | (stdout) 0\n -
I think this has more to do with Docker than with systemd and libp2p. One more test to confirm this would be to run a previous version of Docker, one that has been confirmed not to show this behaviour. The linked Docker issue has a few listed, iirc.
I don't buy that explanation because as I said, no other services using Docker show this behavior.
We're seeing a lot of connections stuck in the FIN_WAIT2 state on Pyrmont fleet hosts.