Open SgtPooki opened 2 months ago
After running js-libp2p-amino-dht-bootstrapper over the weekend, starting with heapsnapshot->gc->heapsnapshot, then taking snapshots every 6 hours, and finishing this morning with heap->gc->heap, memlab gives the following result for analyze unbound-object
:
╰─ ✔ ❯ ls -lhatr snapshot-dir
total 4.7G
-rw-r--r-- 1 sgtpooki 0 May 10 19:00 .gitkeep
drwxr-xr-x 35 sgtpooki 1.1K May 10 19:07 ..
-rw------- 1 sgtpooki 37M May 10 19:20 2024-05-11T02:20:05.589Z.heapsnapshot
-rw------- 1 sgtpooki 42M May 10 19:20 2024-05-11T02:20:14.463Z.heapsnapshot
-rw------- 1 sgtpooki 129M May 10 19:22 2024-05-11T02:22:16.103Z.heapsnapshot
-rw------- 1 sgtpooki 120M May 11 00:00 2024-05-11T07:00:00.864Z.heapsnapshot
-rw------- 1 sgtpooki 197M May 11 06:00 2024-05-11T13:00:00.809Z.heapsnapshot
-rw------- 1 sgtpooki 314M May 11 12:00 2024-05-11T19:00:00.832Z.heapsnapshot
-rw------- 1 sgtpooki 337M May 11 18:00 2024-05-12T01:00:01.187Z.heapsnapshot
-rw------- 1 sgtpooki 391M May 12 00:00 2024-05-12T07:00:00.614Z.heapsnapshot
-rw------- 1 sgtpooki 483M May 12 06:00 2024-05-12T13:00:00.576Z.heapsnapshot
-rw------- 1 sgtpooki 438M May 12 12:00 2024-05-12T19:00:00.633Z.heapsnapshot
-rw------- 1 sgtpooki 468M May 12 18:00 2024-05-13T01:00:00.297Z.heapsnapshot
-rw------- 1 sgtpooki 497M May 13 00:00 2024-05-13T07:00:00.871Z.heapsnapshot
-rw------- 1 sgtpooki 466M May 13 06:00 2024-05-13T13:00:00.403Z.heapsnapshot
-rw------- 1 sgtpooki 473M May 13 07:50 2024-05-13T14:50:13.471Z.heapsnapshot
drwxr-xr-x 18 sgtpooki 576 May 13 07:51 .
-rw------- 1 sgtpooki 235M May 13 07:51 2024-05-13T14:51:12.831Z.heapsnapshot
╰─ ✘ INT ❯ npx -y memlab analyze unbound-object --snapshot-dir snapshot-dir
Top growing objects in sizes:
(Use `memlab trace --node-id=@ID` to get trace)
· system / Context [object](@387349): 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB > 3.3MB
· MplexStreamMuxer [object](@347089): 32KB > 47KB > 47.2KB > 49.2KB > 150.7KB > 168.4KB > 231.4KB > 318KB > 423KB > 510KB > 527.4KB > 562.1KB > 596.7KB > 614.2KB > 614.2KB
· Object { initiators, receivers } [object](@347195): 21.1KB > 38KB > 38KB > 31.3KB > 132.8KB > 150.4KB > 213.4KB > 300KB > 405.1KB > 492.1KB > 509.5KB > 544.2KB > 578.7KB > 596.2KB > 596.3KB
· Map [object](@347355): 20.7KB > 37.7KB > 37.7KB > 31KB > 132.6KB > 150.2KB > 213.2KB > 299.8KB > 404.9KB > 491.9KB > 509.3KB > 544KB > 578.5KB > 596KB > 596.1KB
· system / Context [object](@122957): 531.8KB > 532KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB > 532.1KB
· ChaCha20Poly1305 [object](@122961): 530KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB > 530.1KB
· system / Context [object](@443697): 3.7KB > 3.7KB > 3.7KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB > 352.1KB
· Map [object](@709657): 184 bytes > 184 bytes > 184 bytes > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB > 348.3KB
· Providers [object](@382375): 5.9KB > 27.6KB > 89.9KB > 390KB > 362.5KB > 374.8KB > 278.4KB > 227.4KB > 214.7KB > 226.7KB > 282.6KB > 259.7KB > 275.1KB > 206.3KB > 217.6KB
· RoutingTable [object](@350995): 5.6KB > 52.6KB > 6.2KB > 6.6KB > 6.6KB > 6.6KB > 297.6KB > 274KB > 6.6KB > 6.6KB > 303.9KB > 275.4KB > 284.9KB > 235.8KB > 217.5KB
· Object { clear, get, has, remove, set } [object](@382403): 1.8KB > 23.3KB > 85.7KB > 385.5KB > 358.4KB > 370.3KB > 273.9KB > 222.9KB > 210.2KB > 222.2KB > 278KB > 255KB > 270.4KB > 201.7KB > 212.9KB
· KBucket [object](@351629): 33.3KB > 46.7KB > 54.3KB > 249.1KB > 254KB > 146.7KB > 291KB > 267.3KB > 262KB > 262KB > 297.2KB > 268.8KB > 278.3KB > 229.1KB > 210.9KB
· system / Context [object](@382739): 1.4KB > 22.2KB > 84.5KB > 384KB > 356.5KB > 368.3KB > 271.8KB > 220.7KB > 207.9KB > 220KB > 275.8KB > 252.8KB > 268.2KB > 199.4KB > 210.7KB
· Object { contacts, dontSplit, left, right } [object](@382903): 28.2KB > 41.2KB > 49KB > 243.2KB > 247.6KB > 141.1KB > 284.1KB > 260.6KB > 255.6KB > 255.6KB > 290.2KB > 261.9KB > 271.2KB > 222.5KB > 203.8KB
· Object { contacts, dontSplit, left, right } [object](@382943): 21.9KB > 34.6KB > 39KB > 205.9KB > 225.7KB > 132.5KB > 245.1KB > 223.2KB > 215.1KB > 206.4KB > 256.2KB > 221.8KB > 232.4KB > 194.8KB > 177.2KB
· AsyncGenerator [object](@59297): 33.8KB > 33.9KB > 113.3KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB > 174.9KB
· BuiltinModule [closure](@399391): 159.4KB > 163.9KB > 163.9KB > 163.9KB > 163.9KB > 163.9KB > 163.9KB > 163.9KB > 163.9KB > 159.4KB > 163.9KB > 163.9KB > 163.9KB > 159.4KB > 163.9KB
· Map [object](@433067): 158.1KB > 162.7KB > 162.7KB > 162.7KB > 162.7KB > 162.7KB > 162.7KB > 162.7KB > 162.7KB > 158.1KB > 162.7KB > 162.7KB > 162.7KB > 158.1KB > 162.7KB
· Object { contacts, dontSplit, left, right } [object](@272937): 17.3KB > 27.6KB > 27.6KB > 173.5KB > 208.6KB > 117.2KB > 214.7KB > 204.2KB > 190.8KB > 194.4KB > 218.4KB > 183.9KB > 191.7KB > 168.9KB > 150.5KB
· Object { contacts, dontSplit, left, right } [object](@272941): 5.3KB > 7.7KB > 22KB > 123.7KB > 153.8KB > 105.9KB > 159.5KB > 173.4KB > 185.1KB > 173.6KB > 161.4KB > 153.3KB > 157.3KB > 133.1KB > 138.6KB
I don't see anything obvious here.
related to https://github.com/libp2p/js-libp2p-amino-dht-bootstrapper/issues/18#issuecomment-2101025419
I will be adding various observations here while looking around at connection open and close code in js-libp2p
Destroying connections on
MultiaddrConnection.close
1. timeout events not handled
https://github.com/libp2p/js-libp2p/blob/4ad63bb79c0c2e5c670b32aa534906b923dcf150/packages/transport-tcp/src/socket-to-conn.ts#L158-L178
When
maConn.close
is called, we try to explicitly close a socket, handlingclose
anderror
events, but nottimeout
events which can definitely occur.From https://nodejs.org/docs/latest-v20.x/api/net.html#socketsettimeouttimeout-callback
This seems like it could easily lead to TCPSocketWrap continuing to increase.
Questions
setTimeout
again, does it still allow the callback defined at https://github.com/libp2p/js-libp2p/blob/4ad63bb79c0c2e5c670b32aa534906b923dcf150/packages/transport-tcp/src/socket-to-conn.ts#L62C1-L75C5 to trigger? (it seems like that's the intention).2. Manual
socket.destroySoon
logichttps://github.com/libp2p/js-libp2p/blob/4ad63bb79c0c2e5c670b32aa534906b923dcf150/packages/transport-tcp/src/socket-to-conn.ts#L183-L194
I'm not sure if we're intentionally not using
socket.destroySoon
but AFAIK, the only thing we would be missing by switching this code tosocket.destroySoon
is thelog('%s socket drained', lOptsStr)
log line, which we could easily add up above.3. Socket
'drop'
eventhttps://github.com/libp2p/js-libp2p/blob/4ad63bb79c0c2e5c670b32aa534906b923dcf150/packages/transport-tcp/test/max-connections.spec.ts#L64-L66
We have some tests that test our maxConnections, but we have no code that handles the
'drop'
event in https://github.com/libp2p/js-libp2p/blob/4ad63bb79c0c2e5c670b32aa534906b923dcf150/packages/transport-tcp/src/socket-to-conn.ts.Questions
drop
ped Sockets are not closed out properly?4.
Socket.unref
I don't see that we're calling
socket.unref
anywhere, and adding it kind of feels like a hack, but I think there are a few locationsmaConn.close
and others where we could callsocket.unref()
to make sure we're explicit about when the socket is no longer needed.e.g. once we call
maConn.close
and'drain'
is emitted,socket.unref()
could be safely called?Holistic Socket management
1. Transport class vs toMaConn socket event cleanup
https://github.com/libp2p/js-libp2p/blob/4ad63bb79c0c2e5c670b32aa534906b923dcf150/packages/transport-tcp/src/index.ts#L242-L245
TCP Transport class cleans up events it adds to sockets, why doesn't
socket-to-conn.ts
orlistener.ts
?Questions
allEventListeners
in the TCP transport class?