status-im / nim-websock

Websocket for Nim
81 stars 15 forks source link

nwaku memory issue in chronos/streams/tlsstream.nim #138

Closed Ivansete-status closed 1 year ago

Ivansete-status commented 1 year ago

Summary

We're tracking a likely memory leak in secure websocket connections within nwaku. We have done heaptrack analysis and concluded that most memory allocations are within the tlsLoop:

233399758-2c219b77-bb06-45fc-9a54-29a40954ad05

We also confirmed that memory usage improves from +1GB to <300MB if running without wss connections, and doesn't show evidence of leak.

The issue appeared only in wakuv2.test fleet. We're working to replicate it locally.

The issue doesn't appear when disabling web sockets even having the normal # of peers, as shown next: 233682140-f58ec20c-f280-4579-9cd4-e9b20c8032e6

Problem

Particularly, heaptrack indicates that leaks come from:

  1. Chronos - one - https://github.com/status-im/nim-chronos/blob/e05d2f8e9648e17b9a71bf5e909a79241c067c8b/chronos/streams/tlsstream.nim#L366

  2. Chronos - tlsWriteApp - https://github.com/status-im/nim-chronos/blob/e05d2f8e9648e17b9a71bf5e909a79241c067c8b/chronos/streams/tlsstream.nim#L339

Further information

jm-clius commented 1 year ago

Thanks for opening an issue, @Ivansete-status. Although links are useful to include in issue description, it's worth summarising the main conclusions as well and make the issue title more to the point. We would preferably want someone looking at this issue (only) to understand exactly what the problem is (summarised briefly) with option to visit links for more info. Something like:

We're tracking a likely memory leak in secure websocket connections within nwaku. We have done heaptrack analysis and concluded that most memory allocations are within the tlsLoop (link to stacktrace) and confirmed that memory usage improves with factor x if running without wss connections (and doesn't show evidence of leak).

I'd suggest adding a # Problem section where you explicitly explain that we're noticing a memory leak in wss and why we believe this is the case, including a copy of this graph with brief explanation: https://github.com/waku-org/nwaku/issues/1662#issuecomment-1518038009. You could also mention that we're working on simple repro instructions.

Menduist commented 1 year ago

Turns out this wasn't a bug here, but the TLS stack uses cyclic memory references, so they are cleaned less frequently, which makes heaptrack think it is a memory monger. Investigations are under progress and will continue here https://github.com/waku-org/nwaku/issues/1662