Closed erik-wramner closed 4 months ago
I think we have found the problem. It appears that the code is creating new MongoClients everywhere. There should only be one, as it is basically a connection pool. On the other hand, at least findPrefixEntry fails to close the connection (connClient) obtained from the pool - instead, it closes the pool, but that will not close connections that are still in use. Thus we have a leak.
Ideally we should rewrite to use a single global MongoClient and to close all connections (returning them to the pool) in finally blocks.
Thanks for highlighting this issue, I have updated the code to fix:
I've tested and updated the code. I used ab (Apache Bench) and it does seem to be pretty fast now!
I made the change to: resolver_web_server/src/db.js
The changes are:
Apologies for this I am now focussed on Resolver v3.0 and I have stopped maintenance on v2.6. I hope you find the far simpler v3.0 to your liking!
We are running the resolver behind a load balancer that uses GET requests to check that it is up. When the system starts, this works fine:
Mon, 11 Dec 2023 09:01:15 GMT >> identificationKeyController: Found the dbDocument for requested URL /07322975634978/ Mon, 11 Dec 2023 09:01:15 GMT >> identificationKeyController: Found the dbDocument for requested URL /07322975634978/ Mon, 11 Dec 2023 09:01:15 GMT >> identificationKeyController: Found the dbDocument for requested URL /07322975634978/
However, after some time we get MongoDB errors:
Mon, 11 Dec 2023 08:50:22 GMT >> identificationKeyController: Found the dbDocument for requested URL /07322975634978/ Mon, 11 Dec 2023 08:50:22 GMT >> Exception occur in findDigitalLinkEntryUsingIdentifier method Mon, 11 Dec 2023 08:50:22 GMT >> MongoServerSelectionError: connection timed out Mon, 11 Dec 2023 08:50:22 GMT >> identificationKeyController: Found the dbDocument for requested URL /07322975634978/ Mon, 11 Dec 2023 08:50:24 GMT >> Exception occur in findDigitalLinkEntryUsingIdentifier method Mon, 11 Dec 2023 08:50:24 GMT >> MongoServerSelectionError: connection timed out
Looking at the active connections to MongoDB, each side of the cluster has about 2k connections even though traffic is very low (consisting of the load balancer ping):
The graphs went down when I killed the resolver. It will work itself up to the original figures again.
My take on this is that there is a connection leak, so that MongoDB connections are not released and eventually we run into the limit on allowed active connections. This really must be solved or the resolver dies within hours every time.
EDIT: here is the graph some time later, notice how connections keep increasing.
EDIT: and the connections keep going up.
EDIT: https://stackoverflow.com/questions/71632524/mongodb-connections-dont-close-after-query seems to be the same error. I can note that we never close connections on errors (the close should ideally be in a finally block to run even if there are exceptions), but we don't get exceptions... so it must be something wrong in the regular happy flow.