vibe-d / eventcore

High performance proactor event loop abstraction library

Possible s_driver leaking path #178

Closed: omerfirmak closed this issue 3 years ago

omerfirmak commented 3 years ago

This came up while investigating https://github.com/bosagora/agora/issues/1480

For DNS lookups, vibe.d spawns threads, and this results in multiple calls to the module ctor and dtor below.

https://github.com/vibe-d/eventcore/blob/d070fef3afe21dbeb75251cd5c95bfa54798b3d9/source/eventcore/core.d#L34-L53

The problem is the assumption at L38, which does not always hold. While module ctors are being run, they can trigger a call to eventDriver, which then creates a driver instance. In our case this was a Timer dtor, triggered by a GC cycle that was in turn triggered by an allocation in one of our module ctors.

Once that assertion fails, the newly created thread is terminated and the module dtor for core.d is never called. As a result, s_driver and everything heap-allocated along with it is leaked every once in a while. This happens often enough to crash our application within a couple of days.
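
To make the failure mode concrete, here is a simplified, illustrative sketch (not the actual core.d code; the real names and details differ):

```d
// Simplified sketch of the failure mode only - not the actual core.d code.
class NativeEventDriver { void dispose() {} } // stand-in for the real driver

NativeEventDriver s_driver; // module-level variables are thread-local in D

@property NativeEventDriver eventDriver()
{
    // Lazily creates a driver. This can already happen while module ctors are
    // still running, e.g. when an allocation in a ctor triggers a GC cycle
    // that runs a Timer dtor, which needs the driver to release its handle.
    if (s_driver is null) s_driver = new NativeEventDriver;
    return s_driver;
}

static this()
{
    // The assumption that breaks: no driver is supposed to exist yet when the
    // module ctor runs for this thread.
    assert(s_driver is null);
    s_driver = new NativeEventDriver;
}

static ~this()
{
    // Never reached if the module ctor above throws, so s_driver and
    // everything it allocated stays around for good.
    if (s_driver !is null) {
        s_driver.dispose();
        s_driver = null;
    }
}
```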

Where do you suggest this should be resolved? Do you think the module dtors should be called even if a module ctor throws an exception?

s-ludwig commented 3 years ago

> In our case this was a Timer dtor, triggered by a GC cycle that was in turn triggered by an allocation in one of our module ctors.

This sounds like an issue that needs to be fixed. Since the GC unfortunately disregards thread affinity when calling finalizers, it is a programming error to let the GC reap vibe.d objects; they need to be disposed of explicitly from their owner thread before the last reference disappears.
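
As a hypothetical illustration of that rule (only setTimer, Timer and Timer.stop are real vibe-core API here; the Session class is made up):

```d
// Hypothetical example: tear vibe.d handles down explicitly on the owning
// thread instead of leaving the last reference to a GC finalizer.
import core.time : seconds;
import vibe.core.core : Timer, setTimer;

class Session // made-up application class holding a vibe-core Timer
{
    private Timer m_keepAlive;

    this()
    {
        m_keepAlive = setTimer(30.seconds, &onTimeout, true);
    }

    // Called explicitly by the owning thread before the last reference to the
    // Session goes away, so the GC never has to finalize the Timer for us.
    void shutdown()
    {
        m_keepAlive.stop();
        m_keepAlive = Timer.init; // drop the handle deterministically
    }

    private void onTimeout() @safe nothrow { /* periodic work */ }
}
```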

There is some mitigation logic in vibe/core/internal/release.d, but it seems like that didn't work here for some reason.

s-ludwig commented 3 years ago

Looking at the code: scratch that, releaseHandle itself calls eventDriver, so that was probably the issue. It would be nice to change it to use a pure getter version of eventDriver, though, in order to avoid creating unused event driver instances in utility threads.

Geod24 commented 3 years ago

@s-ludwig: For reference, here's what our server memory usage looked like for the past ~3 months:

[Screenshot: server memory usage graph over the past ~3 months, taken 2021-03-31]

We have two instances of our app running per server, and the OOM killer would play Russian roulette with them on a daily basis. The less-steep curve at the beginning was just because there was little to no load on the app. We deployed the fix in #179 yesterday.

s-ludwig commented 3 years ago

> For reference, here's what our server memory usage looked like for the past ~3 months:

Definitely a great catch. I wonder whether the dub registry is affected by this, too.

I've also added tryGetEventDriver in #182 so that vibe-core can avoid creating unnecessary driver instances.
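
The rough shape of that (a sketch only, reusing the s_driver/NativeEventDriver stand-ins from the sketch earlier in this thread; the actual signatures in #182 and in vibe/core/internal/release.d differ in detail):

```d
// Sketch only - real signatures in eventcore #182 and in
// vibe/core/internal/release.d differ.

// eventcore side: return the thread's driver if it exists, never create one.
NativeEventDriver tryGetEventDriver() @safe nothrow @nogc
{
    return s_driver; // null if this thread never set up a driver
}

// vibe-core side: skip the release when the calling thread never had a
// driver, instead of forcing one into existence just to release a handle.
// (The real mitigation logic additionally has to deal with handles that are
// owned by a different thread's driver.)
void releaseHandle(string subsystem, H)(H handle)
{
    auto drv = tryGetEventDriver();
    if (drv is null) return; // e.g. a GC/utility thread without a driver
    __traits(getMember, drv, subsystem).releaseRef(handle);
}
```

With a getter like this, releasing a handle from a thread that never touched eventcore becomes a no-op instead of silently instantiating a driver there.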