netty / netty

Netty project - an event-driven asynchronous network application framework
http://netty.io
Apache License 2.0
33.29k stars 15.89k forks source link

DnsNameResolver leaks memory through FastThreadLocal #9884

Open ejona86 opened 4 years ago

ejona86 commented 4 years ago

DnsNameResolver uses a non-static FastThreadLocal:

https://github.com/netty/netty/blob/ee7027288ec5f3d688be3cee45a521a5786ccf71/resolver-dns/src/main/java/io/netty/resolver/dns/DnsNameResolver.java#L233-L239

If the user creates many DnsNameResolvers over the lifetime of a process, the InternalThreadLocalMap.nextVariableIndex() will grow large which will cause UnpaddedInternalFastThreadLocal.indexedVariables to grow large. In addition, FastThreadLocal uses integers for its map, so it will not be aware if the FastThreadLocal used by DnsNameResolver is garbage collected. That means that any values stored in the thread local will not be released until the thread dies.

This has not impacted me, but we were impacted by PooledByteBufAllocator which has this problem as well. The allocator issue is even more complicated, however, because the values are arenas which have a reference to the allocator which will retain the PoolThreadLocalCache (and so GC is not possible). I did an audit of Netty and DnsNameResolver was the only other class that looked likely to trigger this behavior. I expect to file a separate issue for the allocator.

The solution may just be to inform users via the DnsNameResolver Javadoc. It would also be possible to have UnpaddedInternalFastThreadLocal detect when FastThreadLocal instances are GC'd and have it clear its value, and separately find a way to put the variableIndex into a freelist for reuse.

Ideally though DnsNameResolver would use a static FastThreadLocal (maybe along with a WeakHashMap?) or avoid using a thread local.

Netty version

4.1

normanmaurer commented 4 years ago

@ejona86 I would be happy to review a PR .

ejona86 commented 4 years ago

@normanmaurer, you think we should just start with a documentation change, since most other things are considerably more invasive? Do you think it is practical to say that DnsNameResolvers instances should only be created and not destroyed? I would imagine that is fine, but wanted to confirm.

ninja- commented 4 years ago

This has not impacted me, but we were impacted by PooledByteBufAllocator which has this problem as well. The allocator issue is even more complicated, however, because the values are arenas which have a reference to the allocator which will retain the PoolThreadLocalCache (and so GC is not possible). I did an audit of Netty and DnsNameResolver was the only other class that looked likely to trigger this behavior. I expect to file a separate issue for the allocator.

regarding the allocator and related classes, this was designed to be tunable using system properties

ejona86 commented 4 years ago

@ninja-, system properties and globals don't help a library much; we were needing different defaults in grpc and have to create our own allocator to do so. Creating the instance isn't that big of a deal, but it is a big surprise that the instance will leak itself.

ninja- commented 4 years ago

@ejona86 was the problem caused by heap buffers by any chance?

ninja- commented 4 years ago

anyway even if these allocator caches look annoying in the heapdump, I very much doubt they could take your app to OutOfMemory

ninja- commented 4 years ago

@ejona86 ah sorry I think I understand now, that your problem is not the cache being there but cache leaking when you create multiple allocators?

anyway even using global allocator as most people do, some complain about the pool cache taking any heap at all, and that's what these system properties were for to tune.

for example I am disabling heap buffer cache entirely as I found it to be buggy and eat too much

        System.setProperty("io.netty.allocator.numHeapArenas", "0");
ejona86 commented 4 years ago

@ninja-, there were direct/unsafe buffers in addition to heap, but that doesn't matter. We were creating our own allocator to reduce memory usage for light users of grpc (like a server running just for stats collection), since the increased memory usage was causing apps to exceed their memory (which isn't helped by direct memory not being accounted for in -Xmx). These servers we don't control; we were using grpc in a library in an application that previously didn't use Netty.

Everything seemed to work fine, but two apps quickly noticed memory usage steadily increasing until the app was killed due to OOM. Looking at the heap in an experiment we saw 1000s of allocators which consumed 9 GB of memory. In grpc we have a SharedResourcePool which is sort of like a static but will release resources that are unused for 1 second (so resources are reference counted). We use it for thread pools and the like. If the resource is needed again in the future it will just be re-created. That doesn't tend to happen too often because we encourage applications to reuse certain grpc objects as almost-singletons.

However, with light usage of grpc (different light users than the ones we were optimizing for!), say doing one RPC and then throwing away the grpc channels (which is fine, but releases the resource), each RPC will create a new allocator. Memory will leak until you crash.

Changing the system property isn't helpful to us, since we are a library and don't control the command line of the app. (And changing the system property from our library is a Bad Idea™.)

Leaking is super-bad for an allocator. For DnsNameResolver you don't leak near as much, but if someone has a "I don't care about performance" app that creates a new DnsNameResolver every DNS request, then they would suffer rapidly from the memory leak.

normanmaurer commented 4 years ago

@ejona86 yeah a documentation change is a good start.

normanmaurer commented 4 years ago

@ejona86 ^^any update ?

slandelle commented 4 years ago

If the user creates many DnsNameResolvers over the lifetime of a process

We're most likely the only one to do that (load testing) :P

vietj commented 7 months ago

Any reason why simply using a static fast thread local there is preventing this ?

vietj commented 7 months ago

Actually that's not really possible at the moment (replying to old myself).

However it is possible to diminish the creation of FastThreadLocal by allocating a single FastThreadLocal instance for a given DnsServerAddressStreamProvider in the builder and reusing this instance when the DnsNameResolver are allocated in the build() method.

Is that something that should be done ?

normanmaurer commented 7 months ago

I am happy to review a PR

vietj commented 7 months ago

I am happy to review a PR

let me craft that then!