netty / netty

Netty project - an event-driven asynchronous network application framework
http://netty.io
Apache License 2.0

io.netty.util.internal.InternalThreadLocalMap, memory leak #6891

Open luyuekai opened 7 years ago

luyuekai commented 7 years ago

Expected behavior

No memory leaks occur

Actual behavior

A memory leak warning is logged when I undeploy my web application.

Steps to reproduce

I use the Elasticsearch jar, which contains Netty. When I close the ESClient in contextDestroyed, the GlassFish logs show a severe error, as follows.

Minimal yet complete reproducer code (or URL to code)

The web application [/service_generic_log] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@5eeeaf73]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@6304ac92]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Netty version

netty-3.10.6.Final.jar
netty-buffer,codec,codec-http,common,handler,transport,handler-4.1.9.Final.jar

JVM version (e.g. java -version)

openjdk version "1.8.0_111" OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14) OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

OS version (e.g. uname -a)

Linux lyk 4.6.2-040602-generic #201606100516 SMP Fri Jun 10 09:18:34 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux

Scottmitch commented 7 years ago

Netty 3.x is EOL and no longer supported. Please upgrade to Netty 4.1 and report back if the issue still persists.

jasontedor commented 7 years ago

That's not Netty 3, you can tell from the package name in the report.

Scottmitch commented 7 years ago

Thanks @jasontedor, I see the io.netty.util now. That was a knee-jerk reaction to seeing netty-3.10.6 in the version description.

Scottmitch commented 7 years ago

@luyuekai please provide a heap dump and/or a minimal reproducer.

luyuekai commented 7 years ago

Netty is used in a third-party library (Elasticsearch). When I call the Elasticsearch function that closes its client, the error is logged. I have opened an issue about it: https://github.com/elastic/elasticsearch/issues/25327 @Scottmitch Thanks a lot!

luyuekai commented 7 years ago

Of course, I can ignore it for the moment. It has only a small impact on my project. I would like to know what causes this error. Thanks again!

tpthaler commented 6 years ago

I am getting this same issue via Spring Boot 2.0, which includes Netty 4.1.22.Final:

19-Mar-2018 00:49:03.526 SEVERE [RMI TCP Connection(8)-127.0.0.1] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [hello] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@66c3c1a5]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@515b1e46]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

johnou commented 6 years ago

@tpthaler try something like this..

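import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;

import io.netty.util.internal.InternalThreadLocalMap;

public class MyServletContextListener implements ServletContextListener {

    public void contextInitialized(ServletContextEvent event) {
        // Nothing to set up.
    }

    public void contextDestroyed(ServletContextEvent event) {
        // Removes the InternalThreadLocalMap bound to the calling thread only.
        InternalThreadLocalMap.destroy();
    }

}

waleed-dawood commented 6 years ago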

I am also having the same issue

06-Jun-2018 06:29:05.042 SEVERE [localhost-startStop-1] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ROOT] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@3d3627c1]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@24f3fbb2]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

vidhyasriramulu commented 6 years ago

I am also having the same issue

SEVERE [localhost-startStop-2] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [apprity] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@1994ff12]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@7d6a4e1e]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Is there any solution for this?

ghost commented 5 years ago

Any update on this? We have a similar problem when integrating this with Spring 5.

johnou commented 5 years ago

@misaa2 please provide a sample project with instructions on how to reproduce and we can take a look.

luyuekai commented 5 years ago

I have no solution yet, sorry.

whyvrafvr commented 5 years ago

Hi @johnou. We have the same problem running Firestore on a Java EE container (Payara) or using Spring. I can set up an example (using Docker and Payara) if you need one.

johnou commented 5 years ago

@skonx :+1: please do.

whyvrafvr commented 5 years ago

Hi @johnou. I've opened a GitHub repo with the minimal requirements. I cannot provide a Docker configuration yet. I hope you're familiar with GlassFish / Payara; otherwise I will create a Dockerfile.

whyvrafvr commented 5 years ago

Hi @johnou. How can I help?

johnou commented 5 years ago

@skonx try adding in this..

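import javax.servlet.ServletContextEvent;
import javax.servlet.ServletContextListener;
import javax.servlet.annotation.WebListener;

import io.netty.util.internal.InternalThreadLocalMap;

@WebListener
public class MyListener implements ServletContextListener {
    @Override
    public void contextInitialized(ServletContextEvent sce) {
    }
    @Override
    public void contextDestroyed(ServletContextEvent sce) {
        // Removes the InternalThreadLocalMap bound to the calling thread only.
        InternalThreadLocalMap.destroy();
    }
}

whyvrafvr commented 5 years ago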

Thanks. I will test this tip on Tuesday morning on the code I’ve provided.

normanmaurer commented 5 years ago

@skonx let us know once done

whyvrafvr commented 5 years ago

Hi guys.

I've implemented the ServletContextListener.

Using io.netty.util.internal.InternalThreadLocalMap or io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap and deploying / shutting down the webapp, I get the same error 👍

Severe: The web application [] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@71c072ff]) and a value of type [io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap] (value [io.grpc.netty.shaded.io.netty.util.internal.InternalThreadLocalMap@787af90c]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

normanmaurer commented 5 years ago

@skonx could you come up with a small self-contained reproducer that I can use?

whyvrafvr commented 5 years ago

https://github.com/trendev/firestore-issue

normanmaurer commented 5 years ago

@skonx I am looking for a reproducer that just uses netty

whyvrafvr commented 5 years ago

Well, I'm using Netty through the Firebase SDK...

ejona86 commented 5 years ago

The problem with calling InternalThreadLocalMap.destroy (or the public API FastThreadLocal.destroy()) is that it only clears the current thread. Some of this was discussed way back in 2014 https://github.com/netty/netty/pull/2578#issuecomment-46335248. Setting the ThreadLocal to null would help the problem, which destroy() did before https://github.com/netty/netty/pull/5012/files#r56949296 (I figured out the answer to the question!).

ThreadLocalMap (the thing stored in Thread) has a weak reference to ThreadLocal but a strong reference to the value. But since the value is a Netty type it has a strong reference to the ClassLoader which has a reference to the static ThreadLocal. So the ThreadLocal will never be GC'd for the life of the Thread.

ThreadLocal lazily cleans up GC'd ThreadLocals from ThreadLocalMap when adding new entries. So clearing the reference to the ThreadLocal will allow it to be GC'd and then you can hope enough other users of ThreadLocal come along and trigger a clean up of the specific Entry for our ThreadLocal.

@normanmaurer, this reminds me of ThreadCleaner/ThreadWatcher/whatever-it-was :).
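
The point that destroy() only clears the current thread can be illustrated with a plain java.lang.ThreadLocal, with no Netty involved. This is only a sketch under assumptions: the class PerThreadCleanupDemo, its method name, and the single-thread pool standing in for a container worker thread are all hypothetical. It sets a value on a pool thread, removes it either from the calling thread or from the pool thread itself, and returns what the pool thread sees afterwards.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical demo class; a plain ThreadLocal stands in for Netty's static thread-local map.
final class PerThreadCleanupDemo {
    static final ThreadLocal<String> LOCAL = new ThreadLocal<>();

    /**
     * Sets a value on a pool thread, calls remove() either on that pool thread
     * or on the calling thread, and returns what the pool thread sees afterwards.
     */
    static String valueSeenByPoolThread(boolean removeOnPoolThread) {
        // A single-thread executor reuses one worker, like a long-lived container thread.
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            Runnable set = () -> LOCAL.set("still-here");
            Runnable remove = LOCAL::remove;
            Callable<String> read = LOCAL::get;

            pool.submit(set).get();
            if (removeOnPoolThread) {
                pool.submit(remove).get();   // cleanup runs on the thread that holds the value
            } else {
                LOCAL.remove();              // cleanup runs on the caller; the pool thread is untouched
            }
            return pool.submit(read).get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(valueSeenByPoolThread(false));
        System.out.println(valueSeenByPoolThread(true));
    }
}
```

Under these assumptions, a cleanup call made from a listener thread leaves the value on every other thread that touched the ThreadLocal, which is consistent with the listener workaround not silencing the container warning.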

tyagihas commented 5 years ago

I'm also using the Firebase SDK and facing an issue in a different situation. I think the root cause is the same.

So the ThreadLocal will never be GC'd for the life of the Thread.

I run a long-running thread which sleeps for a certain interval and then retrieves and updates entries in Firestore. I observe that memory usage goes up after every iteration. Does this mean that using Firestore (+ Netty) with long-running threads is not recommended?

Fsstyle commented 5 years ago

@skonx try adding in this..

@WebListener
public class MyListener implements ServletContextListener {
  @Override
  public void contextInitialized(ServletContextEvent sce) {
  }
  @Override
  public void contextDestroyed(ServletContextEvent sce) {
      InternalThreadLocalMap.destroy();
  }
}

useless

rpkim commented 5 years ago

Hi, I faced a similar issue when I tried to set up Micrometer with a standalone Tomcat (.war file).

The error message is:

SEVERE: The web application [myapp] created a ThreadLocal with key of type [java.lang.ThreadLocal] 
(value [java.lang.ThreadLocal@bc4551b]) and a value of type 
[io.micrometer.shaded.io.netty.util.internal.InternalThreadLocalMap] (value 
[io.micrometer.shaded.io.netty.util.internal.InternalThreadLocalMap@62deb293]) 
but failed to remove it when the web application was stopped. 
Threads are going to be renewed over time to try and avoid a probable memory leak.

I have a heapdump file. If you let me know the email address, I will share the heapdump file.

johnou commented 5 years ago

@rpkim please report it to the micrometer issue tracker.

trajano commented 4 years ago

This appears when using gRPC, which uses Netty as well, as noted by @ejona86: https://github.com/grpc/grpc-java/issues/6692

davidgfolch commented 3 years ago

Hi,

I'm having the same issue with a project:

The SpringRemoteCacheManager.stop() is calling nativeCacheManager.getChannelFactory.destroy() but there are still some threads not closing on context undeployment:

06-Oct-2020 05:38:49.075 SEVERE [ContainerBackgroundProcessor[StandardEngine[Catalina]]] org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [ebo##000000013] created a ThreadLocal with key of type [java.lang.ThreadLocal] (value [java.lang.ThreadLocal@508e050f]) and a value of type [io.netty.util.internal.InternalThreadLocalMap] (value [io.netty.util.internal.InternalThreadLocalMap@753fd803]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

org.infinispan.client.hotrod.impl.transport.netty.ChannelFactory

   public void destroy() {
      try {
         channelPoolMap.values().forEach(ChannelPool::close);
         eventLoopGroup.shutdownGracefully(0, 0, TimeUnit.MILLISECONDS).get();
         executorService.shutdownNow();
      } catch (Exception e) {
         log.warn("Exception while shutting down the connection pool.", e);
      }
   }

scruz-denodo commented 3 years ago

Hi, I uploaded a repository with steps for reproducing the issue: https://github.com/scruz-denodo/netty-thread-local-issue

It contains two ways for checking the problem.

Some comments:

Thanks!

pshevche commented 2 weeks ago

I'm putting my two-pence here. We also saw this causing issues in the Gradle Build Tool. Like web frameworks, Gradle has a pool of long-living threads that it provides to plugins to run arbitrary work on. We discovered a case of a plugin using Netty, which polluted these threads with InternalThreadLocalMap.

We realize that it is hard for Netty to do anything about it, as it is unaware of the thread lifecycle or whether the framework that leased the thread has mechanisms to clean up those ThreadLocals. The user of the library needs to know how to handle this case.

However, it seems like there are some changes that Netty could make to help library users avoid this hard-to-debug issue:

  1. Warn if the ThreadLocal is set on a non-Netty thread.
  2. Document best practices for interacting with the methods that may set this ThreadLocal. For example, document that the initial Bootstrap#connect should happen from a short-lived thread and that all work in the handler pipeline should happen solely on Netty threads. We also found that ctx.channel().alloc().buffer() sets the ThreadLocal and should not be called on long-living threads.
  3. Is there a way to make the slowThreadLocalMap non-static, as it would allow the cleaning of its value once references to the key are gone?
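
On the caller's side, the short-lived-thread idea from suggestion 2 might look roughly like the following. This is a minimal JDK-only sketch, not a Netty API: ShortLivedThreadRunner, callOnFreshThread, and MARKER are hypothetical names, and the ThreadLocal merely stands in for whatever a call such as Bootstrap#connect might set. The task runs on a brand-new thread that exits when the task finishes, so anything it stores in a ThreadLocal goes away with that thread instead of sticking to a pooled one.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;

// Hypothetical helper; names are illustrative only.
final class ShortLivedThreadRunner {
    // Stand-in for a thread-local that a library call might populate.
    static final ThreadLocal<String> MARKER = new ThreadLocal<>();

    private ShortLivedThreadRunner() {}

    /** Runs the task on a fresh thread and blocks the caller until it completes. */
    static <T> T callOnFreshThread(Callable<T> task) {
        FutureTask<T> future = new FutureTask<>(task);
        Thread worker = new Thread(future, "short-lived-worker");
        worker.setDaemon(true);
        worker.start();
        try {
            return future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for task", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("task failed", e.getCause());
        }
    }

    /** Sets MARKER inside the task, then reports "workerView/callerView". */
    static String workerAndCallerView() {
        String seenByWorker = callOnFreshThread(() -> {
            MARKER.set("set-by-task");
            return MARKER.get();
        });
        // The calling (possibly pooled) thread never had MARKER set.
        return seenByWorker + "/" + MARKER.get();
    }

    public static void main(String[] args) {
        System.out.println(workerAndCallerView());
    }
}
```

The trade-off in this sketch is one extra thread per call and a blocked caller. It also only helps if no later callback runs library code back on the long-lived thread.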

trajano commented 2 weeks ago

TBH I think there needs to be a new project, if one doesn't exist already, to achieve Netty-like performance using Virtual Threads.

https://stackoverflow.com/questions/78318131/do-java-21-virtual-threads-address-the-main-reason-to-switch-to-reactive-single

But that's likely going to be a total rewrite/re-architecture

franz1981 commented 3 days ago

@pshevche I think that starting from Netty 4.2 there won't be any need for thread locals in the allocators (there isn't any, AFAIK); see https://github.com/netty/netty/pull/14332 for the final blow (which also enables pooling on virtual threads without using any thread local).

Currently you can enable -Djdk.traceVirtualThreadLocals=true which does it - if you find anything weird, please report it here and if we can fix it, I will happily do it ;)