com.google.common.escape.Platform - ThreadLocal leak

sdavids commented 7 years ago

org.apache.catalina.loader.WebappClassLoaderBase.checkThreadLocalMapForLeaks The web application [xxx] created a ThreadLocal with key of type [com.google.common.escape.Platform$1] (value [com.google.common.escape.Platform$1@614d8014]) and a value of type [char[]] (value [[C@1fb37a49]) but failed to remove it when the web application was stopped. Threads are going to be renewed over time to try and avoid a probable memory leak.

Maaartinus commented 7 years ago

As I wrote long time ago, this one is trivially fixable. Using ThreadLocal#initialValue together with a JDK class saves a line or two at the cost of potential memory leak. No good deal, IMHO.

cpovirk commented 6 years ago

Thank you for reporting this. I'm sorry for letting it sit. Every time I go to look up the problem with ThreadLocal leaks, I find a bunch of incorrect information :( (In particular, I think that some of the Tomcat documentation is wrong.) Additionally, there seem to be several related problems, so it's wrong for me to look for "the [one] problem." My current understanding, which hopefully I have right and will be able to communicate well, is:

Generally speaking, if all the threads that have values go away, then nothing important is leaked. Perhaps there are cases in which that's not true, but I haven't looked into them, and that's not what this issue is about, so let's set that aside.

The main family of problems, then, comes when a Thread sticks around in a thread pool but the ThreadLocal that has a value for that Thread will never be used again. (I guess in theory there could be a problem in which the ThreadLocal was going to be used again but we knew that it would never be used from that particular Thread. But I don't think that's the scenario anyone has in mind, either.)

In a sense, this can happen with any static ThreadLocal that is similar to the one in the escapers: A user can call an escaper method once, triggering the 1k allocation, and then never call another escaper method. But we still keep the 1k around because, as far as we know, the user might call one, and we want to optimize for that. (In contrast, consider a case in which a method uses a ThreadLocal only as a backdoor for communication across the current call stack -- like tracking whether the method is called reentrantly. In such cases, the library knows when it can clear the value.) Often, this won't matter at all: If it's a ThreadLocal<Boolean> with no other references, then at worst, we leave a reference sitting in a bucket in a hash table. In the case of the escapers, we're holding a full 1k of space. That's not ideal, but it's still unlikely to make a difference for most users on its own. (And for users for whom it does matter, they might even prefer that we not have the 1k buffer at all, even if it went away when they were done using the class, because we would still hold on to that 1k between their calls.)

The problems that people typically think of with ThreadLocal have more going on than that. For starters:

App containers can load Guava multiple times in multiple classloaders. That's a little sad, but presumably the number of apps is bounded, so we're talking about 1k × apps of memory. Except... app containers may reload the same app multiple times, reloading Guava each time, too. What happens then is a little interesting. Each Thread instance has a map from ThreadLocal instance to value. The ThreadLocal is wrapped in a WeakReference, but the value is not. This generally makes sense (though it has drawbacks, and there might be better ways). The strange part is that, when the WeakReference is cleared, Thread does not clear its reference to the corresponding value as promptly as you might expect: WeakHashMap, for example, cleans orphaned values on every read (IIUC). Thread cleans up more opportunistically -- whenever an operation already happens to go looking in that bucket. So in theory, values could hang around a long time. I'm not sure that this becomes much of an issue in practice, though: Presumably, if your app loads Guava and uses some feature that uses ThreadLocal, it's going to use that feature every time it's loaded, so it's going to keep inserting ThreadLocal entries, and eventually Thread will happen to stumble across the buckets with the orphaned values, and it will clear them. Or, if you never insert another ThreadLocal, then your ThreadLocal usage is not going to grow, either :) So I'd expect the result here to usually be only that a bounded amount of memory is held, much less than the total amount you had when Guava was loaded -- not the ideal outcome, but not out-of-memory territory.

I think that, in principle, Tomcat wants to warn about all such cases. (Fun fact: They apparently used to clear your ThreadLocal for you, rather than renewing the threads, which sounds like it may have had concurrency problems.) The warning is maybe a little overactive, but it's hard to blame Tomcat (well, unless you blame it for reusing threads across apps, which may be a reasonable thing to blame it for, but let's ignore that :)): It can't know whether the retained objects are permanent unless it tries to trigger Thread's internal cleanup -- which I gather that newer Tomcats do, actually. Maybe it would be nice for Guava to work around the warning to avoid worrying people, but I'm not sure how to do that. At best, we could provide a method for users to call when they are unloading their app (though they'll have to be careful not to call it until after they're done using the escapers). (I'd hoped that newer Tomcats would prevent the warning from occurring. But at the time they run Thread's internal cleanup, the app classloader (and thus Platform.DEST_TL) is still strongly referenced, so I don't think the cleanup can remove the offending ThreadLocal values. Maybe they could find a way around this; I'm not sure.)

So what are the ThreadLocal problems that everyone worries about? Again, I don't think these affect the escapers, but:

The fundamental problem comes when the WeakReference never gets cleared. That happens, of course, when something retains a strong reference to the ThreadLocal itself. And that something is the ThreadLocal's value. Once that happens, we have a leak: We know that a Thread is still around, and we know that it has strong references to its ThreadLocal values. So if a ThreadLocal value refers back to the ThreadLocal itself, then we have Thread -> threadLocalValue -> ThreadLocal, and so the ThreadLocal and its value stick around as long as the Thread.

One indirect way this can happen is when your ThreadLocal's value is an instance of a custom type. That value refers to your ClassLoader, which likely refers back to the ThreadLocal itself: Thread -> threadLocalValue -> ClassLoader -> SomeClassHoldingTheThreadLocal -> ThreadLocal. I think that this is what was discussed in https://github.com/google/guava/issues/1553 (and fixed -- though I think the fix did more than necessary, as I'll get to next).

But what if your ThreadLocal itself is of a custom type -- most likely because you wanted to override initialValue? I think this is what @Maaartinus was getting at. Perhaps he's influenced by the fact that Tomcat warns about this, just as it warns about values of a custom type. But I don't think Tomcat is claiming[*] "a ThreadLocal of a custom type is worse than a vanilla ThreadLocal"; I think it's saying "If the ThreadLocal is of a custom type from web app X, then clearly it was inserted by web app X," so Tomcat should warn when web app X is unloaded. I don't think the custom ThreadLocal type matters. Sure, an object of a custom ThreadLocal type will reference the app classloader. But the ThreadLocal is held in a WeakReference. So, if the classloader is otherwise unreferenced, then it (and the ThreadLocal) should be garbage collected, just as they would be with a vanilla ThreadLocal. (And if the classloader is still otherwise referenced, then you already have a leak.)

(@Maaartinus mentions initialValue specifically in the context of using "a JDK class," I assume referring to byte[]. I assume his point is that, if you use a custom class, then that alone can keep your classloader alive, as noted above. So, if you're using a custom class, who cares if you add to the problem by implementing initialValue? In contrast, with a JDK class, you can avoid leaking a classloader if you avoid using initalValue. I think that this would make sense if initialValue were a problem. I just don't currently understand why it would be.)

To tie all this back to the warning: I think that we could make the warning go away by avoiding initialValue. But I don't think we'd actually be making anything better: There existing (small!) leak would still be there, just undetectable to Tomcat. Still, maybe it's worth doing to silence the warning, since it's so easy.

[*] OK, actually, as I noted earlier, I think that its docs do claim this. But I think they're wrong.

google / guava

com.google.common.escape.Platform - ThreadLocal leak #2790