FinalizableReferenceQueue keeps ClassLoaders around.

GoogleCodeExporter commented 9 years ago

If the class loader which loaded the google-collections jar is let go
(consider redeploying a war to Apache Geronimo) the class loader will not
be able to be garbage collected.  This is caused by the thread started by
FinalizableReferenceQueue never ending.  In other words the garbage
collection (for the class loader) can only happen when the thread stops. 
The solution is to stop the thread when there are no more objects on the
queue.  This will happen eventually if the rest of the app behaves
correctly by not placing state outside of its class loader's environment. 
Attached is a version of the class which I have used to this effect.

This is also a problem in the OSGi space where every jar gets its own class
loader and they can be refreshed easily at runtime to do rolling
upgrades/bug fixes.

Original issue reported on code.google.com by nairb...@gmail.com on 28 Jul 2008 at 5:39

Attachments:

FinalizableReferenceQueue.java

GoogleCodeExporter commented 9 years ago

I know it's been a long time coming, but we currently plan to fix this for 
release 10.

One question, for those involved here: would the cleanup thread as implemented 
in 
http://gee.cs.oswego.edu/cgi-bin/viewcvs.cgi/jsr166/src/extra166y/CustomConcurre
ntHashMap.java?revision=1.17&view=markup pose the same problem described here? 
I don't have a good sense of the boundaries of this problem.

Original comment by fry@google.com on 27 May 2011 at 11:50

GoogleCodeExporter commented 9 years ago

The jsr116y CCHM ReclamationThread itself is not held in a static (good) and 
instead just seems to float around until the static refqueue is no longer 
reachable (which will never happen since the JDK has no lifecycles). It also 
seems to be unstoppable.
This *might* work OK if it is part of the JDK, giving all CCHMs the same shared 
refqueue to plug into. However, as far as I can tell this would suffer from the 
same problems as before when e.g. part of a webapp or bundle because the 
lifecycle of the thread cannot be controlled, references the classloader and 
hence keeps it and everything it ever touched around.
This problem is *inherently impossible to implement correcty* unless either 
there is no extra thread (which is fine for drain on accumulated read/write but 
making idle-expiry impossible) or the thread/Executor is injected, with a 
controllable lifecycle from the outside. The latter solution would allow a 
client to get idle-expiry either via a client-supplied timer calling 
expireNow() or periodically doing "dummy reads".

Original comment by holger.h...@googlemail.com on 27 May 2011 at 12:32

GoogleCodeExporter commented 9 years ago

I'm in favor of the following:

No cleanup thread by default, but the users may specify an Executor at Map 
construction time if he wants idle-expiry. The key point here is that an 
Executor should be passed into the constructor, not injected statically across 
the entire JVM (otherwise you get problems in web containers).

Original comment by cowwoc...@gmail.com on 27 May 2011 at 1:46

GoogleCodeExporter commented 9 years ago

The fix for this has been submitted. Please let me know if you experience any 
problems with it.

Original comment by fry@google.com on 8 Jun 2011 at 6:16

Changed state: Fixed
Added labels: Milestone-Release10

GoogleCodeExporter commented 9 years ago

Fry,

Can you please explain what you ended up doing for the fix? Also, it's not 
clear what specific commit revision we should be looking at.

Original comment by cow...@bbs.darktech.org on 8 Jun 2011 at 6:47

GoogleCodeExporter commented 9 years ago

The change is http://code.google.com/p/guava-libraries/source/detail?r=445 and 
it involves doing reference cleanup on user threads rather than relying on a 
global background thread.

Original comment by fry@google.com on 8 Jun 2011 at 7:11

GoogleCodeExporter commented 9 years ago

If I understand the code correctly, you check for unused references when 
executing map methods (similar to WeakHashMap). An idle map would not clean up 
its references. Is that correct? If so, I'm okay with this fix.

Original comment by cow...@bbs.darktech.org on 8 Jun 2011 at 7:36

GoogleCodeExporter commented 9 years ago

Correct.

Original comment by fry@google.com on 8 Jun 2011 at 7:52

GoogleCodeExporter commented 9 years ago

I'm still seeing this issue with version 10.0-rc2. Undeploying a webapp in 
Tomcat still results in the following:

SEVERE: The web application [/app] appears to have started a thread named 
[com.google.common.base.internal.Finalizer] but has failed to stop it. This is 
very likely to create a memory leak.

The link in comment #56 is a 404, so I can't see what the changes were. Are 
others still noticing this problem with Tomcat webapps?

Original comment by d...@greywallsoftware.com on 22 Sep 2011 at 8:39

GoogleCodeExporter commented 9 years ago

Perhaps you are still using the deprecated FinalizableReferenceQueue or 
Finalizable*Reference?

We no longer use them internally, but if you do you'll pay a price for it...

Original comment by fry@google.com on 22 Sep 2011 at 9:35

GoogleCodeExporter commented 9 years ago

Still noticing with tomcat 7.0.21. I actually explicitly clean the maps when my 
Spring container stops. So by the time tomcat performs memory leaks validations 
the maps are empty.

Original comment by cheriomu...@gmail.com on 5 Oct 2011 at 10:06

GoogleCodeExporter commented 9 years ago

Are you using Guava release 10.0? Because we don't have code that uses 
FinalizableReferenceQueue any more.

Original comment by yrfselrahc@gmail.com on 6 Oct 2011 at 1:58

GoogleCodeExporter commented 9 years ago

Be aware that FRQ is no longer deprecated:

http://groups.google.com/group/guava-discuss/browse_thread/thread/fbf9ed43fe3e19
ac

Original comment by joe.j.kearney on 7 Oct 2011 at 8:12

GoogleCodeExporter commented 9 years ago

[deleted comment]

GoogleCodeExporter commented 9 years ago

Just to clarify: while FRQ is being un-deprecated, it is no longer used in 
MapMaker or any other collection class.

Original comment by mccu...@gmail.com on 7 Oct 2011 at 10:18

GoogleCodeExporter commented 9 years ago

We've started using guava 10.0.1 in a new project, and are specifically using 
FRQ to help manage certain object reference counts in an application server 
environment where guava is included as part of application code.

After digging through heap logs in search of classloader memory leaks that 
occur on application undeployment I've noted that FRQ still leaks because of 
reasons (1) and (2) given in comment #6 above (posted 3 years ago I might add).

Attached is a patch I wrote against the current source tree, partly based on 
the patch given in comment #6.  With this patch, those stray references from 
the Finalizer thread to the classloader containing the FRQ class no longer 
exist, and the classloader becomes eligible for garbage collection.

Incidentally we're also using guice-3.0 which contains a similar (but not 
identical) implementation of FRQ, and had to patch that in the same way at the 
same time.  Issue 288 in google-guice appears to be pretty much a duplicate of 
this thread.

Hopefully we can get this thing sorted out once and for all!

Original comment by stevenka...@gmail.com on 25 Nov 2011 at 3:16

Attachments:

guava-FRQ-patch.txt

GoogleCodeExporter commented 9 years ago

Over to you, Bob.

Original comment by kevinb@google.com on 29 Nov 2011 at 3:58

Changed state: New
Removed labels: Milestone-Release10

GoogleCodeExporter commented 9 years ago

Bob, can you please take a look at this patch? Even better, would you be 
willing to apply the patch to a git clone that so we could suck it in from 
there once you were happy with it?

Original comment by fry@google.com on 10 Dec 2011 at 3:34

Added labels: Package-Base

GoogleCodeExporter commented 9 years ago

Is there any activity on this? We have a project using guava that we'd really 
like to be able to redeploy into tomcat, and this is the last blocking issue 
for us.

I'm not so much interested in having it out ASAP, but rather just an estimate 
on when you guys think it'll make it into mainline guava - if it's more than a 
month or so away, I'm just going to patch guava myself and use that binary.

Original comment by dpr...@vast.com on 27 Dec 2011 at 7:58

GoogleCodeExporter commented 9 years ago

Bob, can you please provide an update on this?

Original comment by fry@google.com on 3 Jan 2012 at 1:40

GoogleCodeExporter commented 9 years ago

I don't think Bob is going to fix this.  Bob, reopen if you really are going to.

Original comment by kevinb@google.com on 16 Feb 2012 at 6:32

Changed state: WontFix

GoogleCodeExporter commented 9 years ago

Is closing the ticket the appropriate action, even if Bob will not fix it?  The 
issue is still valid and keeping it open will hopefully inspire somebody to 
work on it ;)

Original comment by blair-ol...@orcaware.com on 16 Feb 2012 at 7:01

GoogleCodeExporter commented 9 years ago

I agree. This issue should remain open unless it is fixed or there is a good 
reason for not fixing it.

Original comment by cow...@bbs.darktech.org on 16 Feb 2012 at 7:24

GoogleCodeExporter commented 9 years ago

I agree - this is a fairly large issue for anybody that uses Gauva in a managed 
environment.

Original comment by dpr...@vast.com on 16 Feb 2012 at 7:25

GoogleCodeExporter commented 9 years ago

After following this issue for the last six months we had enough and switched 
to sisu-guice where this issue has been fixed a long time ago.

Original comment by aki.kaiv...@gmail.com on 16 Feb 2012 at 7:35

GoogleCodeExporter commented 9 years ago

I've done this for several years myself, so I know that this is likely never 
true, but it seems that the fix for this issue is both quite small and already 
complete. Are there any blockers to just putting it in the next release, and if 
there are, what can the community to do help alleviate them?

Original comment by dpr...@vast.com on 16 Feb 2012 at 8:08

GoogleCodeExporter commented 9 years ago

Our dilemma here is that we don't have anyone to evaluate and test the patch.

But since there seems to be so much community support for this, let me turn it 
over to you. If the three of you who just spoke up can confirm that 
stevenkadams's patch of Nov 24, 2011 will fix this then we'd be glad to push 
that patch into release 12.

Original comment by fry@google.com on 16 Feb 2012 at 9:38

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I've tested this in our alpha and beta environments, and it works.

Original comment by dpr...@vast.com on 6 Mar 2012 at 4:09

GoogleCodeExporter commented 9 years ago

I'm rather uncomfortable with the proposed patch. In particular calling 
AccessController.doPrivileged should always give pause for thought and I don't 
think it is justified in this case. Additionally, the dependency on JDK 
implementation details of URLClassLoader seems gratuitous. A couple of 
alternatives seem preferable to me:

(1) Make FinalizableReferenceQueue implement Closeable and have its close() 
method dispose of the thread. This seems preferable to having a hidden and 
unstoppable thread that only goes away when exactly the right reference magic 
happens.

(2) Instead of bashing a private field of URLClassLoader, supply an explicit 
AccessControlContext argument to AccessController.doPrivileged that excludes 
the ProtectionDomains whose references are causing the problem. That would at 
least remove the JDK dependency.

Option (1) seems greatly preferable if it does address the problems people have 
been seeing.

Original comment by emcma...@google.com on 9 Mar 2012 at 11:11

GoogleCodeExporter commented 9 years ago

One way or another, we should really resolve this for 12.0, so if everyone can 
get behind (1) or (2), fantastic. It's also possible we may have to go with the 
patch if that's the solution everyone still wants to converge on.

Original comment by kevinb@google.com on 11 Mar 2012 at 11:11

Added labels: Milestone-Release12

GoogleCodeExporter commented 9 years ago

I'd rather go for (1), the patch seems more like a hack than a solution itself.

Original comment by terciofi...@gmail.com on 12 Mar 2012 at 12:05

GoogleCodeExporter commented 9 years ago

I'd support (1) at least. If you get to the point of worrying that the FRQ 
won't go away by itself then there should be a way to do it manually. If the 
present reference magic works in most (all?) other cases then that's great.

I suppose there will be issues with early shutdown of the FRQ racing with 
subsequently released objects that then won't be finalised. It seems to me that 
that's not something that can be fixed in this context - rather manage these 
objects' lifecycles explicitly. So I wouldn't consider that a reason not to 
provide close().

Original comment by joe.j.kearney on 12 Mar 2012 at 12:19

GoogleCodeExporter commented 9 years ago

The safest approach would be to add (1) and continue working on automatic 
cleanup. If you ever get it working then you could deprecate and remove (1) in 
the future. Instead we ended up waiting over six months for the automatic 
cleanup to materialize and it never really did.

Original comment by cow...@bbs.darktech.org on 12 Mar 2012 at 1:29

GoogleCodeExporter commented 9 years ago

I' propose (1) on the same basis as cow, it shouldn't hurt nobody. Automatic 
cleanup may be resolved at later point but I'd prefer even bullet-dodge 
solution for now.

Original comment by zeratul...@gmail.com on 12 Mar 2012 at 2:26

GoogleCodeExporter commented 9 years ago

OK, I will investigate this further. The tough part is writing a test that 
checks that the FinalizableReferenceQueue class can be unloaded after all of 
its instances have been closed. That means doing the test in a separate 
ClassLoader world that is accessed through reflection. A variant of the test 
should be able to show the problem that the patch we were talking about was 
trying to fix.

Original comment by emcma...@google.com on 12 Mar 2012 at 5:20

GoogleCodeExporter commented 9 years ago

I have spent some time investigating the problem and here is what I found.

I wrote the unit test I described, which creates a ClassLoader with a parallel 
copy of the Guava classes, makes a parallel FRQ, queues a 
FinalizableWeakReference against that queue, then drops the references to 
everything and waits for the ClassLoader to be GC'd. What I see is that the 
ClassLoader is successfully GC'd if there is no SecurityManager, but not if 
there is. Does this correspond to what people are seeing?

I abandoned the idea of adding a close() method because it did not help. Even 
in the SecurityManager case, I see that the FRQ instance is successfully GC'd 
and its associated Finalizer thread correctly terminates. I am not seeing a 
problem with instances of anything, but only with classes. There are two 
problems, in fact. One is the problem that the earlier patch fixes through 
reflection, namely that the URLClassLoader used to load the Finalizer class has 
a reference back to its creator ClassLoader through its captured 
AccessControlContext. But given that the FRQ *instance* has successfully been 
GC'd and its associated Finalizer thread has been terminated, this should not 
prevent class unloading because there are no longer any instances of classes 
from the Finalizer loader and no GC root should be referencing the classes 
themselves or their ClassLoader. So if nothing outside is referencing either 
the Guava loader or the Finalizer loader, it doesn't matter if the latter 
references the former. And that is exactly what I see when there is no 
SecurityManager.

The second problem I see occurs when there *is* a SecurityManager. 
java.lang.Thread has a field subclassAudits that is a "SoftCache". There have 
been memory leaks associated with this SoftCache before, for example with 
ResourceBundle and ObjectOutputStream. These have been fixed, but the Thread 
one has not. The SoftCache remembers the result of reflection on subclasses of 
Thread, to determine whether they override certain security-sensitive methods. 
It could and should be a WeakHashMap but it is not. So its entries are only 
cleared when the GC is strongly motivated to do so under memory pressure. That 
is typically not going to happen at the point where you want your webapp to be 
unloaded.

The subclassAudits business only happens when there is a SecurityManager, which 
explains why my test fails only in that case. Since Finalizer is a subclass of 
Thread, it is being remembered in the subclassAudits cache, and that is what is 
preventing FInalizer's ClassLoader from being GC'd. Because of the 
AccessControlContext problem that also prevents Guava's ClassLoader from being 
GC'd. Notice that even if we hack away the AccessControlContext problem, the 
Finalizer ClassLoader will still leak.

Fortunately there is a very simple fix to the subclassAudits problem. There is 
no reason why Finalizer has to be a subclass of Thread. It can just be a 
Runnable, and make a plain java.lang.Thread. (That is recommended practice 
anyway.) Plain java.lang.Thread does not have to be audited for method 
overrides so nothing is added to subclassAudits.

With that change my test passes even with a SecurityManager.

I'm attaching a patch with the changes to FRQ, Finalizer, and FRQTest. I would 
be very interested in hearing whether it solves the problems people have been 
seeing.

Original comment by emcma...@google.com on 21 Mar 2012 at 6:44

Attachments:

frq.patch

GoogleCodeExporter commented 9 years ago

Another benefit of Runnable (over extending Thread) is that it enables the use 
of custom Executors:

http://code.google.com/p/google-guice/issues/attachmentText?id=288&aid=-55992729
6892959590&name=GUICE_ISSUE_288_20110119.patch&token=Jv9dV_1_UgCI2Zb_7oQVxj46gp8
%3A1332314277994

which lets you truly decouple the execution of the background thread from the 
library.

Original comment by mccu...@gmail.com on 21 Mar 2012 at 7:31

GoogleCodeExporter commented 9 years ago

Interesting! Although I think my patch is the Simplest Thing That Could 
Possibly Work for this issue, we could later consider extending it, perhaps 
with an Executor parameter to FRQ rather than a system property. Were you able 
to observe whether your change alone was enough to fix the issue here?

Original comment by emcma...@google.com on 21 Mar 2012 at 11:19

GoogleCodeExporter commented 9 years ago

Wow Eamonn, terrific detective work!

We're trying to cut the first RC for Guava 12, so unless we receive more 
feedback today I'd favor getting your patch submitted.

Original comment by fry@google.com on 22 Mar 2012 at 3:42

GoogleCodeExporter commented 9 years ago

Following up #c88 ... I also prefer simple solutions, and while I suspect there 
may be a few rare corner-cases where even just using "new Thread(...)" can 
cause the spawned thread to inherit/leak context from the calling 
application/container, implementing runnable instead of extending thread should 
solve all the common issues people experience with FRQ. So +1 from me.

Original comment by mccu...@gmail.com on 22 Mar 2012 at 4:08

GoogleCodeExporter commented 9 years ago

http://code.google.com/p/guava-libraries/source/detail?r=b5b4b40a9c17484828dbd7c
eb28e0a468dad0b75

Original comment by cpov...@google.com on 24 Mar 2012 at 3:16

Changed state: Fixed

GoogleCodeExporter commented 9 years ago

To all: confirmation that this will solve your problem would be appreciated 
before the 12.0 release.

Original comment by kevinb@google.com on 26 Mar 2012 at 9:17

GoogleCodeExporter commented 9 years ago

Sorry this report is not before the 12.0 release, but I am still seeing this 
problem in 12.0:

  May 17, 2012 3:02:54 PM org.apache.catalina.loader.WebappClassLoader clearReferencesThreads
  SEVERE: The web application [/demo3] appears to have started a thread named [com.google.common.base.internal.Finalizer] but has failed to stop it. This is very likely to create a memory leak.

I'm using tomcat 7.0.21.

Original comment by archie.c...@gmail.com on 17 May 2012 at 8:05

GoogleCodeExporter commented 9 years ago

We are using Jetty 8.1.3 along with BoneCP which requires Guava. Even with 
Guava 12.0 the Finalizer thread does not stop running and prevents the garbage 
collection of the WebAppClassloader.

Original comment by jens.neh...@gmail.com on 23 May 2012 at 4:14

GoogleCodeExporter commented 9 years ago

Original comment by fry@google.com on 23 May 2012 at 6:41

Changed state: Accepted

GoogleCodeExporter commented 9 years ago

I can confirm this in Guava 12.0 on Sun's JDK 1.7.0_04 on a Windows 7 box.

I think I have traced this down to the thread that runs the 
com.google.common.base.internal.Finalizer.

Despite great pains to make sure that the Finalizer and its spawned thread use 
independant classloaders from the app's classloaders , it's contextClassLoader 
is still holding a reference to the module's classloader.

Original comment by techsy730 on 9 Jul 2012 at 6:42

GoogleCodeExporter commented 9 years ago

I've spent some more time investigating what happens with Tomcat specifically. 
If the FinalizableReferenceQueue is in a static field of some class in a 
webapp, then indeed the Finalizer thread never dies and the ClassLoader of the 
webapp is never garbage-collected. The reason is that the separate ClassLoader 
that is used to load the Finalizer class contains a reference to the original 
webapp ClassLoader in its saved AccessControlContext. That ClassLoader then 
keeps all the static fields of all of its classes alive, and in particular the 
FRQ. The Finalizer thread is set up to die as soon as its FRQ is 
garbage-collected but that never happens.

This can be fixed as previously suggested, by bashing the private 
URLClassLoader.acc field to null, but I think the cure is worse than the 
disease, and in particular could open security holes in environments with 
untrusted code.

In my tests, it is enough for the webapp just to set its static FRQ field to 
null and then the Finalizer thread will die when the no-longer-referenced FRQ 
is collected, and the webapp ClassLoader becomes collectible at that point. (In 
Tomcat, you can still get the alarming log message about the Finalizer thread 
still being present, but that is only because the FRQ hasn't been collected 
yet.)

I would be interested to know whether people who have run into this problem are 
able to work around it in a similar way.

I think we should probably add, as suggested earlier, a 
FinalizableReferenceQueue.close() instance method that explicitly stops the 
Finalizer thread, along with a test that this is enough to make the ClassLoader 
collectible.

Original comment by emcma...@google.com on 10 Jul 2012 at 8:04

GoogleCodeExporter commented 9 years ago

Hmm... in my case, I don't think I have any such static fields, at least none 
that directly reference a FRQ. There may be some indirect reference being 
created somehow.

FYI, here are all the guava classes that I'm explicitly importing:

import com.google.common.base.Function;
import com.google.common.base.Predicate;
import com.google.common.base.Predicates;
import com.google.common.cache.Cache;
import com.google.common.cache.CacheBuilder;
import com.google.common.cache.CacheLoader;
import com.google.common.cache.LoadingCache;
import com.google.common.collect.BiMap;
import com.google.common.collect.Collections2;
import com.google.common.collect.DiscreteDomains;
import com.google.common.collect.ImmutableBiMap;
import com.google.common.collect.Iterators;
import com.google.common.collect.MapMaker;
import com.google.common.collect.Ranges;
import com.google.common.util.concurrent.UncheckedExecutionException;

Also, I'm using BoneCP as well.. perhaps that is where the reference originates.

Original comment by archie.c...@gmail.com on 10 Jul 2012 at 8:14

GoogleCodeExporter commented 9 years ago

Consider how much time has been wasted on this issue (by developers and 
end-users) when we could have just added a close() method to begin with. Please 
end the pain :)

Original comment by cow...@bbs.darktech.org on 10 Jul 2012 at 8:15

GoogleCodeExporter commented 9 years ago

I worked around it by making my static reference to the 
FinalizibleReferenceQueue a weak reference. Then when I create each of my 
FinalizibleReferences, I have them keep a strong reference to the 
FinalizibleReferenceQueue that they are registerered with. When 
finalizeReferent() is called, that reference is nulled out. Thus, once all 
finalizible references are finalized, the FinalizibleReferenceQueue only has a 
weak reference, thus can be GCed, and thus the thread stopped.

I also have a method to fetch the FinalizibleReferenceQueue, where if the 
previous reference one collected (and thus, the reference cleared), it will 
create a new FinalizibleReferenceQueue, give that to the newly created 
FinalizibleReference, and then create a new WeakReference to that 
FinalizibleReferenceQueue and update my static reference to that. Moderately 
elegant. ;)

Something like that could probably be moved to inside the 
FinalizibleReferenceQueue (which the Finalizer thread will "listen" for its 
collection, as opposed to the FinalizibleReferenceQueue itself like it is now), 
and have each of the three FinalizibleReference reference implementations keep 
a hard reference, that is cleared upon the Finalizer "finalizing" it.

Original comment by techsy730 on 10 Jul 2012 at 8:22

yf0994 / guava-libraries

FinalizableReferenceQueue keeps ClassLoaders around. #92