Closed lprimak closed 2 months ago
You are going to need to provide a test case that demonstrates a memory leak. And once we have such a test case the fix will likely be to address the memory leak rather than adding a method that clears the cache.
Hi, Mark,
I modified the description to show exactly what's happening. Also uploaded screenshot from VisualVM. It's clear that GC isn't collecting ClassLoaders while they are in the cache. The only option is to clear the cache.
I'm still not seeing the evidence of a memory leak. Soft references are GC eligible. For a memory leak I'd expect to see a chain of hard references from the leaked object to a GC root.
SoftConcurrentHashMap.map
contains Class<?>
ask key
, which in turn has a hard reference to the ClassLoader
BeanProperties
also has a hard reference to Class<?>
I have done a number of tests and the method in the PR is the minimum necessary to avoid the memory leak.
@markt-asf Any thoughts please? Thank you
There is no memory leak here (confirmed with a profiler). There is nothing to fix.
Absent a test case demonstrating an actual memory leak, this PR and the associated issue will be closed.
Hi, Mark,
The memory leak is clearly demonstrated by the VisualVM screenshot and my comments above. I spent 10s of volunteer hours tracking this down, fixing it and verifying the fix works. Is there some downside to this code that I am not seeing? The "worst-case outcome" of this fix is that it does nothing. Isn't that a good enough reason to merge this in?
This way, if I am right about this, the fix is already there :)
Any help convincing Mark would be appreciated @smillidge @arjantijms @edburns @ivargrimstad Thank you
Hi, Mark,
@edburns would like me to produce a video with the memory leak. Would that be helpful? Do you have any questions or suggestions? Is there anything above that's not clearly demonstrating the memory leak?
Thank you
I would very much prefer a test case that demonstrates a leak - a leak being a hard reference to an object that isn't eligible for GC that should be eligible for GC.
The information provided so far only shows a reference chain that contains a soft reference to the alleged leaking class. That is not a memory leak.
Based on the information provided so far, I have been unable to recreate a memory leak via use of the BeanELResolver.
My experience of videos (mainly in security vulnerability reports) is not good. The issues are:
You are of course free to provide a video if you wish and it will get looked at. However, I continue to believe the most effective way for you to demonstrate that there is an issue is to provide a test case that demonstrates the leak. I appreciate that creating unit tests for memory leaks is not always easy. The simplest possible web application (with source code) that demonstrates a leak that can be observed in a profiler would be sufficient. Given that I don't see how you can provide a video without first writing such a test case, my recommendation continues to be "provide the test case".
Further:
The proposed fixed is highly unlikely to be applied in any circumstances. Absent evidence of a memory leak, cruft is not going to be added to the API "just in case". If evidence of a memory leak is provided, then the fix is almost certainly going to be to fix the actual leak - not to provide a hack to work-around the leak.
@markt-asf Would a Java process memory dump (e.g. hprof file) help demonstrate the leak? I personally prefer a memory dump over a picture or a video.
@markt-asf Looks like you have tried to reproduce the memory leak yourself. Can you put that code up so I can take a look? Maybe I can modify it to demo the leak rather than start from scratch? Thank you
Also, you claim that all references are soft.
static private class SoftConcurrentHashMap extends ConcurrentHashMap<Class<?>, BeanProperties> {
the above line has a hard key to Class<?>
Why do you think this will be garbage collected?
Why do you think this will be garbage collected?
Because I have read the code - particularly the cleanup()
method and when it is called - and tested that it behaves as expected and clears both the key and the value from the cache.
The test code I have been using:
package org.apache.markt;
import jakarta.el.BeanELResolver;
import jakarta.el.ELContext;
import jakarta.el.ELResolver;
import jakarta.el.FunctionMapper;
import jakarta.el.VariableMapper;
public class TestBeanELResolver {
/*
* Need to run this with the following:
* -Xmx16m
* -XX:SoftRefLRUPolicyMSPerMB=100
*/
public static void main(String[] args) throws Exception {
@SuppressWarnings("unused")
Object o = new Object();
// Create the BeanELResolver instance to be tested
BeanELResolver r = new BeanELResolver();
// Use the resolver with TesterBeanA. This adds TesterBeanA to BeanELResolver.properties
resolveA(r);
/*
* Now sleep for longer than heap size in MB (16) * SoftRefLRUPolicyMSPerMB (100). This ensures that, when there
* is memory pressure, any soft references that exist at the moment - and specifically the ones in
* BeanELResolver.properties - will be cleared.
*/
Thread.sleep(2000);
/*
* Create memory pressure.
*
* A call to System.gc() is *not* sufficient to clear the soft references.
*/
o = null;
try {
o = new int[10][10][10][10][10][10][10][10][10][10][10][10];
} catch(OutOfMemoryError err) {
// Ignore
}
/*
* At this point the soft reference has been removed from the map and placed on the queue. Trigger the cleanup()
* method. Do this by using the resolver with a different bean.
*/
resolveB(r);
/*
* Placing a break point here and examining the heap with a profiler will show that references to TesterBeanA
* have been removed from BeanELResolver.properties
*
* Note that the profiler will still show a hard reference from the class loader to TesterBeanA since the
* class loader retains a reference to all the classes it has loaded.
*/
System.out.println("debug point");
}
private static void resolveA(ELResolver r) {
TesterBeanA b = new TesterBeanA();
b.setData("test");
System.out.println(r.getValue(new TestContext(), b, "data"));
}
private static void resolveB(ELResolver r) {
TesterBeanB b = new TesterBeanB();
b.setData("test");
System.out.println(r.getValue(new TestContext(), b, "data"));
}
private static class TestContext extends ELContext {
@Override
public ELResolver getELResolver() {
return null;
}
@Override
public FunctionMapper getFunctionMapper() {
return null;
}
@Override
public VariableMapper getVariableMapper() {
return null;
}
}
public static class TesterBeanA {
private String data;
public String getData() {
return data;
}
public void setData(String data) {
this.data = data;
}
}
public static class TesterBeanB {
private String data;
public String getData() {
return data;
}
public void setData(String data) {
this.data = data;
}
}
}
Looking at your test, I can see why you would think it isn't a leak. However, it absolutely is a leak for all intents and purposes:
The test above introduces memory pressure before forcing cleanup()
to be called, thus it does clean out the queue (and the attached ClassLoader)
However, this is not a realistic situation, as memory pressure usually happens after the cleanup()
method gets called, not before.
Memory pressure stays active, without an additional beans being resolved, thus cleanup()
doesn't get called often enough to clean up the cache and thus is ineffective at cleaning up the class loaders.
This will cause OOM if during application runtime, application will ask for more memory, without resolving additional beans and thus calling cleanup()
, which is what I was observing in my real-world tests.
Saying all that, WeakReference
solution will probably work in this case, and is probably better solution than the manual cleanup proposed here.
However, I'd like to keep this PR open until the WeakReference
method gets incorporated and proven to work in real-world tests.
The risk of WeakReference
implementation is that it would be purged from the cache too early and would actually make the cache ineffective. I do not believe that will be the case, but without testing, who knows?
This PR really needs to be closed and this discussion moved to the issue.
It does seem unlikely to me that there would be enough bean usage to cause notable memory usage, then some other activity that triggers GC and then no further bean usage until a OOME occurs. Unlikely, but I guess it is possible. I do wonder if some other memory leak was at play as well.
There are a couple of options here.
The simplest may be to switch using class name rather than class as the key. That would reduce the impact of what was left behind before cleanup() is called.
Switching to a WeakReference based cache of some form is probably the better solution although it is higher risk. The Tomcat implementation has been used successfully since Java EE 5 but I'd need to check on where it originated. I know I didn't write it so I can't just copy it.
I’ll make another PR with WeakReference as the change should be trivial
It does seem unlikely to me that there would be enough bean usage to cause notable memory usage, then some other activity that triggers GC and then no further bean usage until a OOME occurs. Unlikely, but I guess it is possible. I do wonder if some other memory leak was at play as well.
The bean usage isn't the issue. The issue is that stale beans from undeployed applications drag in their ClassLoaders thus creating a huge leak. Beans are usually created and cached during application deployment. They are supposed to be unloaded during application undeployment. However, because they remain in the cache, and there is no memory pressure as-of yet, they do not get cleaned up. After a number of deploy/undeploy cycles, memory usage gets close to maximum, but not enough to clean up SoftReferences. After that, during regular application runtime it allocates memory, and because at some point it cannot, OOM occurs.
Even before that point, GC and memory thrashing occurs because there is a lot of uncollectable classes and garbage remaining.
What I am describing above is not a hypothetical. It has been observed and recorded via JFR, VisualVM, JMC, etc.
Solution is this PR is also not a hypothetical. With his applied, the OOM/GC thrashing issues are gone. Also observed by JFR/JMC/VisualVM
The simplest may be to switch using class name rather than class as the key. That would reduce the impact of what was left behind before cleanup() is called.
This will not work, because the cached bean (map value) contains a reference to application's object, which in turn will refer to it's ClassLoader, preventing it from being cleaned up.
Superseded by #248
to BeanELResolver to help remove ClassLoader leaks
fixes #214
Relates to https://github.com/payara/Payara/pull/6677