Closed eduveks closed 3 years ago
After more 20 minutes looks like this:
@eduveks thanks for reporting the issue we will check it and get back to you
@eduveks thanks for the report!
Are you reloading classes e.g. using a custom host class loader or some application server? It looks to me as if our host member/method lookup cache seems to be missed a lot.
If this is true then the current behavior might be not ideal but is expected. We currently do reference bound host methods strongly. A workaround is to recreate the Engine (or Context if you don't use an explicit Engine) to refresh that cache. Note that this also refreshes all the compiled code. If this turns out to be the case, then I could have another look on improving this, if you provide a small example of your usage pattern.
If you are not reloading classes in your host application then it would be immensely helpful if you could provide a sample application that reproduces this leak reliably or, less preferrably, a heap dump (hprof) file of the leak.
Thanks!
Hello @chumer,
I did another round of stress test, with screenshots below and here the hprof file made a few minutes ago:
Yes, it is an application server, the Netuno platform, with a Jetty server inside and runs over GraalVM.
You can check out Netuno install is straightforward, only need to execute one command in the terminal.
The installation will download and install GraalVM too inside the netuno folder.
Then here an example of how to create services.
NodeJS inside GraalVM execute the javascript server-side (services), then the script code like _out.json
, _db.query
and much more, they will call Java methods.
Each request executes multiples scripts (JavaScript) over the same Class that initializes the HostAccess, ContextBuilder, and then the Context, but in the same request may create more than one Context. But always all Contexts initialized will be closed in the end of the request with context.leave()
and context.close(true)
.
Multi-threads do not use the same HostAccess
and ContextBuilder
, each thread/request has your own org.graalvm.polyglot.*
instances.
If you need more details just say.
Thank you!
I downloaded the heap dump and computed the retained size and the retained size looks like this:
So it looks like if all of the references can actually be freed. So I don't think there is anything leaking in that heap dump. Why you are seeing the GC pressure that you are seeing, I cannot really explain. Did you try forcing a full GC with System.gc form time to time?
Each request executes multiples scripts (JavaScript) over the same Class that initializes the HostAccess, ContextBuilder, and then the Context, but in the same request may create more than one Context. But always all Contexts initialized will be closed in the end of the request with context.leave() and context.close(true).
Two remarks here:
[1] https://www.graalvm.org/reference-manual/embed-languages/#code-caching-across-multiple-contexts
I think that I already tested the implementation with code caching across multiple contexts in the past.
But yesterday, I did some tests changing the implementation to supports the same Engine across all threads and with one Engine per thread.
The first impression in both cases was memory ate fast and very slow performance, scripts executed before in 300 milliseconds jumped to 20 seconds, CPU with high usage too.
I will try to do more tests within the next few days to try to give more details.
Yes, periodically, we force the garbage collector with:
System.gc();
System.runFinalization();
Sample of the first results with the Engine:
Any chance you can give me something runnable that reproduces this problem?
Any chance you can give me something runnable that reproduces this problem?
Yes, I think. I don't have time today, but I will give you more news in the next few days.
I think I know what is going on here. I was investigating, why there are so many instances of java.lang.reflect.Method
in the heap dump. I tracked almost all the references to various instances of java.lang.ClassValue$ClassValueMap and those instances does not have any references so it looks like they can be collected by GC, but instances of java.lang.ClassValue$ClassValueMap
are references from java.lang.Class.classValueMap
field so they are held by the particular class, but this information is not dumped to hprof file. :-(
The question is why java.lang.ClassValue$ClassValueMap
holds so many java.lang.reflect.Method
. I don't know. If you look at the source code of ClassValueMap, there is some caching, so it might be one reason. I also saw this: https://stackoverflow.com/questions/62593886/java-memory-leak-why-are-unreachable-objects-not-collected This looks very similar to our case and in addition it uses scripting API to call javascript.
I have netuno
server running with some demo application and just randomly clicking pages, reloading pages etc., I am able to see steady rise of instances of Method
.
This is example of references from Method
to ClassValueMap
.
Small heap dump for investigation: GH-3260_heapdump.zip .
Steps to reproduce:
netuno
in /tmp
thurka$ cd /tmp
thurka$ mkdir -p netuno && cd netuno && curl https://www.netuno.org/netuno.jar > netuno.jar && java -jar netuno.jar install
node
to PATH
thurka$ export PATH=/tmp/netuno/graalvm/bin/:$PATH
netuno
server with the default application via ./netuno server
Sampler
tab and select Memory
samplingLive Object
java.lang.reflect.Method
instancesShow delta values
button to see deltas in live object countGC
directly from Memory
sampler view to free unreferenced instancesI did not yet run your code yet, but I think I found a problematic pattern in our code:
public class Test {
/*
* Change to false to avoid the bug.
*/
private static boolean CYCLIC_REFERENCE = true;
private static class HeavyValue {
final Context context;
final byte[] bytes = new byte[1024 * 16];
HeavyValue(Context context) {
// the context assignment
if (CYCLIC_REFERENCE) {
this.context = context;
} else {
this.context = null;
}
}
}
private static class Context {
final ClassValue<HeavyValue> values = new ClassValue<Test.HeavyValue>() {
@Override
protected HeavyValue computeValue(Class<?> type) {
return new HeavyValue(Context.this);
}
};
}
static class TestClass1 {
}
static class TestClass2 {
}
static class TestClass3 {
}
static class TestClass4 {
}
static class TestClass5 {
}
static class TestClass6 {
}
private static final Class<?>[] TEST_CLASSES = new Class[] {
TestClass1.class,
TestClass2.class,
TestClass3.class,
TestClass4.class,
TestClass5.class,
TestClass6.class
};
private static int test() {
Context context = new Context();
int hash = 31;
for (Class<?> testClass : TEST_CLASSES) {
hash += context.values.get(testClass).bytes.length;
}
return hash;
}
public static void main(String[] args) {
int hash = 31;
for (int i = 0; i < 100000; i++) {
hash += test();
}
System.out.println(hash);
}
}
Switching that cyclic reference flag to false fixes the problem (tested on latest JDK 8 on Mac). But this should really not happen, as such cyclic references should be resolvable by the GC. I will double check and maybe file a JDK bug. We probably need to deploy a workaround for this.
I need to do some more digging to find out what is going on.
Turns out this is a known issue: https://bugs.openjdk.java.net/browse/JDK-8169425
It looks like every page reload in the web app adds several instances of com.oracle.truffle.polyglot.HostClassCache
.
I could reproduce it with a simple example now locally.
It looks like there is a simple workaround. You need to reuse the same HostAccess
instance for all contexts.
I am working on a fix that avoids a the Cyclic dependency.
Fix is in the pipeline. Will backport this to 21.1 and probably also 20.3 (need to double check)
Thank you, @thurka, for your inputs, and I'm happy you could found the issue with Netuno.
@chumer the workaround reusing the HostAccess
across all contexts does not affect overall performance, but in this case, the memory leak still happens. See the print screen below:
To complement the @thurka tip to run locally, if install and start Netuno just with ./netuno server
based on the default configs will run the demo app, then it is possible to execute Apache BenchMark (web stress) over one of these services:
ab -n 100 -c 5 http://127.0.0.1:9000/services/samples/javascript/print-template
ab -n 100 -c 5 http://127.0.0.1:9000/services/samples/javascript/date-format
ab -n 100 -c 5 http://127.0.0.1:9000/services/samples/javascript/query-result
The first one, print-template
, seems to provoke a higher peak.
This test with Apache BenchMark and the demo application are very different from all tests I had reported before, but I think it is over the same internal issue.
My results with Apache BenchMark executed over 5 times:
@eduveks This is really weird, in my local tests using the same HostAccess instance for all contexts resolved the problem. As the cache is bound and reused for HostAccess instances. So it should be impossible to have the same method loaded twice. Are you sure you are not just building an equal HostAccess instance? To make sure, can you read the HostAccess instance from a static final? Maybe you can drop a snippet how you are doing that atm, so I can plug it into my test?
@chumer I did another test to ensure and is the same behavior with HostAccess
as static.
This is the current full code of the integration with GraalVM:
package org.netuno.psamata.script;
import org.graalvm.polyglot.*;
import org.graalvm.polyglot.Context.Builder;
import java.util.*;
public class GraalRunner {
private static boolean graal = false;
//private static Engine engine = Engine.create();
private static HostAccess hostAccess = null;
private Builder contextBuilder = null;
private Context context = null;
private List<Context> oldContexts = new ArrayList<>();
static {
try {
Context.create().close();
hostAccess = HostAccess.newBuilder()
.allowPublicAccess(true)
.targetTypeMapping(
Value.class, Object.class,
(v) -> v.hasArrayElements(),
(v) -> transformArray(v)
).targetTypeMapping(
Value.class, List.class,
(v) -> v.hasArrayElements(),
(v) -> transformArray(v)
).targetTypeMapping(
Value.class, Collection.class,
(v) -> v.hasArrayElements(),
(v) -> transformArray(v)
).targetTypeMapping(
Value.class, Iterable.class,
(v) -> v.hasArrayElements(),
(v) -> transformArray(v)
)
.build();
graal = true;
} catch (Throwable t) {
graal = false;
}
}
public GraalRunner(String... permittedLanguages) {
init(null, permittedLanguages);
}
public GraalRunner(Map<String, String> options, String... permittedLanguages) {
init(options, permittedLanguages);
}
private void init(Map<String, String> options, String... permittedLanguages) {
//engine = Engine.create();
contextBuilder = Context.newBuilder(permittedLanguages)
.allowExperimentalOptions(true)
.allowIO(true)
.allowAllAccess(true)
.allowCreateThread(true)
.allowHostAccess(hostAccess);
//https://www.graalvm.org/truffle/javadoc/org/graalvm/polyglot/ResourceLimits.html
//contextBuilder.resourceLimits(new ResourceLimits())
if (options != null) {
contextBuilder.options(options);
}
//contextBuilder.engine(engine);
context = contextBuilder.build();
}
public void resetContext() {
if (context != null) {
oldContexts.add(context);
}
context = contextBuilder.build();
}
public void close() {
for (Context oldContext : oldContexts) {
try {
oldContext.leave();
} catch (Exception e) { }
oldContext.close();
}
oldContexts.clear();
oldContexts = null;
try {
context.leave();
} catch (Exception e) { }
context.close();
context = null;
//engine.close();
}
public static boolean isGraal() {
return graal;
}
public GraalRunner set(String language, String var, Object value) {
context.getBindings(language).putMember(var, value);
return this;
}
public Set<String> keys(String language) {
return context.getBindings(language).getMemberKeys();
}
public Object get(String language, String var) {
return toObject(context.getBindings(language).getMember(var));
}
public boolean getBoolean(String language, String var) {
return context.getBindings(language).getMember(var).asBoolean();
}
public byte getByte(String language, String var) {
return context.getBindings(language).getMember(var).asByte();
}
public double getDouble(String language, String var) {
return context.getBindings(language).getMember(var).asDouble();
}
public float getFloat(String language, String var) {
return context.getBindings(language).getMember(var).asFloat();
}
public int getInt(String language, String var) {
return context.getBindings(language).getMember(var).asInt();
}
public long getLong(String language, String var) {
return context.getBindings(language).getMember(var).asLong();
}
public int getShort(String language, String var) {
return context.getBindings(language).getMember(var).asShort();
}
public String getString(String language, String var) {
return context.getBindings(language).getMember(var).asString();
}
public Value eval(String language, String code) {
try {
context.enter();
} catch (Exception e) { }
try {
return context.eval(language, code);
} finally {
try {
context.leave();
} catch (Exception e) { }
}
}
public static Map transformMembers(Value v) {
Map map = new HashMap();
for (String key : v.getMemberKeys()) {
Value member = v.getMember(key);
if (member.hasArrayElements() && !member.isHostObject()) {
map.put(key, transformArray(member));
} else if (member.hasMembers() && !member.isHostObject()) {
map.put(key, transformMembers(member));
} else {
map.put(key, toObject(member));
}
}
return map;
}
public static List transformArray(Value v) {
List list = new ArrayList();
for (int i = 0; i < v.getArraySize(); ++i) {
Value element = v.getArrayElement(i);
if (element.hasArrayElements() && !element.isHostObject()) {
list.add(transformArray(element));
} else if (element.hasMembers() && !element.isHostObject()) {
list.add(transformMembers(element));
} else {
list.add(toObject(element));
}
}
return list;
}
public static Object toObject(Value v) {
if (v.isNull()) {
return null;
} else if (v.isHostObject()) {
return v.asHostObject();
} else if (v.isProxyObject()) {
return v.asProxyObject();
} else if (v.isBoolean()) {
return v.asBoolean();
} else if (v.isNumber()) {
if (v.fitsInByte()) {
return v.asByte();
}
if (v.fitsInShort()) {
return v.asShort();
}
if (v.fitsInInt()) {
return v.asInt();
}
if (v.fitsInLong()) {
return v.asLong();
}
if (v.fitsInFloat()) {
return v.asFloat();
}
if (v.fitsInDouble()) {
return v.asDouble();
}
} else if (v.isString()) {
return v.asString();
} else {
Value value = v.getMetaObject();
if (value.fitsInByte()) {
return value.asByte();
}
if (value.fitsInShort()) {
return value.asShort();
}
if (value.fitsInInt()) {
return value.asInt();
}
if (value.fitsInLong()) {
return value.asLong();
}
if (value.fitsInFloat()) {
return value.asFloat();
}
if (value.fitsInDouble()) {
return value.asDouble();
}
try {
return value.asBoolean();
} catch (Exception e) {
try {
return value.asString();
} catch (Exception ex) {
return null;
}
}
}
return null;
}
}
Any reason why you are doing:
try {
context.enter();
} catch (Exception e) { }
try {
return context.eval(language, code);
} finally {
try {
context.leave();
} catch (Exception e) { }
}
With enter/leave? Any context operation should do an implicit enter/leave. In this case doing the enter and leave explicitely is actually slower.
Missing a leave on a particular thread can also potentially cause another leak. Is there any multi-threading happening for a GraalRunner instance?
@chumer, my apologies, I detect a maven malfunction that did not change my code version in all of my tests. 🤦♂️
Finally, it is correct. Your workaround sharing the HostAccess works very well!
Sorry for my mistake.
This is the result of many stress, my hardest test, rolling smooth: 🥳
Thank you very much! 🍻
To clarify, I commented all context.enter()
and context.leave()
. I do not remember why but I think it was an attempt to avoid the leak.
Yes, just one instance of the GraalRunner
per thread always.
And again, thank you for your support.
With this, I remembered making another test sharing the same Engine with all threads, as @chumer said.
And the performance jumped to another level, much less memory and CPU consumption overall, even with a lot of stress is like nothing is going on.
Then the key is to share Engine and HostAccess over all threads. 🚀
Finally, the killer result:
Happy to hear it all works out now as intended! :-)
Hello,
I'm facing memory leak of many LinkedHashMap with GraalVM.
This behaviour is not new because in the last years I am using GraalVM always seeing memory issues, but now is more critical because we are using in situations with much more stress overall.
Anyway GraalVM is awesome, thank you a lot by all your work. 👏
Hope this report can help to improve.
Describe GraalVM and your environment :
java -Xinternalversion
: