oracle / graalpython

A Python 3 implementation built on GraalVM

Reusing a context leads to JVM memory growth #341

Closed rayduan closed 10 months ago

rayduan commented 1 year ago

Hi, when I reuse a context across multiple threads, the context's receiver field records many threads. How can I clear it, and is there anything else I need to be careful about? (screenshot attached)


import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.EnvironmentAccess;
import org.graalvm.polyglot.HostAccess;
import org.graalvm.polyglot.Value;

public class ContextPool {
    private static final int POOL_SIZE = 60;
    private static final int MAX_WAIT_TIME_MS = 20000;
    private final BlockingQueue<Context> pool = new LinkedBlockingQueue<>(POOL_SIZE);

    private static final String PYTHON = "python";
    private static final String PYTHON_PYTHON_PATH = "python.PythonPath";
    private static final String PYTHON_EXECUTABLE = "python.Executable";
    private static final String PYTHON_FORCE_IMPORT_SITE = "python.ForceImportSite";

    public ContextPool(String pythonExecutable, Engine engine) {
        for (int i = 0; i < POOL_SIZE; i++) {
            Context context = Context.newBuilder(PYTHON).engine(engine)
                    .allowExperimentalOptions(true)
                    .allowHostAccess(HostAccess.ALL)
                    .allowIO(true)
                    .allowNativeAccess(true)
                    .allowEnvironmentAccess(EnvironmentAccess.INHERIT)
                    .allowValueSharing(false)
                    .allowInnerContextOptions(true)
                    .option(PYTHON_EXECUTABLE, pythonExecutable).option(PYTHON_FORCE_IMPORT_SITE, "true").build();
            context.initialize(PYTHON);
            pool.add(context);
        }
    }

    /**
     * Borrows a context from the pool, waiting up to {@code MAX_WAIT_TIME_MS}.
     *
     * @return a {@link Context}, or {@code null} if the wait timed out
     * @throws InterruptedException if interrupted while waiting
     */
    public Context borrowContext() throws InterruptedException {
        // Note: poll() returns null when the timeout elapses, so callers must handle that case.
        return pool.poll(MAX_WAIT_TIME_MS, TimeUnit.MILLISECONDS);
    }

    /**
     * Returns a context to the pool after removing the given bindings,
     * so state does not leak between borrowers.
     *
     * @param context  the context being returned
     * @param bindKeys the Python binding names to remove before reuse
     */
    public void returnContext(Context context, List<String> bindKeys) {
        Value bindings = context.getBindings(PYTHON);
        bindKeys.forEach(bindings::removeMember);
        pool.offer(context);
    }
}
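As a side note on using such a pool: since `poll()` returns `null` on timeout, the borrow/return cycle is easiest to get right with an explicit null check and a `try`/`finally`. A minimal sketch of that pattern, using a plain `BlockingQueue<String>` as a stand-in for the `Context` pool so it runs without GraalVM on the classpath (`PoolSketch` and `borrowAndReturn` are illustrative names, not part of the code above):

```java
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

// Stand-in for the Context pool above, using Strings instead of polyglot
// Contexts so the sketch is self-contained.
public class PoolSketch {

    // Borrow an item, use it, and return it in a finally block so the pool
    // is not drained when the work throws.
    static String borrowAndReturn(BlockingQueue<String> pool) throws InterruptedException {
        String ctx = pool.poll(100, TimeUnit.MILLISECONDS);
        if (ctx == null) {
            // poll() returns null when the timeout elapses; the pool's
            // borrow method above behaves the same way.
            throw new IllegalStateException("no context available within timeout");
        }
        try {
            return ctx; // do the real work with the context here
        } finally {
            pool.offer(ctx); // always hand the context back
        }
    }

    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<String> pool = new LinkedBlockingQueue<>(List.of("ctx-1", "ctx-2"));
        System.out.println(borrowAndReturn(pool));
    }
}
```

Wrapping the return in `finally` matters: if a script throws and the context is never offered back, the pool shrinks permanently and later borrowers time out.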
msimacek commented 1 year ago

I'm sorry, I don't understand what you're saying. "context receiver flied record so many threads" doesn't make any sense in English. Could you please formulate your problem clearly?

aevangelista81 commented 1 year ago

Hi guys, I'm experiencing a similar issue. In my case I'm reusing the same context to avoid recreating and shutting it down on every single iteration, because that looks like a very expensive operation in terms of CPU load. So on each iteration I invoke the following on the Value object I obtain from the context.getBindings(LANGUAGE) method:

    Value result = value.execute(param1, param2, param3);
    return result.as(Map.class);

Once I'm done with all my iterations, I execute:
 `context.close(true);`

This approach increases the amount of memory used on every single iteration until I reach an OOM.

Screenshot 2023-06-30 at 20 10 11

Any advice on it? Thanks a lot Andrea Evangelista

aevangelista81 commented 1 year ago

@msimacek Can you provide some advice about what I reported in my previous comment? I'm sure I'm doing something wrong or missing something really important. Thanks a lot

msimacek commented 1 year ago

Hi @aevangelista81, can you provide a bit more information about your code? Does it execute the same source each time? Does it use C extensions or third-party modules? Can you see in the heap dump where all the interop libraries (the top two objects in your screenshot) are referenced from most often? Ideally, could you share a piece of code that reproduces the issue?

aevangelista81 commented 1 year ago

Hi @msimacek, thanks a lot for the answer and sorry for the late reply. Attached is a simple Java project that simulates the issue I'm experiencing. Let me provide more info about it: in the main class, simply to get a quick memory increase, I wrapped the code in a while (true) loop:

        while (true) {
            Map<String, Object> result = scriptingService.transform(Map.of("message", "message"), Map.of("header", "header"), Map.of("context", "context"));
            System.out.println(result);
        }

The issue is here: in GraalVmPythonScriptingService.class, the following methods are invoked on every iteration:

    @Override
    protected synchronized Map<String, Object> transformAux(Map<String, Object> body, Map headers, Map<String, Object> context) {
        try {
            var mbody = body == null ? new HashMap<>() : convertMutableMap(body, 0);
            var mheaders = headers == null ? new HashMap<>() : new HashMap<>(headers);
            var mcontext = context == null ? new HashMap<>() : new HashMap<>(context);
            Value result = value.execute(mbody, mheaders, mcontext);
            return convertPolyglotMap(result.as(Map.class));
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private Map<String, Object> convertPolyglotMap(Map<String, Object> as) {
        return as.entrySet().stream().collect(Collectors.toMap(entry -> entry.getKey(), entry -> {
            var value = entry.getValue();
            if (value instanceof Map<?, ?>) {
                return convertPolyglotMap((Map<String, Object>) value);
            }
            return value;
        }));
    }

Every time the result of result.as(Map.class) is used as in my case (I'm just converting it from a polyglot map to a java.util.Map), it looks like the InstrumentationHandler from com.oracle.truffle.api.instrumentation adds a new entry via loadedRoots.add(root) in void onLoad(RootNode root). Unfortunately these entries are never removed, and the application eventually reaches an OOM.
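A side note on the conversion code itself, unrelated to the Truffle leak discussed here: Collectors.toMap throws a NullPointerException when an entry's value is null, so a polyglot result containing Python's None would make convertPolyglotMap fail. A null-tolerant sketch of the same deep copy in plain Java (MapCopy is an illustrative name, not from the attached project):

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Null-safe deep copy of a nested Map. Unlike Collectors.toMap (used in
// convertPolyglotMap above), accumulating into a plain map accepts null values.
public class MapCopy {
    @SuppressWarnings("unchecked")
    public static Map<String, Object> deepCopy(Map<String, Object> in) {
        Map<String, Object> out = new LinkedHashMap<>();
        for (Map.Entry<String, Object> e : in.entrySet()) {
            Object v = e.getValue();
            // Recurse into nested maps; copy everything else by reference.
            out.put(e.getKey(), v instanceof Map<?, ?> ? deepCopy((Map<String, Object>) v) : v);
        }
        return out;
    }
}
```

This also eagerly detaches the result from any lazy polyglot view, since every map level is rebuilt as a fresh Java map.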

Here is a screenshot with the heap dump Histogram Screenshot 2023-07-12 at 14 33 55

and a screenshot with the shortest path to GC

Screenshot 2023-07-12 at 14 39 06

TestGvmPython.zip

This is my environment:

java -version
java version "17.0.7" 2023-04-18 LTS
Java(TM) SE Runtime Environment Oracle GraalVM 17.0.7+8.1 (build 17.0.7+8-LTS-jvmci-23.0-b12)
Java HotSpot(TM) 64-Bit Server VM Oracle GraalVM 17.0.7+8.1 (build 17.0.7+8-LTS-jvmci-23.0-b12, mixed mode, sharing)
 gu list
ComponentId              Version             Component name                Stability                     Origin 
---------------------------------------------------------------------------------------------------------------------------------
graalvm                  23.0.0              GraalVM Core                  Supported                     
espresso                 23.0.0              Java on Truffle               Experimental                  gds.oracle.com
icu4j                    23.0.0              ICU4J                         Supported                     gds.oracle.com
llvm                     23.0.0              LLVM Runtime Core             Supported                     gds.oracle.com
llvm-toolchain           23.0.0              LLVM.org toolchain            Supported                     gds.oracle.com
native-image             23.0.0              Native Image                  Early adopter                 
python                   23.0.0              GraalVM Python                Experimental                  gds.oracle.com
regex                    23.0.0              TRegex                        Supported                     gds.oracle.com

Thanks again for your time, and I hope you can provide me some advice to avoid this issue. Andrea Evangelista

msimacek commented 1 year ago

@timfel, you were recently fixing a memory leak in map interop, could you please check if the above is the same problem?

timfel commented 1 year ago

This is the same issue we're fixing in https://github.com/oracle/graal/pull/6982

aevangelista81 commented 1 year ago

Thanks a lot guys

aevangelista81 commented 1 year ago

Morning @timfel, the fix has not been released yet, right? Thanks a lot Andrea Evangelista

timfel commented 1 year ago

> Morning @timfel, the fix has not been released yet, right? Thanks a lot Andrea Evangelista

No, it's in master, but not in a release. The next release is due out sometime in September

timfel commented 10 months ago

Fixed in the release; the fix was also backported.