Closed: mario45211 closed this issue 1 year ago
Yes, you can reuse a context to make multiple calls. Yes, memory leaks are possible if you add things and do not remove them (also on the Python side). In a multithreaded environment, this context would have a GIL, which means that only one eval call will be able to run at a time; the rest will have to wait for the first one to finish. A better way is to use a shared engine and create multiple contexts from a single engine. The engine caches the compiled code (if you eval the same Python code, it just gets the compiled code from the cache without parsing it again), so creating additional contexts from the engine should become a relatively cheap operation once the engine has warmed up a bit. Those contexts will have separate GILs, so your threads won't block each other. There will also be a lower risk of memory leaks or other resource leaks.
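For reference, a minimal sketch of the shared-engine pattern described above (class and member names are illustrative, not taken from this issue):

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;

public class SharedEngineExample {
    // One engine shared by the whole application; it owns the code cache,
    // so identical sources are parsed only once across contexts.
    private static final Engine ENGINE = Engine.create();

    public static void main(String[] args) {
        // Contexts built on the shared engine are comparatively cheap to create
        // and have independent GILs, so threads using different contexts
        // do not block each other.
        try (Context context = Context.newBuilder("python").engine(ENGINE).build()) {
            context.eval("python", "print('hello from graalpy')");
        }
    }
}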
Thanks for your reply.
I tested your suggested approach vs. (my) full context caching.
Repeating the same test 500x, I noticed that creating a new context each time (with the same engine, of course) adds ~50-60 ms of overhead on the eval call. To be fair, I skipped the first 50 iterations, which are treated as a warmup.
Test code using a shared engine but a new context on each run:
try (var context = Context.newBuilder()
        .engine(engine)
        .allowHostAccess(HostAccess.ALL)
        .allowPolyglotAccess(PolyglotAccess.ALL)
        .build()) {
    var tStart = System.nanoTime();
    context.eval(Source.newBuilder("python", SCRIPT_WITH_FN_CONST, "name").build()); // ~50-60 ms overhead
    var fn = context.getBindings("python").getMember("fn");
    for (var i = 0; i < 100; ++i) {
        fn.execute(arg);
    }
    var tEnd = System.nanoTime();
}
The test code with a shared context looks the same, but the Context.create(), eval() and getMember() results are cached in the class's fields.
Can you spot any flaw in this code, or is that overhead simply the expected cost of a new context?
You could try to create the Source just once (cache the builder result), but I'm not sure it would help much. There is always some overhead in creating a Python context, because Python has to execute a lot of stdlib modules on startup before your code runs. CPython also takes ~40 ms to start, so I don't think we can get much better than that. If the overhead is a problem for you, you can use the thread-local context approach you suggested. I would still recommend using a shared engine for those contexts to save memory.
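In case it helps, a minimal sketch of caching the Source once (the script literal and method name are illustrative, not from this issue):

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Source;

public class CachedSourceExample {
    // Build the Source once and keep it; Source.create works for literal code
    // without a checked IOException, so it can live in a static final field.
    private static final Source FN_SOURCE =
            Source.create("python", "def fn(x):\n    return x * 2\n");

    static Object callFn(Context context, Object arg) {
        // Evaluating the same cached Source lets a shared engine reuse the
        // already-parsed code instead of parsing the text again.
        context.eval(FN_SOURCE);
        return context.getBindings("python").getMember("fn").execute(arg);
    }
}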
Many thanks for your replies - that clears up all my concerns now.
My getBindings method costs 7718 ms; how did yours take only 50 ms?
@rayduan try to measure a warmed-up invocation
When I use the force-import-site option set to true, it takes too long, so I cache the contexts like this:
public static ConcurrentLinkedHashMap<String, Context> contextCache =
        new ConcurrentLinkedHashMap.Builder<String, Context>()
                .maximumWeightedCapacity(156)
                .weigher(Weighers.singleton())
                .listener((key, value) -> {
                    try {
                        value.close();
                    } catch (Exception e) {
                        log.error("close context error");
                    } finally {
                        value.close();
                    }
                })
                .build();
private Context createContextBuilder(ScriptRunContext context) {
    // Hash the script content to MD5 and use it as the cache key
    String scriptMd5DStr = SecureUtil.md5(context.getScriptContent());
    if (contextCache.containsKey(scriptMd5DStr)) {
        return contextCache.get(scriptMd5DStr);
    }
    String modulePath = initModulePath(context);
    Context.Builder contextBuilder = Context.newBuilder(PYTHON)
            .option(PYTHON_FORCE_IMPORT_SITE, "true")
            .allowAllAccess(true)
            .engine(engine)
            .option(PYTHON_EXECUTABLE, customizeConfig.getPythonExecutable());
    if (!CollectionUtils.isEmpty(context.getRelatedPackages())) {
        contextBuilder.option(PYTHON_PYTHON_PATH, modulePath);
    }
    Context runContext = contextBuilder.build();
    contextCache.put(scriptMd5DStr, runContext);
    return runContext;
}
But after a while it causes an OOM, and I have no idea why.
Sorry, I can't help tracing OOM and I bet it's beyond this issue...
Hi,
I'm testing embedding a Python interpreter inside a Java application (as a Jython replacement), and I reached the best performance when the Context instance was reused between individual code executions. My use case compiles a stateless Python function and calls it via the execute API with arguments in a multiprocessing environment (many concurrent calls).
The question is: is it safe to reuse Context instances by caching them between calls? Since this makes the instance a long-living object, should I be concerned about possible memory leaks, e.g. when I call putMember and then forget removeMember after execution? (Not my use case, but asking anyway.)
I could not find any resource explaining context sharing besides the JavaDoc of Context: "Whether a single context instance may be used from multiple threads at the same time depends on if all initialized languages support it" - is that not relevant for the Python language (because of the GIL)? If so, could putting the context into a ThreadLocal be a sufficient workaround?
Thanks for any help in advance!
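For completeness, here is a rough sketch of what the ThreadLocal workaround could look like when combined with a shared engine (my own naming, a sketch rather than code from this issue):

import org.graalvm.polyglot.Context;
import org.graalvm.polyglot.Engine;
import org.graalvm.polyglot.Value;

public class ThreadLocalContextExample {
    // Shared engine: caches parsed code across all contexts created from it.
    private static final Engine ENGINE = Engine.create();

    // One long-lived context per thread; no context is ever shared between
    // threads, so the per-context GIL never blocks another thread.
    private static final ThreadLocal<Context> CONTEXTS = ThreadLocal.withInitial(
            () -> Context.newBuilder("python").engine(ENGINE).build());

    static Value run(String pythonCode) {
        // The calling thread reuses its own context between invocations.
        return CONTEXTS.get().eval("python", pythonCode);
    }
}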