Open luxel opened 2 years ago
I don't know if we'll be able to solve this or not. A couple of questions:
Is getValue(String) literally only returning a value from Python to Java, or does it have some computation in it? For example, interp.getValue("x") vs interp.getValue("calculateX()").
Thank you for the reply!
Unfortunately we don't have the source code for the Python part. P.S. The Python libraries we use are .pyd libraries built with Cython.
Unfortunately your use case looks quite complicated and I have not seen any similar reports so I cannot offer much guidance.
Your fix is currently the most perplexing part for me. I cannot think of anything that would change after handling a few requests in Java that would make Jep more likely to crash. Do your other requests involve other native libraries, or are they mostly Java? There are a few places where Jep interacts with the thread class loader. These have never caused crashes in the past, but I wonder whether the other requests might be affecting the class loader.
It might be helpful to split up your calls to Jep: try moving the math computations into an exec and storing the result in a variable that you read with getValue. Most crashes are caused by third-party libraries running into an unexpected environment, and those would crash in the exec portion. When getValue converts a map to Java, however, a lot of Jep code executes, so if you could establish whether the crash comes from the computation or from the Jep conversion, that might provide some insight into the problem.
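To make the exec/getValue split concrete, here is a rough sketch of what I have in mind. PyInterp and FakeInterp are hypothetical stand-ins that I made up so the snippet runs without Jep or Python installed; in your application you would call exec and getValue on your real interpreter instead. The CrashLocator name and the stderr markers are also just illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for the exec/getValue subset of a Jep interpreter,
// so the sketch compiles without Jep or Python installed.
interface PyInterp {
    void exec(String code);
    Object getValue(String name);
}

// Toy implementation: remembers "name = expr" statements instead of running Python.
class FakeInterp implements PyInterp {
    final List<String> executed = new ArrayList<>();
    private final Map<String, Object> vars = new HashMap<>();

    @Override
    public void exec(String code) {
        executed.add(code);
        int eq = code.indexOf('=');
        if (eq > 0) {
            // Store the right-hand side text as the "value" of the variable.
            vars.put(code.substring(0, eq).trim(), code.substring(eq + 1).trim());
        }
    }

    @Override
    public Object getValue(String name) {
        if (!vars.containsKey(name)) {
            throw new IllegalStateException("undefined variable: " + name);
        }
        return vars.get(name);
    }
}

class CrashLocator {
    // Runs the computation and the Python-to-Java conversion as separate steps.
    // If the JVM dies, the last marker on stderr shows which step was running.
    static Object computeThenConvert(PyInterp interp, String expression) {
        mark("step 1: exec (third-party Python code runs here)");
        interp.exec("result = " + expression);
        mark("step 2: getValue (conversion to Java runs here)");
        Object value = interp.getValue("result");
        mark("done");
        return value;
    }

    private static void mark(String message) {
        System.err.println("[" + Thread.currentThread().getName() + "] " + message);
        System.err.flush();
    }
}
```

If the last marker before the crash is "step 1", the problem is most likely in the Python libraries; if it is "step 2", the conversion code is the more likely suspect.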
If I understand correctly, you are never closing any interpreter? Do your threads ever complete so that an interpreter becomes inaccessible? I'm not aware of any problems this would cause but it would be an interesting state to be in.
Would it be possible for you to open, use, and then close a new SharedInterpreter for every request? Since sys.modules is shared between interpreters, each import after the first should be a simple dict lookup and should not take noticeable time.
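Roughly, the open/use/close pattern could look like the sketch below. ClosableInterp and CountingInterp are hypothetical stand-ins so the snippet runs without Jep; the idea is that the factory would create your real interpreter on the request thread, and try-with-resources would close it on that same thread. PerRequestRunner is an illustrative name, not something from Jep.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical stand-in for an AutoCloseable interpreter, so the sketch
// compiles without Jep or Python installed.
interface ClosableInterp extends AutoCloseable {
    void exec(String code);
    Object getValue(String name);
    @Override
    void close(); // narrowed so callers need no checked-exception handling
}

class PerRequestRunner {
    private final Supplier<ClosableInterp> factory;

    PerRequestRunner(Supplier<ClosableInterp> factory) {
        this.factory = factory;
    }

    // Opens a fresh interpreter, uses it, and closes it, all on the calling thread.
    Object handle(String expression) {
        try (ClosableInterp interp = factory.get()) {
            interp.exec("result = " + expression);
            return interp.getValue("result");
        }
    }
}

// Toy interpreter that only counts how often it is opened and closed.
class CountingInterp implements ClosableInterp {
    static final AtomicInteger opened = new AtomicInteger();
    static final AtomicInteger closed = new AtomicInteger();
    private String lastStatement;

    CountingInterp() {
        opened.incrementAndGet();
    }

    @Override
    public void exec(String code) {
        lastStatement = code;
    }

    @Override
    public Object getValue(String name) {
        // Returns the last statement text; a real interpreter would return the variable.
        return lastStatement;
    }

    @Override
    public void close() {
        closed.incrementAndGet();
    }
}
```

With this shape no interpreter outlives the request that created it, which would also tell us whether the long-lived ThreadLocal instances are part of the problem.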
I apologize that most of my ideas are just fishing for information but I do not have anything else to offer.
Describe the problem
We have built a web server with Spring Boot, serving an API over HTTP. Some of the requests use JEP, and the rest of them don't. We're using a ThreadLocal variable to hold a SharedInterpreter for each thread, and we never close those instances.
We have found a strange crash behavior:
If a thread (let's say "XNIO-1 task-5") was first created to serve other HTTP requests that don't involve JEP (so the SharedInterpreter in the ThreadLocal is not initialized), and the same thread "XNIO-1 task-5" is later reused for a request that triggers the initialization of the SharedInterpreter, it crashes when we try to invoke some Python methods (but not all methods) and get the return value.
If a thread (let's say "XNIO-1 task-6") was first created to serve a request that involves JEP (which initializes the SharedInterpreter immediately), everything is OK. And if the same thread "XNIO-1 task-6" is used a second or third time, it’s still working as expected.
Our temporary workaround: we created a filter that intercepts every request and ensures the SharedInterpreter ThreadLocal is initialized for each thread when it’s created.
Could anyone help us figure out what’s going on behind the scenes, and is there a better solution?
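For reference, the workaround has roughly the following shape. This is a simplified sketch and not our actual code: EagerThreadLocal is an illustrative name, the generic type T stands in for the interpreter, and wrap() plays the role of the request filter so the snippet does not depend on the servlet API.

```java
import java.util.function.Supplier;

// Holds one lazily created value per thread and lets a filter force creation early.
class EagerThreadLocal<T> {
    private final ThreadLocal<T> local;

    EagerThreadLocal(Supplier<T> factory) {
        // The factory runs on whichever thread first calls get().
        this.local = ThreadLocal.withInitial(factory);
    }

    // The first call on a thread runs the factory on that thread;
    // later calls on the same thread just return the cached value.
    void ensureInitialized() {
        local.get();
    }

    T current() {
        return local.get();
    }

    // Stand-in for the request filter: initialize first, then run the request,
    // even when the request itself never touches the interpreter.
    Runnable wrap(Runnable request) {
        return () -> {
            ensureInitialized();
            request.run();
        };
    }
}
```

Because every request goes through wrap(), the value is created during the first request a thread serves, whether or not that request uses JEP.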
Environment (please complete the following information):
Example crash log: