jpype-project / jpype

JPype is a cross-language bridge that gives Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0

Advice: Threading and JPype - JVM never ends #1169

Open pelson opened 9 months ago

pelson commented 9 months ago

I've read the docs on threading and JPype https://jpype.readthedocs.io/en/latest/userguide.html#threading. In a context where the thread can do arbitrary work (e.g. it is in a pool, or runs an asyncio loop) it isn't clear how to know when to call java.lang.Thread.detach().

To give the most basic example:

import threading

def main():
    import jpype as jp
    jp.startJVM()

t = threading.Thread(target=main, name='jvm-starter')
t.start()

The result will be a never-ending process, even though the main thread finishes the program, and the jvm-starter thread has ended.

As documented, adding the jp.java.lang.Thread.detach() call in the main function results in the process ending correctly. The problem is that it is not always obvious when to actually make this call. Take the following example:

import multiprocessing.pool
import threading

def main():
    import jpype as jp
    if not jp.isJVMStarted():
        jp.startJVM()
    # We should be detaching here

t = multiprocessing.pool.ThreadPool(1)
f1 = t.apply(main)
f2 = t.apply(main)

t.close()
t.join()

In this example, the detach call is fairly easy to do, but in general JPype may be very deeply nested, and the underlying use might not even know that it is going to be called outside of the main thread.

Therefore, is it the case that wherever you use JPype, you must also always call detach when you are done in order for your code to reasonably work in a thread other than the main one?

This behaviour changed a few versions ago (looks like between 1.3 and 1.4) - in the past, I understood that this was automatically handled (and can imagine it was flaky, buggy, and costly). Some historical docs on threading behaviour https://jpype.sourceforge.net/doc/user-guide/userguide.html#python_threads.

This could also be the problem encountered in #996, though that particular issue lacks sufficient detail to reproduce / know.

What I'm looking for: canonical advice on threading and JPype. Is it really the case that every use of JPype should be tailed by a detach in order to ensure that the process ends cleanly? If not, I would be happy to enable a tracing build to track down what it is that is keeping the process alive.

Thrameos commented 9 months ago

There is some historic context needed here….

At one point, attach and detach were required for all threads. In JNI a thread has to be attached or it will crash. Once it is attached, Java allocates memory for it and adds it to the list of threads that will prevent shutdown. If the thread ends and it wasn’t detached, then it becomes a memory leak. The shutdown-blocking behaviour can be avoided by attaching as daemon.

The problem with having the user attach and detach is that exterior tools (IDEs etc) don’t know about this rule. So they would call Java without being attached and it would give random crashes.

To fix this problem, threads now automatically attach whenever they try to use a Java resource. This creates a new problem: because the user can’t see the attach, they don’t know when to detach, and threads that the user is unaware of may be leaking resources.

The correct answer for detach is that every time an external thread is terminating it should detach to free the resources (making sure never to call Java resources again). I would like to monkey patch Python threads so that, if JPype is installed, the correct detach is called at the right point, which would solve the problem, or create the equivalent of an “atexit” for threads in C so that even if Python skips it we are guaranteed to detach properly. Unfortunately, there doesn’t seem to be a good place to attach the logic in Python, and the C API for threads is horribly OS driven, so there is no universal place to put it that will actually work.

So on to your question….

You are likely conflating the functions of detach and attach as daemon. If you are making lots and lots of threads and you are experiencing a memory leak, then placing detach at the end of each thread in the equivalent of a finally statement solves the problem. If you are creating threads and Java is not shutting down, then you need to call attach as daemon when the thread is first created. In general, if you create a few threads that live for the life of the program, you never need to call detach. Java or Python exiting will trigger detachment automatically, but this depends on explicitly calling Java shutdown, as Java only recognizes it is time to exit by an explicit call OR the last non-daemon thread terminating.

In the unlikely event you are experiencing both problems then you would need to have the attach as daemon at the front, and detach in the finally statement (such that it can’t be bypassed even with an exception.)
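The "attach as daemon at the front, detach in the finally" pattern described above can be sketched as a small decorator. This is only an illustration: `java_thread_scope` is a hypothetical helper, not part of JPype, and the attach and detach callables are stand-ins so the sketch runs without a JVM.

```python
import functools
import threading


def java_thread_scope(attach, detach):
    """Hypothetical helper: call attach() first and detach() in a finally block."""
    def decorate(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            attach()
            try:
                return func(*args, **kwargs)
            finally:
                # Runs even if func raises, so the detach cannot be bypassed.
                detach()
        return wrapper
    return decorate


# Stand-ins that only record the call order. With JPype they would be
# lambdas around jp.java.lang.Thread.attachAsDaemon() and
# jp.java.lang.Thread.detach() (assumption: the JVM is already running).
events = []


@java_thread_scope(lambda: events.append("attach"),
                   lambda: events.append("detach"))
def worker():
    events.append("work")
    raise RuntimeError("boom")


def run():
    try:
        worker()
    except RuntimeError:
        events.append("error seen")


t = threading.Thread(target=run, name="worker-thread")
t.start()
t.join()
print(events)  # ['attach', 'work', 'detach', 'error seen']
```

Passing the two calls in as lambdas keeps the lookup of `jp.java.lang.Thread` lazy, so nothing touches Java until the decorated function actually runs on its thread.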

I had some code that allows you to probe the state of all Java threads (including seeing what is attached and how).

pelson commented 9 months ago

Thanks for all this context - really valuable!

If the thread ends and it wasn’t detached, then it becomes a memory leak. The shutdown-blocking behaviour can be avoided by attaching as daemon.

If you are creating threads and Java is not shutting down, then you need to call attach as daemon when the thread is first created.

Upon re-reading the docs (source):

Rather that crashing randomly, JPype automatically attachs[sic] any thread that invokes a Java method. These threads are attached automatically as daemon threads so that will not prevent the JVM from shutting down properly upon request.

I can see also that the docstring for Thread.attachAsDaemon() is consistent here:

JPype automatically attaches any threads that call Java resources as daemon threads.

I am therefore quite confused...

import multiprocessing.pool
import threading

def main():
    import jpype as jp
    if not jp.isJVMStarted():
        jp.startJVM()
        jp.java.lang.Thread.detach()
        jp.java.lang.Thread.attachAsDaemon()

t = multiprocessing.pool.ThreadPool(1)
f1 = t.apply(main)
f2 = t.apply(main)

t.close()
t.join()

This now does the right thing (seemingly) and exits nicely. This is consistent with your advice :+1:.

However, if I'm detaching, then proceed to attach as daemon, why does this make a difference at all if the docs are correct and we were already attached as daemon? Are the docs wrong on this?

Thrameos commented 9 months ago

I think the point of confusion is that any thread “other than main” is attached as daemon automatically. Meaning that if you create a random thread and it then calls Java without having been attached, it will automatically attachAsDaemon.

The thread used to launch the JVM is considered the “main” thread for Java. The main thread is attached by the JVM, and not as a daemon. It is also special for other reasons: it is the only thread that can call shutdown, and it must be the last thread to die. I don’t see an issue with detaching and attachingAsDaemon to the main thread, though there may be some shutdown implications I haven’t considered.

The “main” thread issue is I believe the reason the Java GUI hangs on OSX as there is also a special thread for GUI actions on OSX. When Java main and OSX GUI are the same thread then there is a deadlock, which is why changing Python main to another thread after starting the JVM works. I am eager to hear if we can switch this around and get the JVM on a new thread and fix that problem for good.

So the doc is correct, but it is leaving out the context that Java did the attachment at startJVM as something other than daemon.

pelson commented 9 months ago

I think the point of confusion is that any thread “other than main” is attached as daemon automatically

In the example above, I was able to see different behaviour on a non-main thread depending on whether I run detach() followed by attachAsDaemon():

import time
import threading

def main():
    import jpype as jp
    jp.startJVM()
    time.sleep(1)
    print('started')

t = threading.Thread(target=main, name='jvm-starter')
t.start()
t.join()
print('done')

That doesn't exit cleanly, whereas the following does:

import time
import threading

def main():
    import jpype as jp
    jp.startJVM()
    jp.java.lang.Thread.detach()
    jp.java.lang.Thread.attachAsDaemon()
    time.sleep(1)
    print('started')

t = threading.Thread(target=main, name='jvm-starter')
t.start()
t.join()
print('done')

Using all that you've told me, this strongly suggests that the non-main thread is being attached as a user/non-daemon thread. (I checked whether making the Python thread daemon or not makes a difference, and it doesn't). In contrast, there appears to be no impact of attach / attachAsDaemon on the main thread (it always exits "cleanly").

Just in case it matters, this is openjdk version "11.0.13" 2021-10-19.
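One way to check this inference directly, rather than from exit behaviour, would be to list the Java threads and their daemon flags. The sketch below is an assumption about how such a probe could look, not the probing code mentioned earlier in the thread. `summarize_threads` and `FakeThread` are hypothetical names, and the stand-in objects only mimic the `getName()` and `isDaemon()` methods of `java.lang.Thread` so the example runs without a JVM.

```python
def summarize_threads(java_threads):
    """Return sorted (name, 'daemon' | 'user') pairs for Thread-like objects."""
    return sorted(
        (str(t.getName()), "daemon" if t.isDaemon() else "user")
        for t in java_threads
    )


class FakeThread:
    """Stand-in exposing the two java.lang.Thread methods used above."""

    def __init__(self, name, daemon):
        self._name = name
        self._daemon = daemon

    def getName(self):
        return self._name

    def isDaemon(self):
        return self._daemon


demo = [FakeThread("main", False), FakeThread("Reference Handler", True)]
for name, kind in summarize_threads(demo):
    print(f"{name}: {kind}")

# With a running JVM, the input could be (untested assumption):
#   jp.java.lang.Thread.getAllStackTraces().keySet()
```

If the thread that called startJVM shows up as "user" while other Python threads that touched Java show up as "daemon", that would match the explanation given above.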

pelson commented 9 months ago

I was just writing some tests for this, and was doing so via a subprocess. Turns out that the behaviour changes if it is forked vs spawned:

def main():
    import threading

    def main():
        import jpype as jp
        jp.startJVM()
        print('started JVM')

    t = threading.Thread(target=main, name='jvm-starter')
    t.start()
    print('finished')

if __name__ == '__main__':
    # main()
    from multiprocessing import Process, set_start_method
    set_start_method('fork')
    p = Process(target=main, )
    p.start()
    p.join()

spawn blocks, whereas fork doesn't.

I don't think this is particularly important, but it is interesting (and I couldn't justify the behaviour to myself).

I note that the method by default on OSX is spawn since Python 3.8, just in case there is an implication for your commentary above.
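For tests of this kind, a plain subprocess with a timeout avoids the fork/spawn question entirely, because the snippet always runs in a fresh interpreter. A minimal sketch, where `exits_cleanly` is a hypothetical helper and the "stuck" snippet uses an ordinary non-daemon Python thread as a stand-in for a JVM that never ends:

```python
import subprocess
import sys


def exits_cleanly(snippet, timeout=10.0):
    """Run snippet in a fresh interpreter; True if it exits with code 0 in time."""
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before raising, so nothing lingers.
        return False
    return result.returncode == 0


quick = "print('done')"
# A non-daemon thread keeps the process alive after the main thread ends.
stuck = (
    "import threading, time\n"
    "threading.Thread(target=time.sleep, args=(60,)).start()\n"
)

print(exits_cleanly(quick))
print(exits_cleanly(stuck, timeout=1.0))
```

The JPype reproductions above could be passed in as `snippet` strings in the same way, assuming JPype and a JVM are available to the child interpreter.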

Thrameos commented 9 months ago

I believe it was mentioned somewhere in the docs that the JVM does not handle fork well.

pelson commented 9 months ago

I believe it was mentioned somewhere in the docs that the JVM does not handle fork well.

Yes, at https://github.com/jpype-project/jpype/blob/b603ff6ca6d58927262a202de83583a282924f2c/doc/userguide.rst#L2735 ("JPype cannot be used with processes created with fork"), but it actually works in some contexts, as discussed in https://github.com/jpype-project/jpype/issues/1024.

In the context of this discussion though, the only difference between spawn vs fork is that one results in a non-daemon JVM, whereas the other exits as expected. It was just an observation (mostly in case it results in a clearer understanding of what is going on).

Using all that you've told me, this strongly suggests that the non-main thread is being attached as a user/non-daemon thread.

It is this point that I would value your input on - this is not expected based on what you've said. Is this a bug in JPype, or is it a detail of JNI? Would there be any reason not to attach as daemon when starting the JVM (as seems to be the case on the main thread)?

Thrameos commented 9 months ago

These are by-design parts of JNI. Java was designed such that it is free to start shutdown once the last user thread is closed. Starting the shutdown has serious implications as to what actions the JVM can take. So by design the JVM forces an attach to the thread that starts it. In the case of a fork, when the JVM gets yanked from that thread, this leads to unexpected behavior.

It certainly would not be a good idea for JPype to attach the main thread as daemon by default, as this is really undefined behavior. Unless there is some documentation stating this is a supported action, you are at the mercy of the JVM implementation and we can't possibly test them all. Though I doubt any of them will spawn a copy of nethack, we can't guarantee it.

Because it is by design part of the JVM, the best we can do is make some other thread the main thread for Java and have an atexit call on the real main send a signal to let the Java main proceed to shutdown.

Does that help?

Thrameos commented 9 months ago

Btw... this isn't the first time the by-design parts of JNI/JVM have been a problem. By design the launching thread has no context in the Java callback system, thus no module id. When they made it a requirement that all privileged operations check the caller's module id, it literally broke JPype and every other code that operates a slaved JVM. This still isn't fixed, so we reroute our calls through a redirector in the org.jpype jar. They could allow the main thread to declare its module and attachment type, but they haven't put much thought into JNI since JVM 1.5.

Thus the state of JNI being what it is is likely just an oversight as to how threading and the main thread operate with the JVM being a slave. They haven't focused on it, don't test it, and barely support it, as Java launched from the shell is their main usage.

Thrameos commented 9 months ago

I looked into this further. The JVM requires the main thread to be user, not daemon. Their intent is for the JVM to be launched on some thread, spawn one or more user threads, then call DestroyJavaVM on that original thread. We can do the same thing by spawning on a side thread, attaching on the Python main, then calling DestroyJavaVM on the side thread.

It would seem like calling DestroyJavaVM is dangerous, but in fact reading the docs makes it clear that DestroyJavaVM is actually a wait statement.
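The signalling side of that sequence (start on a side thread, park it, release it from the real main thread) can be sketched in plain Python. This is an illustration of the hand-off only, not how JPype implements or will implement it. `SideThreadOwner` is a hypothetical class, and the start and stop callables are stand-ins so the sketch runs without a JVM.

```python
import threading


class SideThreadOwner:
    """Hypothetical sketch: run start() on a side thread, park, then run stop()."""

    def __init__(self, start, stop):
        self._start = start
        self._stop = stop
        self._error = None
        self._started = threading.Event()
        self._shutdown = threading.Event()
        # daemon=True so that interpreter exit does not block on this thread
        # before atexit handlers get a chance to signal it.
        self._thread = threading.Thread(
            target=self._run, name="jvm-owner", daemon=True
        )

    def _run(self):
        try:
            self._start()          # stand-in for e.g. jpype.startJVM()
        except Exception as exc:
            self._error = exc
            return
        finally:
            self._started.set()    # always release launch(), even on failure
        self._shutdown.wait()      # park until the real main thread signals
        self._stop()               # stand-in for e.g. jpype.shutdownJVM()

    def launch(self):
        self._thread.start()
        self._started.wait()
        if self._error is not None:
            raise self._error

    def close(self):
        self._shutdown.set()
        self._thread.join()


log = []
owner = SideThreadOwner(lambda: log.append("start"), lambda: log.append("stop"))
owner.launch()
log.append("main work")
owner.close()
print(log)  # ['start', 'main work', 'stop']
```

In a real program `owner.close` could be registered with `atexit.register` on the Python main thread. Whether a JVM started this way shuts down cleanly through JPype is untested here and is an assumption.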

I made an attempt to make a binary jpython to be shipped with JPype which will perform the proper sequence plus the visual thread for Mac. Unfortunately I ran into bootloader problems with the module security in newer Java that will be difficult to overcome. The only success I had was making a recipe for making executables via setuptools. I will give it another go the next time a slot opens up.