jpype-project / jpype

JPype is a cross-language bridge that allows Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0

Starting JVM with jpype severely slows down pure python code in the same process #1197

Closed: cvijdea-bd closed this issue 5 days ago

cvijdea-bd commented 5 months ago

After a call to jpype.startJVM(), all Python code is slowed down by about 2x, even when there are no other interactions with Java/JPype. The effect gets worse the larger the heap size of the started JVM.

The problem seems to be the mechanism by which every Python GC triggers a Java GC. Adding this line before startJVM makes the issue go away: `jpype._jpype._collect = lambda *args, **kwargs: None`

Reproduction script:

```py
import argparse
import random
import time


def generate_garbage():
    garbage = {}
    for _ in range(100_000):
        e = {}
        for _ in range(10):
            e[random.randbytes(4)] = [random.randint(10_000, 50_000)]
        garbage[random.randbytes(10)] = e
    return garbage


def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('--disable-jpype-collect', action='store_true', default=False)
    parser.add_argument('--start-jvm', action='store_true', default=False)
    args = parser.parse_args()

    import jpype

    if args.disable_jpype_collect:
        jpype._jpype._collect = lambda *args, **kwargs: None
        if not args.start_jvm:
            parser.error('--disable-jpype-collect requires --start-jvm')

    if args.start_jvm:
        jpype.startJVM("-Xmx8g", "-Xms8g")

    start = time.perf_counter()
    for _ in range(3):
        generate_garbage()
    print(f'took {time.perf_counter() - start:.2f} seconds to generate garbage')


if __name__ == '__main__':
    main()
```
```
$ python test_jpype_slow.py
took 2.78 seconds to generate garbage

$ python test_jpype_slow.py --start-jvm
took 5.02 seconds to generate garbage

$ python test_jpype_slow.py --start-jvm --disable-jpype-collect
took 2.74 seconds to generate garbage
```
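
An illustrative, JPype-free measurement (not part of the reproduction script above): counting collection passes with the standard library's `gc.callbacks` over a scaled-down batch of the same garbage shows how often a per-collection hook such as JPype's would fire during this workload:

```python
import gc
import random

def generate_garbage():
    # Same shape of workload as the reproduction script, scaled down 10x.
    garbage = {}
    for _ in range(10_000):
        e = {}
        for _ in range(10):
            e[random.randbytes(4)] = [random.randint(10_000, 50_000)]
        garbage[random.randbytes(10)] = e
    return garbage

gc_passes = 0

def count_gc(phase, info):
    # Callbacks registered in gc.callbacks run at the "start" and "stop"
    # of every collection pass.
    global gc_passes
    if phase == "start":
        gc_passes += 1

gc.callbacks.append(count_gc)
try:
    generate_garbage()
finally:
    gc.callbacks.remove(count_gc)

print(f"Python ran {gc_passes} collection passes for one batch of garbage")
```

Every one of those passes would also invoke JPype's `_collect` hook, which is where the reported 2x slowdown accumulates.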

Reproduced with jpype 1.5.0, openjdk 22, python 3.12 on Windows 11

Thrameos commented 5 months ago

I believe the issue was with the counter for triggering the GC. On some systems the units are wrong (bytes vs. kbytes), resulting in many GC triggers. I believe Linux was affected but not Windows. Unfortunately, I was never able to isolate it, as on the system I tested, the Java GC triggered properly, every 10 to 1000 Python GC cycles. I will have to run another diagnostic to see what is going on.

Python doesn't have a good method of measuring how it is using memory, so we can only do before-and-after checks around each GC cycle. We use a high-water mark: GCs get triggered frequently at first, but as more statistics build up we get a better idea of how much the memory changes with each type of GC cycle. It is possible that your example is just hitting the start-up phase while it is still figuring out the high-water mark. But that transition only happens once you have reached a certain level, which is where the unit problem may be coming in. As neither Java nor Python has a cooperative GC, and Java's is a stop-the-world style, it is hard to ever find a balance that won't cause at least some efficiency loss.
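
The adaptive scheme described above can be sketched as a toy model (made-up names and constants, not JPype's actual implementation): the expensive cross-runtime GC fires when estimated usage crosses a high-water mark, and each trigger raises the mark, so triggers are frequent early on and settle down as the mark adapts:

```python
class HighWaterTrigger:
    """Toy sketch of an adaptive high-water-mark trigger (illustrative only)."""

    def __init__(self, initial_limit=1000, growth=1.5):
        self.limit = initial_limit      # current high-water mark
        self.growth = growth            # how aggressively the mark is raised
        self.triggers = 0               # stand-in for "Java GC was triggered"

    def after_python_gc(self, estimated_usage):
        # Called after each Python GC cycle with an estimate of live memory.
        if estimated_usage > self.limit:
            self.triggers += 1
            self.limit = max(self.limit * self.growth, estimated_usage)

trigger = HighWaterTrigger()
for usage in [500, 1200, 1300, 1400, 1500, 1600]:
    trigger.after_python_gc(usage)
print(trigger.triggers, trigger.limit)  # only 2 of 6 cycles cross the mark
```

Early cycles that cross the mark trigger the expensive GC and push the mark up, so later cycles of similar size pass through without triggering.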

cvijdea-bd commented 5 months ago

Maybe a good solution would be to provide a parameter that disables the GC integration in both directions.

I don't think most programs are affected by the problem the reciprocal GC triggers are trying to solve (a Java object holding Python objects alive, or the other way around, leading to memory exhaustion).

cvijdea-bd commented 5 months ago

However, I can confirm that the issue does not reproduce on Linux with jpype 1.5.0, openjdk 21, python 3.10. The time to run the script is the same with or without the startJVM call.

Thrameos commented 5 months ago

I am loath to completely remove the reciprocal GC. It is one of those things you don't need until suddenly you do (like seat belts). If everything were great for small projects and short runs, but the wheels came off as soon as you did something substantial, that would be a serious problem.

That said, there should be a throttle on it so that it is not running excessively. Currently it is a counter of how many Java objects have been created from Python, plus an emergency reciprocal GC in which, if memory doesn't come down after one collector runs, the other one gets triggered.
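
One simple shape for such a throttle (hypothetical names, not JPype's API): count invocations of the hook and only let every N-th one through to the expensive cross-runtime collection:

```python
class ThrottledHook:
    """Let only every `every`-th invocation reach the expensive GC
    (illustrative sketch, not JPype's actual mechanism)."""

    def __init__(self, expensive, every=10):
        self.expensive = expensive
        self.every = every
        self.count = 0

    def __call__(self):
        self.count += 1
        if self.count % self.every == 0:
            self.expensive()

runs = []
hook = ThrottledHook(lambda: runs.append(1), every=10)
for _ in range(35):
    hook()
print(len(runs))  # 3 of the 35 calls reached the expensive GC
```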

It would help hugely if Python actually kept books on the total memory in objects created and freed; it is the delta between those that I need. While I have accounting for Java memory usage through its API, the Python side without "trace" enabled is very anemic, so there is no way to tell the difference between total process memory (which includes all the garbage Java pages it hangs on to no matter what you do) and the actual pool of used objects.
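
For reference, the "trace" facility mentioned here is the standard library's `tracemalloc`; when enabled, it does track allocated-minus-freed deltas, at a noticeable runtime cost. A minimal sketch of measuring the delta across a collection cycle:

```python
import gc
import tracemalloc

tracemalloc.start()

before, _ = tracemalloc.get_traced_memory()
blob = [bytes(1024) for _ in range(1000)]   # roughly 1 MB of traced allocations
after_alloc, _ = tracemalloc.get_traced_memory()

del blob
gc.collect()
after_gc, _ = tracemalloc.get_traced_memory()

tracemalloc.stop()

grew = after_alloc - before     # bytes attributed to our allocations
freed = after_alloc - after_gc  # bytes released once the objects were freed
print(grew, freed)
```

Unlike total process memory, these numbers exclude pages the allocator (or an embedded JVM) is merely holding on to, which is exactly the distinction described above.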

Thanks for getting back regarding Windows vs. Linux. I can focus my efforts better now that I can see the logic is working on one platform and failing on the other.

Thrameos commented 6 days ago

I have placed a patch for this issue. Unfortunately I can't eliminate all overhead from calling the GC: it has to be triggered at least once so that it can compute the total number of objects that are interacting. Your example was generating garbage faster than the Java GC was executing, which meant the GC was triggered many times in a row, causing the slowdown. I have patched it by raising the limit in that case; the high-water mark will come down (or the garbage will pile up), which should trigger the Java GC. The current statistics I get are 2.37 s without the GC and 2.59 s with it, so about a 10% overhead.
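
The back-off described here can be pictured with a toy sketch (hypothetical names, not the actual patch): when the hook finds itself firing repeatedly back to back, raise the trigger limit so it stops thrashing:

```python
def adjust_limit(limit, consecutive_triggers, growth=2):
    # If the GC hook fired more than once in a row, the workload is
    # outpacing the collector: back off by raising the trigger limit.
    if consecutive_triggers > 1:
        return limit * growth
    return limit

limit = 1_000_000
for streak in [1, 2, 3, 1]:         # observed runs of back-to-back triggers
    limit = adjust_limit(limit, streak)
print(limit)  # raised once per back-to-back streak: 1M -> 2M -> 4M
```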