jpype-project / jpype

JPype is a cross-language bridge that gives Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0

Upgrading from 0.7.5 - cannot load classes from jar #871

Closed: tevansuk closed this issue 3 years ago

tevansuk commented 4 years ago

Hi all, I'm trying to upgrade a prototype I wrote with JPype 0.7.5 to the latest version, 1.1.1, but it can no longer load classes from the JAR files:

docker run -it \
    -v /home/tevans/devel/jpype-demo/in:/in \
    -v /home/tevans/devel/jpype-demo/out:/out \
    -v /home/tevans/devel/jpype-demo:/app \
    -m 256m \
    jpype-demo:latest
INFO:root:JVM started: True
INFO:root:classpath: /app/lib/aspose-cells-2.4.1.jar:/app/lib/dom4j-1.6.1.jar
Traceback (most recent call last):
  File "org.jpype.JPypeContext.java", line -1, in org.jpype.JPypeContext.callMethod
  File "Method.java", line 566, in java.lang.reflect.Method.invoke
  File "DelegatingMethodAccessorImpl.java", line 43, in jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke
  File "NativeMethodAccessorImpl.java", line 62, in jdk.internal.reflect.NativeMethodAccessorImpl.invoke
  File "NativeMethodAccessorImpl.java", line -2, in jdk.internal.reflect.NativeMethodAccessorImpl.invoke0
  File "Class.java", line 315, in java.lang.Class.forName
  File "Class.java", line -2, in java.lang.Class.forName0
  File "ClassLoader.java", line 522, in java.lang.ClassLoader.loadClass
  File "ClassLoader.java", line 589, in java.lang.ClassLoader.loadClass
  File "URLClassLoader.java", line 471, in java.net.URLClassLoader.findClass
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/venv/lib/python3.7/site-packages/jpype/imports.py", line 200, in find_spec
    cls = _jpype.JClass("java.lang.Class").forName(name)
java.lang.ClassNotFoundException: java.lang.ClassNotFoundException: com.aspose

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/app/main.py", line 38, in <module>
    main()
  File "/app/main.py", line 26, in main
    import com.aspose
  File "/venv/lib/python3.7/site-packages/jpype/imports.py", line 204, in find_spec
    raise ImportError("Failed to import '%s'" % name) from ex
ImportError: Failed to import 'com.aspose'
Makefile:23: recipe for target 'run' failed

/app/main.py looks like:

import logging
from glob import glob
from os.path import join
from importlib import util as _util  # noqa

from app.conf import settings

import jpype  # noqa isort:skip
import jpype.imports  # noqa isort:skip

jpype.startJVM(convertStrings=True, classpath=[join(settings.CLASSPATH, "*")])

import java.lang.System  # noqa isort:skip

def main():
    logging.basicConfig(level=logging.INFO)
    logging.debug(f"in: {settings.IN}  out: {settings.OUT}")
    logging.info(f"JVM started: {jpype.isJVMStarted()}")
    logging.info(f"classpath: {java.lang.System.getProperty('java.class.path')}")
    import com.aspose
    return

if __name__ == "__main__":
    main()

The Docker environment is openjdk:11-slim-buster with python3.7, python3.7-venv, python3-pip and make also installed.

Any pointers / debug steps would be greatly appreciated!

Thrameos commented 4 years ago

I would start by adding

import java
print(java.lang.System.getProperty('java.class.path'))

The failure is raised during the recovery process. In the JPype 1.x series the import procedure is as follows:

1) Consult the package manager to see whether the imported item is a package.
2) If it is not in the list of accepted packages, attempt to look up a class using JClass.
3) If it is not found as a class, call Class.forName to produce an error for the user, so we can see which Java exception is generated.

As this doesn't look like a class name, it must have failed the first check. So we need to make sure the jar is on the path, and that the jar is loadable and contains the package you are trying to import. Lastly, we can manually load up the PackageManager and probe it to see why it thinks com does not contain the package you want. But as this is a multistep process, let's start simple and make sure the assumptions are correct before diving into the deep end.
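To make the three-step order concrete, here is a toy model in plain Python. It is not JPype's implementation: `ModelImporter`, `resolve` and `import_from` are hypothetical names, and the two sets merely stand in for what the package manager and the class loader can see. The model also suggests why `from com.aspose.cells import Workbook` fails the same way: Python must import `com`, `com.aspose` and `com.aspose.cells` as packages before it ever looks up `Workbook`.

```python
from typing import Iterable


class ModelImporter:
    """Toy model of the JPype 1.x import order (hypothetical; not JPype's code)."""

    def __init__(self, packages: Iterable[str], classes: Iterable[str]):
        # What the package manager can see by walking jar directory entries.
        self.packages = set(packages)
        # What the JVM class loader can actually load.
        self.classes = set(classes)

    def resolve(self, name: str) -> str:
        # Step 1: is the name a package known to the package manager?
        if name in self.packages:
            return "package"
        # Step 2: otherwise, try to look it up as a class (JClass in JPype).
        if name in self.classes:
            return "class"
        # Step 3: give up. JPype calls Class.forName at this point only to
        # obtain the Java exception that is chained onto the ImportError.
        raise ImportError(f"Failed to import '{name}'")

    def import_from(self, module: str, attr: str) -> str:
        # "from a.b.c import D" makes Python import a, a.b and a.b.c first;
        # each of those must resolve as a package before D is looked up.
        parts = module.split(".")
        for i in range(1, len(parts) + 1):
            prefix = ".".join(parts[:i])
            if self.resolve(prefix) != "package":
                raise ImportError(f"'{prefix}' is not a package")
        return self.resolve(f"{module}.{attr}")
```

In this model a jar without directory entries leaves `com.aspose` out of `packages`, so the import stops at the second prefix even though `com.aspose.cells.Workbook` itself is loadable.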

tevansuk commented 4 years ago

Yep, this was in the original output:

INFO:root:classpath: /app/lib/aspose-cells-2.4.1.jar:/app/lib/dom4j-1.6.1.jar

I changed the import to import an actual class rather than a package (from com.aspose.cells import Workbook), with the same effect.

The class is in the JAR:

> $ unzip -l lib/aspose-cells-2.4.1.jar | grep Workbook
    33747  2010-09-21 18:59   com/aspose/cells/Workbook.class
    15650  2010-09-21 18:59   com/aspose/cells/WorkbookDesigner.class
     3039  2010-09-21 18:59   com/aspose/cells/WorkbookRender.class

(Yes, this is a seriously old library/JAR file! It does import correctly and works with OpenJDK 11 / JPype 0.7.5.)

tevansuk commented 4 years ago

Trying to load the package using JPackage:

>>> from importlib import util as _util
>>> import jpype
>>> import jpype.imports
>>> jpype.startJVM(convertStrings=True, classpath=["/app/lib/*"])
>>> import java
>>> java.lang.System.getProperty('java.class.path')
'/app/lib/aspose-cells-2.4.1.jar:/app/lib/dom4j-1.6.1.jar'
>>> from jpype import JPackage
>>> JPackage("com.aspose.cells")
<java package 'com.aspose.cells'>
>>> cells = JPackage("com.aspose.cells")
>>> cells.Workbook
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: Java package 'com.aspose.cells' is not valid
>>>
Thrameos commented 4 years ago

Okay, this is definitely an issue with the package manager failing to find directory entries. My best guess is that the jar file was stripped of all its directory entries, so when I load the jar for inspection the required directory pointers are not found. Would you mind running jar -tvf on it to see whether it contains both the directories and the classes, or just the classes?

Second, we can use the JarFileSystem to open the jar manually and try to load a directory to see whether the entry is found. The old JPype didn't do any checking of package names, so it would blindly accept jar files that didn't contain any indexing. In my testing I was unable to find a way to make a jar that lacked that info, but you may have found the edge case.
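A quick way to run this check without a JDK is a short sketch using Python's standard `zipfile` module, since a jar is an ordinary zip archive. `missing_directory_entries` is a hypothetical helper: it collects every package directory implied by a `.class` entry and reports the ones that have no explicit `dir/` entry of their own.

```python
import zipfile
from typing import List


def missing_directory_entries(jar) -> List[str]:
    """List package directories implied by .class files but absent as entries.

    `jar` may be a path or a binary file object.
    """
    with zipfile.ZipFile(jar) as zf:
        names = zf.namelist()
    # In a zip, a directory entry is a name ending in "/", e.g. "org/jpype/manager/".
    present = {n for n in names if n.endswith("/")}
    expected = set()
    for name in names:
        if not name.endswith(".class"):
            continue
        # Drop the file name and record every ancestor directory of the class.
        parts = name.split("/")[:-1]
        for i in range(1, len(parts) + 1):
            expected.add("/".join(parts[:i]) + "/")
    return sorted(expected - present)
```

For example, `missing_directory_entries("lib/aspose-cells-2.4.1.jar")` would return an empty list for a jar packed the usual way and a non-empty list for one stripped of its directory entries.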

Thrameos commented 4 years ago

The guts of the system are in org.jpype.pkg.JPypePackageManager. You can open the class and call its methods to see what it is seeing, or reimplement some of the methods in Python to see why something may be failing. I can probe it after hours, but unfortunately without a jar file to test on I am pretty limited.

tevansuk commented 4 years ago

The JAR file seems to have the right structure (see attached; it's very long!)

jar-tvf-aspose-cells-2.4.1.jar.txt

The code I'm porting uses an Aspose library for generating Office documents. It was written for an ancient version of the library that is no longer in their Maven repository (licenses are quite pricey!), but we also have a license for a newer version of their libraries, aspose-cells-19.8. I believe (IANAL) it's fine to use their libraries to trial things; you need a license file to produce anything without a watermark. This version behaves in the same way.

They've also started publishing a Python package, aspose-cells, that uses JPype. It also seems to fail with JPype > 0.7.5 (I should have started down this route!). Let me put together a sample project that uses it.

Thrameos commented 4 years ago

Looking through the output, it is clear they ran the jar through an obfuscator that did not repack it in the form expected for a jar.

Here is a portion of the listing for org.jpype.jar:

  1711 Thu Oct 22 06:01:52 PDT 2020 org/jpype/JPypeSignal.class
   776 Thu Oct 22 06:01:52 PDT 2020 org/jpype/JPypeUtilities.class
     0 Mon Aug 24 13:27:22 PDT 2020 org/jpype/manager/  <======
  1127 Thu Oct 22 06:01:52 PDT 2020 org/jpype/manager/ClassDescriptor.class
  4371 Thu Oct 22 06:01:52 PDT 2020 org/jpype/manager/MethodResolution.class

Notice that every class package has a directory entry, so when you open the jar as a file system you can access the contents without needing to know the full path.

Basically, stripping the jar removed the information used to index it for directory access, though I doubt the authors actually intended that outcome. They were likely just trying to strip out non-public classes.

So here are the options.

1) Unpack the jar file and repack it with a current Java, which should reconstruct the missing information. There may be licensing issues that prevent this option.
2) Use direct JClass calls to set up a module with the needed imports in Python. Not a great solution, but it should be workable. The hard part is making the module loadable before the JVM is started. I usually do this by placing stubs in the module that are filled in by a hook on the JVM initializer, but that can result in bugs if the module is imported early.
3) We need to add a keyword argument to JPackage so that you can create an unchecked package. These are dangerous because they don't prevent typos from being interpreted as packages, allowing /com/mypkg/mispeled/Foo to try to work when it should have been caught. But if you are fine with an unchecked package, it is workable.
4) You can make a second jar file containing just a stub for the directory, loaded alongside the existing jar. I am not sure that will actually fix it, as the directory entry may only list the new files and not the exports from the old jar, so this would take some experimentation.
5) We can work on the pyc option, which allows you to embed Python code in a jar and may be able to fake the missing information. This would give the most natural look, but it would still require loading a new jar file alongside the old one.

Unfortunately, the import system is supposed to catch problems. There is no directory for com.aspose in the Java system, so it can't be probed, and reporting that it is not there is the correct behavior. In JPype 0.7.5 the documentation stated that importing a bare package was not supported, so it was really just luck that it worked before.

Please tell me which path you think would be best to pursue. The last options would require additional code in JPype, so they would not be available until version 1.2 is released.

tevansuk commented 4 years ago

Looks like you've identified the problem. If it helps, I put together a sample project that reproduces it: https://github.com/tevansuk/jpype-aspose-import

I just tried option 1, which initially failed with a SecurityException. I rebuilt the jar without the signatures, and then it worked just fine. I'll have to check the license to see whether that is allowed.
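Option 1 can also be done from Python instead of with the `jar` tool. This is a minimal sketch under the assumption that copying every entry unchanged, inserting the missing directory entries, and dropping the `META-INF` signature files is enough; the signature removal mirrors the report that the rebuilt jar only loaded once its signatures were gone. `repack_jar`, `is_signature_file` and the suffix list are hypothetical choices, and whether a modified jar may be used at all remains a licensing question.

```python
import zipfile

# Signature-related files under META-INF (assumed list; adjust if needed).
_SIGNATURE_SUFFIXES = (".SF", ".RSA", ".DSA", ".EC")


def is_signature_file(name: str) -> bool:
    upper = name.upper()
    return upper.startswith("META-INF/") and upper.endswith(_SIGNATURE_SUFFIXES)


def repack_jar(src, dst) -> None:
    """Copy jar `src` to `dst`, adding missing directory entries and dropping signatures."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        existing = set(zin.namelist())
        added = set()
        for info in zin.infolist():
            if is_signature_file(info.filename):
                continue
            # Emit each missing parent directory just before the first entry
            # inside it, so the original entry order is otherwise preserved.
            parts = info.filename.split("/")[:-1]
            for i in range(1, len(parts) + 1):
                directory = "/".join(parts[:i]) + "/"
                if directory not in existing and directory not in added:
                    zout.writestr(directory, b"")
                    added.add(directory)
            # Copy the entry bytes unchanged (its own compression is kept).
            data = zin.read(info)
            zout.writestr(info, data)
```

Usage would be along the lines of `repack_jar("lib/aspose-cells-2.4.1.jar", "lib/aspose-cells-2.4.1-repacked.jar")`, then putting only the repacked jar on the classpath.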

I'm not sure I fully understand 2! I'll try out option 4 to see if we can do something without repacking the JAR.

I'll also ask the vendor to have a look at their processes; after all, they are now using JPype as a supported way of using their libs with Python.

Thanks again for looking into this. As a vote of encouragement: we're looking at JPype to replace an old Jython app. With the Jython app and a 48 MB input XML document, we had to run with a 3 GB JVM heap to avoid OOM errors, and the process took 70 seconds. With JPype we can run in a container with only 256 MB of RAM, and the process takes under 30 seconds.

Thrameos commented 4 years ago

Option 2 is pretty simple.

jpype.startJVM(....)

Workbook = jpype.JClass("com.aspose.cells.Workbook")
...

Just manually create class wrappers for all the classes you require. You can put them in a module, but that module can't be imported until after the JVM has started.
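One way to package option 2 so the module itself can be imported before the JVM starts is to defer the `JClass` calls until first use. This is a minimal sketch, assuming the attributes are only accessed after `jpype.startJVM(...)`: `LazyJavaClasses` is a hypothetical helper, not part of JPype, and the `loader` argument exists only so it can be exercised without a JVM. Declaring the class names up front keeps the typo protection that an unchecked package would give up.

```python
class LazyJavaClasses:
    """Resolve a fixed list of Java classes on first attribute access (hypothetical)."""

    def __init__(self, package, names, loader=None):
        self._package = package
        self._names = frozenset(names)
        self._loader = loader  # None means "use jpype.JClass"
        self._cache = {}

    def __getattr__(self, name):
        # Only reached for attributes not found normally, i.e. the Java class names.
        # Private names are rejected early to avoid recursion during __init__.
        if name.startswith("_") or name not in self._names:
            raise AttributeError(f"{self._package} has no declared class {name!r}")
        if name not in self._cache:
            loader = self._loader
            if loader is None:
                import jpype  # deferred so this module imports without a JVM

                loader = jpype.JClass
            self._cache[name] = loader(f"{self._package}.{name}")
        return self._cache[name]
```

For example, a module could define `cells = LazyJavaClasses("com.aspose.cells", ["Workbook", "WorkbookDesigner"])`, and code running after `jpype.startJVM(...)` could then call `cells.Workbook()`, while a misspelled `cells.Workbok` raises AttributeError instead of being treated as a class.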

Thrameos commented 4 years ago

To be fair to Jython, the JVM is really not well set up for the Python object model. It has no way to recycle objects quickly, resulting in huge memory bloat and a speed penalty (the classic "object creation bottleneck"). I once wrote code that did a huge number of quaternion rotations in Java, and it was painfully slow and memory intensive. Simply reusing the objects manually gave a massive speedup, but the code clarity was horrible. If a language forces you to write bad code to work around its limitations, it is not well thought through. If Jython had been willing to push a good fraction of the work into C using JNI, they could likely have gotten pretty decent speed, though the object cost would probably still be doubled, since you can't manually allocate memory shared between the JVM and the outside world other than through things like unsafe buffers. I am really hoping new virtual machines like GraalVM will break some of these limitations, giving you the option of local or short-term object recycling, which would make VM implementations of Python viable.

I always hate having to contrast JPype with Jython and give a horrible review on speed and bloat. The concept of a pure JVM implementation and the benefits of embedding should make Jython a great tool. Porting Python is no small task, and nitpicking something that was really out of their control feels bad. But the speed hit and the lack of interfacing with existing CPython modules really cut deep.

That said, it is nice to hear that JPype is actually making a difference. It was pretty painful to push it to what I considered production quality, but it is starting to pay dividends.