ninia / jep

Embed Python in Java

How can I use Java reflection? #405

Open ctrueden opened 2 years ago

ctrueden commented 2 years ago

With jep, Java classes are represented with Python objects of type PyJClass. But that class is missing many of the methods of java.lang.Class: getMethods, getFields, getName, getSimpleName, etc. Is there a way to execute these methods on a PyJClass object?

bsteffensmeier commented 2 years ago

Currently there is no way to access the methods on java.lang.Class. You can get around this by writing a class in Java that has utility methods which execute the corresponding methods for you and return the results. From Python you can pass a PyJClass to these utility methods. For example, the Java class could look something like this:


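import java.lang.reflect.Method;

public class ReflectionHelper {
    // Called from Python with a PyJClass argument; delegates to Class.getMethods().
    public static Method[] getMethods(Class<?> clazz) {
        return clazz.getMethods();
    }
}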
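
On the Python side, calling such a helper might look like the sketch below. It assumes the ReflectionHelper class above is on the classpath and importable through jep; the function name public_method_names and the package name in the usage comment are hypothetical, not part of jep. The helper class is passed in as an argument so the function itself does not depend on any particular import path.

```python
def public_method_names(helper, java_class):
    """Return the sorted, de-duplicated names of a Java class's public methods.

    helper     -- the ReflectionHelper class, imported through jep (assumption)
    java_class -- a PyJClass such as java.util.ArrayList
    """
    # helper.getMethods returns a Java Method[]; each element exposes getName().
    methods = helper.getMethods(java_class)
    # Overloaded methods share a name, so collect into a set before sorting.
    return sorted({m.getName() for m in methods})


# Inside a jep interpreter, usage might look like this (package name is hypothetical):
#   from java.util import ArrayList
#   from mypackage import ReflectionHelper
#   public_method_names(ReflectionHelper, ArrayList)
```

The same pattern extends to getFields, getName, getSimpleName and so on: one static Java method per reflective call you need.
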

ctrueden commented 2 years ago

Thanks @bsteffensmeier. I started looking into what it would take to make PyJClass have these functions. I tried simple hacks like attaching attributes to instances of PyJClass, but it seems it cannot be done:

>>> c
<jep.PyJClass object at 0x7ff73d1752b0>
>>> str(c)
'class java.lang.Object'
>>> c.getName()
'jep.PyJClass' object has no attribute 'getName'
>>> setattr(c, 'getName', lambda: 'test')
'java.lang.Object' object has no attribute 'getName'.

which I guess isn't surprising.

I would like to contribute to jep here, so I started digging into the C code to see if changing e.g. pyjclass.c could accomplish it, but I'm rather ignorant of JNI internals as well as the C-facing aspects of CPython, so I'm out of my depth. Are there resources you would recommend to help me get up to speed on these technologies?

bsteffensmeier commented 2 years ago

For general reference I cannot do much better than the JNI Developer Guide, the online docs for embedding Python, and the Python/C API Reference Manual. In general the JNI concepts are easier to learn, but the Python C API is more powerful and we use it more extensively. I also frequently find myself in the CPython source code to see how they implement things or what is expected when the other docs are ambiguous.

You've jumped into one of the more complicated parts of Jep. In my opinion the reason PyJClass doesn't expose the Java reflection APIs is that it has already been overloaded to do other things; trying to do more would be a significant refactor, could introduce conflicts, and may require breaking backwards compatibility. The things I can think of that PyJClass does are:

  1. Provide access to constructors
    >>> from java.util import Map, HashMap
    >>> HashMap()
    <HashMap object at 0x7f9d30241bd0>
  2. Provide access to static members
    >>> Map.of("key1", 1, "key2", 2)
    <ImmutableCollections$MapN object at 0x7f9d30241d60>
  3. Provide access to inner classes
    >>> Map.Entry
    <jep.PyJClass object at 0x7f9d31f46530>
  4. Provide some pythonic reflective capabilities.
    >>> dir(Map)
    ['Entry', '__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__pytype__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'clear', 'compute', 'computeIfAbsent', 'computeIfPresent', 'containsKey', 'containsValue', 'copyOf', 'entry', 'entrySet', 'equals', 'forEach', 'get', 'getOrDefault', 'hashCode', 'isEmpty', 'java_name', 'keySet', 'merge', 'of', 'ofEntries', 'put', 'putAll', 'putIfAbsent', 'remove', 'replace', 'replaceAll', 'size', 'synchronized', 'values']

One interesting side effect of 4 is that you can actually find all the reflective methods on the PyJClass for java.lang.Class, but as far as I can tell there is no way Jep will allow you to call any of them.

>>> from java.lang import Class
>>> dir(Class)
['__call__', '__class__', '__delattr__', '__dict__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__ne__', '__new__', '__pytype__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'asSubclass', 'cast', 'desiredAssertionStatus', 'equals', 'forName', 'getAnnotatedInterfaces', 'getAnnotatedSuperclass', 'getAnnotation', 'getAnnotations', 'getAnnotationsByType', 'getCanonicalName', 'getClass', 'getClassLoader', 'getClasses', 'getComponentType', 'getConstructor', 'getConstructors', 'getDeclaredAnnotation', 'getDeclaredAnnotations', 'getDeclaredAnnotationsByType', 'getDeclaredClasses', 'getDeclaredConstructor', 'getDeclaredConstructors', 'getDeclaredField', 'getDeclaredFields', 'getDeclaredMethod', 'getDeclaredMethods', 'getDeclaringClass', 'getEnclosingClass', 'getEnclosingConstructor', 'getEnclosingMethod', 'getEnumConstants', 'getField', 'getFields', 'getGenericInterfaces', 'getGenericSuperclass', 'getInterfaces', 'getMethod', 'getMethods', 'getModifiers', 'getModule', 'getName', 'getNestHost', 'getNestMembers', 'getPackage', 'getPackageName', 'getProtectionDomain', 'getResource', 'getResourceAsStream', 'getSigners', 'getSimpleName', 'getSuperclass', 'getTypeName', 'getTypeParameters', 'hashCode', 'isAnnotation', 'isAnnotationPresent', 'isAnonymousClass', 'isArray', 'isAssignableFrom', 'isEnum', 'isInstance', 'isInterface', 'isLocalClass', 'isMemberClass', 'isNestmateOf', 'isPrimitive', 'isSynthetic', 'java_name', 'newInstance', 'notify', 'notifyAll', 'synchronized', 'toGenericString', 'toString', 'wait']
>>> Class.getFields()
Instantiate this class before calling an object method.

If you are interested in learning Jep internals, something you might find interesting is that it is actually very easy to create a Python object wrapping java.lang.Class that has all the reflective capabilities you are looking for. All of the code we normally use for creating PyJObjects works perfectly fine for java.lang.Class, but we skip that code and make a PyJClass instead. This happens in several places; if you search the Jep code for PyJClass_Wrap you will find everywhere that we make a PyJClass instead of a regular PyJObject.

I wanted to experiment with this to make sure it worked how I remember, so I commented out the code in convert_j2p.c that calls PyJClass_Wrap. That code is used whenever Jep has a Java object of an unspecified type and needs a Python object; this happens in Interpreter.set or whenever a Java method that returns an Object is called from Python. It is not called when using an import or when calling a Java method that returns a java.lang.Class. With the call to PyJClass_Wrap commented out, the conversion falls through and turns Java classes into PyJObjects. After making that change in my build I started an interactive Jep interpreter. As I mentioned, import statements don't go through that code, so things start out pretty normal:

>>> from java.util import ArrayList
>>> type(ArrayList)
<class 'jep.PyJClass'>
>>> ArrayList.getDeclaredFields()
'jep.PyJClass' object has no attribute 'getDeclaredFields'

My favorite way to trigger that conversion code is just to add an object to an ArrayList and then get it out. Since ArrayList.get() returns an Object, it always has to go through our generic conversion code, which I modified:

>>> a = ArrayList()
>>> a.add(ArrayList)
True
>>> ForbiddenArrayList = a.get(0)

Now I have access to all the reflection!

>>> type(ForbiddenArrayList)
<class 'java.lang.Class'>
>>> ForbiddenArrayList.getDeclaredFields
<bound method getDeclaredFields of <java.lang.Class object at 0x7f4007bde8c0>>
>>> ForbiddenArrayList.getDeclaredFields()
<jep.PyJArray object at 0x7f4007bd95d0>
>>> [f.getName() for f in ForbiddenArrayList.getDeclaredFields()]
['serialVersionUID', 'DEFAULT_CAPACITY', 'EMPTY_ELEMENTDATA', 'DEFAULTCAPACITY_EMPTY_ELEMENTDATA', 'elementData', 'size', 'MAX_ARRAY_SIZE']
>>> ForbiddenArrayList.getSuperclass()
<jep.PyJClass object at 0x7f4007be3370>

The call to getSuperclass() is not ideal, because it returns a PyJClass. If I wanted to take this experiment further I could try to get rid of the PyJClass_Wrap in pyjmethod.c so that methods return PyJObject instead of PyJClass. You could also just play the same conversion trick and drop it in an ArrayList to get a PyJObject out.
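
The round-trip trick could be wrapped in a small function, sketched below. The function name roundtrip_through_list is hypothetical. Note the assumption: this only changes anything on the experimental build described above, where the PyJClass_Wrap call in convert_j2p.c is commented out; per the explanation above, a stock Jep build would wrap the result as a PyJClass again.

```python
def roundtrip_through_list(java_class, list_type):
    """Push a class through a Java list so it passes the generic conversion code.

    java_class -- a PyJClass, e.g. the result of getSuperclass()
    list_type  -- a list class with add() and get(), e.g. java.util.ArrayList
    """
    holder = list_type()
    holder.add(java_class)
    # get() is declared to return Object, so the value is converted generically.
    return holder.get(0)


# On the modified build, usage might look like:
#   from java.util import ArrayList
#   superclass = roundtrip_through_list(ForbiddenArrayList.getSuperclass(), ArrayList)
```
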

The downside here is that a PyJObject for a java.lang.Class cannot do any of the special things PyJClass does:

>>> ForbiddenArrayList()
'Class' object is not callable
>>> dir(ForbiddenArrayList)
['__class__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__le__', '__lt__', '__module__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__slots__', '__str__', '__subclasshook__', 'asSubclass', 'cast', 'desiredAssertionStatus', 'equals', 'forName', 'getAnnotatedInterfaces', 'getAnnotatedSuperclass', 'getAnnotation', 'getAnnotations', 'getAnnotationsByType', 'getCanonicalName', 'getClass', 'getClassLoader', 'getClasses', 'getComponentType', 'getConstructor', 'getConstructors', 'getDeclaredAnnotation', 'getDeclaredAnnotations', 'getDeclaredAnnotationsByType', 'getDeclaredClasses', 'getDeclaredConstructor', 'getDeclaredConstructors', 'getDeclaredField', 'getDeclaredFields', 'getDeclaredMethod', 'getDeclaredMethods', 'getDeclaringClass', 'getEnclosingClass', 'getEnclosingConstructor', 'getEnclosingMethod', 'getEnumConstants', 'getField', 'getFields', 'getGenericInterfaces', 'getGenericSuperclass', 'getInterfaces', 'getMethod', 'getMethods', 'getModifiers', 'getModule', 'getName', 'getNestHost', 'getNestMembers', 'getPackage', 'getPackageName', 'getProtectionDomain', 'getResource', 'getResourceAsStream', 'getSigners', 'getSimpleName', 'getSuperclass', 'getTypeName', 'getTypeParameters', 'hashCode', 'isAnnotation', 'isAnnotationPresent', 'isAnonymousClass', 'isArray', 'isAssignableFrom', 'isEnum', 'isInstance', 'isInterface', 'isLocalClass', 'isMemberClass', 'isNestmateOf', 'isPrimitive', 'isSynthetic', 'java_name', 'newInstance', 'notify', 'notifyAll', 'synchronized', 'toGenericString', 'toString', 'wait']
>>> ForbiddenArrayList.addAll
'Class' object has no attribute 'addAll'
>>> a.add(Map)
True
>>> ForbiddenMap = a.get(1)
>>> ForbiddenMap.of("key1", 1, "key2", 2)
'Class' object has no attribute 'of'
>>> ForbiddenMap.Entry
'Class' object has no attribute 'Entry'

So we are actually very close to having the capability you are looking for, but enabling it breaks other things, and that is where it starts getting messy.

ctrueden commented 1 year ago

Belated thanks @bsteffensmeier for the detailed explanation. I came back to this issue today while I was cleaning up a work-in-progress branch adding Jep support to the scyjava library. I just wanted to mention that the way JPype handles the overloading-too-much-functionality issue is that, like Java, it distinguishes between a class definition and a class instance (ref). The class definition is the class name without the .class suffix, like in Java; e.g. the String part of String.format(...). This corresponds to what Jep has now. The other thing, the class instance, is accessed by writing String.class_ in JPype, which is familiar to Java programmers used to writing String.class. And like Jep, if you pass the class definition to a Java function wanting a class instance, it still magically works. If there were a way to make Jep's PyJClass objects have a class_ property that returned the Class instance, that would be amazing, and might pose fewer or no problems? It's still in my backlog to dig into the Jep code to figure out how to do this, based on your pointers above.
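
To make the idea concrete, here is a rough pure-Python sketch of that split. It is not Jep or JPype code: ClassProxy and the to_instance callback are hypothetical names, and how to_instance actually obtains the java.lang.Class object (a Java helper, a change in the C code, or something else) is left open. The proxy keeps the existing PyJClass behaviour (constructors, statics, inner classes) and only adds a class_ attribute for reflection.

```python
class ClassProxy:
    """Hypothetical wrapper separating the class definition from the Class instance."""

    def __init__(self, definition, to_instance):
        self._definition = definition    # the PyJClass (class definition)
        self._to_instance = to_instance  # callable returning the reflective Class object

    @property
    def class_(self):
        # Reflection entry point, spelled like JPype's String.class_.
        return self._to_instance(self._definition)

    def __call__(self, *args):
        # Constructor access keeps going through the class definition.
        return self._definition(*args)

    def __getattr__(self, name):
        # Only reached when normal lookup fails; forward statics and inner classes.
        if name.startswith("_"):
            raise AttributeError(name)
        return getattr(self._definition, name)
```

With something like this, Map.of(...) and Map.Entry would still resolve through the definition, while Map.class_.getMethods() would go to the reflective object, so the two roles never compete for the same attribute names.
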