ninia / jep

Embed Python in Java

Method interception when calling into java / kwargs support #313

Open jsnps opened 3 years ago

jsnps commented 3 years ago

Hey guys,

we want to migrate the Python API in our Eclipse tool from Python 2 to Python 3. So far we have been using Jython, but as Jython is stuck at Python 2, we plan to integrate a CPython interpreter. I'm missing one feature in Jep which is essential for us to be mostly compatible with existing scripts: keyword arguments.

In Jython this was solved by a "special" overloaded method signature and some arg parser implementation on the java side:

ReturnType myMethod(PyObject[] args, String[] keywords) {
    ArgParser ap = new ArgParser(args, keywords, new String[] { "path" });
    String path = ap.getString(0);
    return myMethod(path);
}

In Jython it was also relatively easy to intercept the Python method call on the Java side and, for example, synchronize that call onto another thread. We did this to ensure that our data model is only ever changed from one thread. The nice thing was that we didn't have to implement this in every single API; for certain types we could simply dispatch all calls onto the data thread (with an annotation to disable it where it was not desired).

I'm wondering if Jep could have a simple hook to allow for both. I browsed through the code and my current understanding is:

One rough proposal for keyword arg support would be to hook into this current concept and detect a "special" method signature. If such a method is found, route all method calls through this method with the original python arguments and give java control over the dispatching.

public interface JepInterceptor {
    Object jep_invoke(String methodName, Object[] args, String[] keywords); // or PyObject[]
}

public class JavaType implements JepInterceptor {
    @Override
    public Object jep_invoke(String methodName, Object[] args, String[] keywords) {
        // TODO: arg parsing
        // TODO: invoke method
        return null; // placeholder until the dispatch logic exists
    }
}

What do you think about this idea? Are there maybe already existing hooks, which we could make use of to achieve the above? Or do you have some alternative idea to allow for method interception or kwargs support?

Thanks!

bsteffensmeier commented 3 years ago

I can see some potential for adding support for keyword arguments from python into java but I am not so sure about the need for a method interceptor.

I'm not sure I understand the specifics of your example for keyword arguments. Are the args and keywords arrays always the same size so they are associated by position? I am more familiar with the CPython call protocol, which passes a tuple for positional arguments and a dict for kwargs. We mirrored that in Java for PyCallable. If we want to expose the kwargs in Java I would expect to support method signatures like PyCallable with an array and a map, or just a Map. I haven't thought about it much, but my initial thought is it would be cool to have an annotation on methods with the right signature that would indicate pyjmethods should pass the kwargs on directly instead of failing.
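To make the annotation idea concrete, here is a rough sketch in plain Java. Everything in it is hypothetical: Jep does not define an `AcceptsKwargs` annotation, and `FileApi` and `KwargsDispatcher` are made-up names. It only illustrates how a dispatcher could recognise an annotated method with an (array, map) signature and hand the keyword arguments through untouched.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical marker annotation; Jep does not provide this.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface AcceptsKwargs {}

// Hypothetical API type with one kwargs-aware method.
class FileApi {
    // The signature mirrors the (positional array, kwargs map) shape.
    @AcceptsKwargs
    public String open(Object[] args, Map<String, Object> kwargs) {
        Object path = args.length > 0 ? args[0] : kwargs.get("path");
        Object mode = kwargs.getOrDefault("mode", "r");
        return "open(" + path + ", " + mode + ")";
    }
}

// Hypothetical dispatcher standing in for what a pyjmethod could do.
final class KwargsDispatcher {
    /** Finds an annotated (Object[], Map) method by name and invokes it. */
    static Object call(Object target, String name, Object[] args, Map<String, Object> kwargs) {
        for (Method m : target.getClass().getMethods()) {
            Class<?>[] p = m.getParameterTypes();
            if (m.getName().equals(name)
                    && m.isAnnotationPresent(AcceptsKwargs.class)
                    && p.length == 2 && p[0] == Object[].class && p[1] == Map.class) {
                try {
                    return m.invoke(target, args, kwargs);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException("call to " + name + " failed", e);
                }
            }
        }
        throw new IllegalArgumentException(name + " does not accept keyword arguments");
    }
}

class KwargsDispatcherDemo {
    public static void main(String[] argv) {
        Map<String, Object> kwargs = new LinkedHashMap<>();
        kwargs.put("mode", "w");
        // Corresponds to the Python call: api.open("/tmp/x", mode="w")
        System.out.println(KwargsDispatcher.call(
                new FileApi(), "open", new Object[] {"/tmp/x"}, kwargs)); // open(/tmp/x, w)
    }
}
```

A method without the annotation would fall through to the `IllegalArgumentException`, which corresponds to the "failing" behaviour mentioned above for calls that pass kwargs.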

For intercepting methods, I think there are already frameworks for Java like InvocationHandler, Spring, and CGLib that provide some ways to do method interception, and I would think these would be superior to anything Jep could provide. For your example of intercepting calls to synchronize threads, wouldn't you also need the same type of synchronization if you were modifying your data object from multiple Java threads? This seems like a more general problem and not something that is specifically related to using Jep.

jsnps commented 3 years ago

Hey, this signature comes from Jython I guess, and it has all args in the first array and all keywords in the second. They don't need to be of the same size: the number of keywords can be less than the number of args, and the call is interpreted as n positional arguments + m keyword arguments, where the size of the kw array is m. Basically the keywords are overlaid onto the end of the args array. E.g.:

call(arg1, arg2, kw1=arg3, kw2=arg4) => args = {arg1, arg2, arg3, arg4} kw = {kw1, kw2}

It relies on the restriction that positional args are not allowed after a keyword arg. The "protocol" is then implemented in the ArgParser class. I'm not sure whether this was done for performance reasons, but I would also be fine with the CPython protocol (list or array of args + map of kwargs on the Java side).
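For illustration, a minimal sketch of how that flat layout maps onto the CPython-style pair of positional arguments plus a kwargs map. `FlatCallArgs` is a hypothetical helper written for this comment; it is not part of Jython or Jep.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of Jython or Jep.
final class FlatCallArgs {
    final List<Object> positional;
    final Map<String, Object> keywordArgs;

    private FlatCallArgs(List<Object> positional, Map<String, Object> keywordArgs) {
        this.positional = positional;
        this.keywordArgs = keywordArgs;
    }

    /**
     * Splits the flat layout: the last keywords.length entries of args
     * belong to the keywords, everything before them is positional.
     */
    static FlatCallArgs split(Object[] args, String[] keywords) {
        if (keywords.length > args.length) {
            throw new IllegalArgumentException("more keywords than arguments");
        }
        int positionalCount = args.length - keywords.length;
        List<Object> positional =
                new ArrayList<>(Arrays.asList(args).subList(0, positionalCount));
        // LinkedHashMap keeps the keywords in call order.
        Map<String, Object> keywordArgs = new LinkedHashMap<>();
        for (int i = 0; i < keywords.length; i++) {
            keywordArgs.put(keywords[i], args[positionalCount + i]);
        }
        return new FlatCallArgs(positional, keywordArgs);
    }
}

class FlatCallArgsDemo {
    public static void main(String[] argv) {
        // call(arg1, arg2, kw1=arg3, kw2=arg4)
        FlatCallArgs call = FlatCallArgs.split(
                new Object[] {"arg1", "arg2", "arg3", "arg4"},
                new String[] {"kw1", "kw2"});
        System.out.println(call.positional);   // [arg1, arg2]
        System.out.println(call.keywordArgs);  // {kw1=arg3, kw2=arg4}
    }
}
```

With a conversion like this in one place, the same Java-side handler could sit behind either calling convention.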

We had already thought of the annotation idea as well, because right now every API method needs another default method on the interface with three lines of arg-parsing code (lots of tedious work). Annotations can automate this. But we also have some "smart" APIs that check the argument type and transform it on the fly (e.g. accept either an object or a hierarchical path used to retrieve the object). For these cases a purely annotation-based approach would no longer work. :(

For the intercepting hook, I agree the synchronization is not something which needs to be solved by Jep, but some form of interception would make it easy to do. Indeed, we could possibly also move this onto another layer, working with e.g. Java proxy objects. For my concrete example, you are right, this synchronization "should" also happen when working on other threads. In our case though, the model used to be modified only from the UI thread through UI actions, and the model itself is not thread safe. The Python API was added later and is used on a separate Python thread, so there was no need to have the synchronization on any level other than the Python level. The cheapest option back then was to simply execute the individual API calls on the main thread, and we did this through method interception in a very central place. We didn't want to execute the whole script on the main thread, because it might be long running and block the UI. Of course this is a problem which could also be solved independently of Jep.
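As a rough sketch of the proxy-layer variant, using only the JDK's `java.lang.reflect.Proxy` and an `ExecutorService`: `Model`, `SimpleModel` and `SingleThreadProxy` are hypothetical stand-ins for our real types, and the single-thread executor stands in for the UI/data thread. Each interface call is handed to that thread while the calling (script) thread waits for the result.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical model interface standing in for a real API type.
interface Model {
    String rename(String newName);
}

// Not thread safe on its own; relies on being called from one thread only.
class SimpleModel implements Model {
    private String name = "initial";

    @Override
    public String rename(String newName) {
        name = newName;
        return Thread.currentThread().getName() + ":" + name;
    }
}

final class SingleThreadProxy {
    /** Wraps target so every interface call runs on the given executor. */
    @SuppressWarnings("unchecked")
    static <T> T wrap(T target, Class<T> iface, ExecutorService dataThread) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                // Block the calling thread until the data thread has finished.
                return dataThread.submit(() -> method.invoke(target, args)).get();
            } catch (ExecutionException e) {
                Throwable cause = e.getCause();
                // Unwrap so callers see the exception the target itself threw.
                throw cause instanceof InvocationTargetException ? cause.getCause() : cause;
            }
        };
        return (T) Proxy.newProxyInstance(
                iface.getClassLoader(), new Class<?>[] {iface}, handler);
    }
}

class SingleThreadProxyDemo {
    public static void main(String[] argv) {
        ExecutorService dataThread =
                Executors.newSingleThreadExecutor(r -> new Thread(r, "data-thread"));
        try {
            Model model = SingleThreadProxy.wrap(new SimpleModel(), Model.class, dataThread);
            // Called from the main thread, executed on the data thread.
            System.out.println(model.rename("updated")); // data-thread:updated
        } finally {
            dataThread.shutdown();
        }
    }
}
```

One caveat with this sketch: calling the proxy from the data thread itself would deadlock, because the call would wait on a task queued behind it. A real version would need to detect that case and invoke the target directly.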

There is another reason (at least for us ;)) why some intercepting hook could be very powerful: we gain some control on the Java side! We need to support Jython (Python 2) and Python 3 in parallel for some time. Having more control on the Java side makes it easier to exchange the technologies and decouple our classes from the underlying technology. Take the keyword argument support, for example: we could have our own annotation and a very thin layer mapping from the Jep or Jython concept to our annotations. Of course, having this in a central place is much easier than wrapping every one of our types for a certain technology. Maybe there is already some hacky way possible today (which I just don't see)? Could it be possible to exchange getattr of pyjobjects from outside of Jep and wrap the callables in some callable that invokes a special Java method with the original callable?

bsteffensmeier commented 1 year ago

One thought I have on this: you could take advantage of the fact that Python does not differentiate between methods and fields and only has attributes, so you can use fields to implement your invocation handler. Here is a quick test I threw together to demonstrate what I mean:

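import jep.JepConfig;
import jep.JepException;
import jep.SharedInterpreter;

class Test {

    // A Java method named __call__ lets Python call the instance directly.
    public static class TestCallable {
        public String __call__(String arg) {
            return "Hello " + arg;
        }
    }

    // Python sees this field as an attribute, so object.callableAttr(...) works.
    public final TestCallable callableAttr = new TestCallable();

    public static void main(String[] args) throws JepException {
        SharedInterpreter.setConfig(new JepConfig());
        try (SharedInterpreter interpreter = new SharedInterpreter()) {
            interpreter.set("object", new Test());
            interpreter.exec("print(object.callableAttr('World'))");
        }
    }
}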

Another idea builds off what you said about exchanging getattr from outside of Jep. You can override __getattribute__ in your Java class definition. Here is a test of that. It is pretty rough, but maybe it provides some inspiration:

import jep.JepConfig;
import jep.JepException;
import jep.SharedInterpreter;

class Test2 {

    public static class TestCallable {
        public String __call__(String arg) {
            return "Hello " + arg;
        }
    }

    // Workaround: a Python function wrapped as a Runnable that raises AttributeError.
    public static Runnable throwAttributeError;

    // Called for every attribute lookup on this object from Python.
    public Object __getattribute__(String name) {
        if ("callableAttr".equals(name)) {
            return new TestCallable();
        }
        throwAttributeError.run();
        return null; // only reached if the Python function did not raise
    }

    public static void main(String[] args) throws JepException {
        SharedInterpreter.setConfig(new JepConfig());
        try (SharedInterpreter interpreter = new SharedInterpreter()) {
            interpreter.exec("def throwAttributeError():\n    raise AttributeError()");
            throwAttributeError = interpreter.getValue("throwAttributeError", Runnable.class);
            interpreter.set("object", new Test2());
            interpreter.exec("print(object.callableAttr('World'))");
        }
    }
}

In both of these examples TestCallable would be the invocation handler and could implement logic or call other methods based on whatever criteria you have. You could have different interfaces or classes for different method signatures, or even overload __call__ to make a pyjmultimethod.

Additionally, TestCallable.__call__ is just a pyjmethod in Python, so if we add the ability for pyjmethod to take kwargs with an annotation, then you could automatically have invocation handlers like this that take kwargs.

While writing this code it occurred to me that we should make the pyjtype for any Java class that is a FunctionalInterface automatically callable from Python. That way TestCallable could be a Callable or Runnable or Function or any other standard or custom interface without needing to manually add a method named __call__. We already have the ability to convert any Python callable into any FunctionalInterface, so the reverse makes sense.

Another oddity of my test code is that there is no good way to throw a Python Exception from Java. I found a workaround but that is another area of jep that needs improvement.

One drawback of using fields for invocation handlers is that you cannot have a pyjfield and a pyjmethod with the same name. I don't remember if we throw an exception or if we ignore one or the other, but since Python doesn't differentiate between different types of attributes we don't allow more than one.

A limitation of overriding __getattribute__ is that you would have to provide the full set of attributes; it doesn't add to the existing attributes. If you give the object access to an Interpreter there might be some way to call back into Python and call super().__getattribute__, but it would definitely take a lot of experimentation to figure that out. Another thing I would like Jep to do is automatically convert java.lang.reflect.Method into pyjmethod, in which case you could return Java Methods from __getattribute__.