jpype-project / jpype

JPype is a cross-language bridge that gives Python programs full access to Java class libraries.
http://www.jpype.org
Apache License 2.0

IPython stack traces improvement: decompiled JVM code #774

Open KOLANICH opened 4 years ago

KOLANICH commented 4 years ago

When an exception is not caught, IPython prints a stack trace that includes a few lines of source code for each stack frame. It should be possible to implement a custom traceback that shows snippets of the bytecode decompiled into Java. https://ipython.readthedocs.io/en/stable/api/generated/IPython.core.interactiveshell.html may be helpful, but ... https://github.com/ipython/ipython/issues/12378 .

For decompilation I propose using either CFR or Krakatau. CFR is usable almost right now: I have created a wrapper, but it is missing some parts that are important for real-world use, mostly a standalone library that indexes the jars installed on the system and returns the path of a jar given a package name. Krakatau is written in pure Python, but Python 2 (I have a fork with Python 3 support; the author refuses on principle to port it to Python 3), and some refactoring is needed before it can really be used, because the decompiler is a separate script. IMHO Krakatau is the most universal, all-eating decompiler (it is the only one that has managed to decompile jars compiled from Scala), but it is somewhat abandonware.

https://github.com/leibnitz27/cfr/issues/188 is also related, though I am not very familiar with Java bytecode, so I am not sure whether partial decompilation (of part of a method) is really possible.

Thrameos commented 4 years ago

Have you tried using the source package to get the stack trace? I know that if I have the source in the path, Python 3 currently prints the C++ code. In principle, if __file__ is set properly on the package, it should print the actual Java source.

I will be including the ASM library in future versions of JPype, so we should be able to get the bytecode from there. The process is a bit messy: you have to write a visitor that walks through the method bytecode and translates it.

One concern I have with this process is that Python does not support lazy evaluation of stack traces, because it uses a hard-coded structure. I am currently faking stacktraces for Java code, but because I can't do it in a lazy fashion that means the cost has to be taken regardless of whether the frame is ever printed or not. (It is sort of a failing in Python as they assume the frames will only come from Python source so they didn't abstract it.) If we add a big decompiling step to get that information, it will really throttle the number of exceptions we can handle. We are already pretty bad here, because we use C++ exception handling, which is quite slow. I am debating taking some of the most frequent paths and using direct if-statement logic to pass the exception information through rather than throwing, at least on the most critical paths.
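The frame objects themselves have to be built eagerly, but the source lookup can be deferred. Python's `linecache` module, which the pure-Python `traceback` module uses to fetch source lines, offers `linecache.lazycache`: it records a loader whose `get_source(name)` is called only when a line is actually requested. A minimal sketch under that assumption, where `decompile` is a hypothetical callable standing in for CFR or Krakatau and the file and class names are made up:

```python
import linecache


class LazyJavaSource:
    """Loader-like object; linecache only needs a get_source(name) method."""

    def __init__(self, decompile):
        # decompile is a hypothetical callable: class name -> Java source text
        self.decompile = decompile

    def get_source(self, name):
        return self.decompile(name)


def register_lazy_java_source(filename, class_name, decompile):
    """Register a Java source file without decompiling anything yet."""
    module_globals = {"__name__": class_name, "__loader__": LazyJavaSource(decompile)}
    return linecache.lazycache(filename, module_globals)


if __name__ == "__main__":
    calls = []

    def fake_decompile(name):
        calls.append(name)
        return "class Lazy {\n    void run() { work(); }\n}\n"

    register_lazy_java_source("demo/Lazy.java", "demo.Lazy", fake_decompile)
    print(len(calls))  # 0: nothing has been decompiled yet
    # The first request for a line runs the decompiler, and the result is cached.
    print(linecache.getline("demo/Lazy.java", 2))
    print(len(calls))  # 1
```

With this arrangement an exception that is caught and never printed never pays for decompilation. It only helps the code paths that go through `linecache`, not the hard-coded C display routine.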

KOLANICH commented 4 years ago

but because I can't do it in a lazy fashion that means the cost has to be taken regardless of whether the frame is ever printed or not. (It is sort of a failing in Python as they assume the frames will only come from Python source so they didn't abstract it.)

If I understand correctly, it should be possible (or maybe it was; they say the docs on that are obsolete) to configure IPython with custom handling for certain exception classes. If I remember correctly, our Pythonic Java exception classes all inherit from a single class.

Thrameos commented 4 years ago

Every Java exception currently inherits from _jpype._JException, so if there is a hook to traverse and process exceptions, we should be able to process them. That would only work for IPython, though, as the stack trace traversal in Python itself currently relies entirely on hard-coded fields (presumably for speed).
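IPython's `InteractiveShell.set_custom_exc(exc_tuple, handler)` is such a hook: the handler is bound to the shell and called as `handler(self, etype, value, tb, tb_offset=None)` for the listed exception classes. A minimal, untested-against-a-live-session sketch, assuming the `_jpype._JException` base class named above and the standard `Throwable.getStackTrace()` / `StackTraceElement` accessors; `format_java_frames`, `java_exc_handler` and `install_java_handler` are hypothetical names. Source lines are looked up with the bare Java file name, so they only appear when that file is reachable from `sys.path`:

```python
import linecache


def format_java_frames(java_exc):
    """Render the Java frames of a Throwable in Python traceback style.

    Only needs Throwable.getStackTrace(), so the cost is paid when the
    handler runs (when the exception is displayed), not when it is raised.
    """
    out = []
    for el in java_exc.getStackTrace():
        file_name = el.getFileName()
        file_name = "Unknown Source" if file_name is None else str(file_name)
        lineno = int(el.getLineNumber())
        where = f"{el.getClassName()}.{el.getMethodName()}"
        out.append(f'  File "{file_name}", line {lineno}, in {where}')
        # linecache returns "" when it cannot locate the file or line.
        source = linecache.getline(file_name, lineno).strip()
        if source:
            out.append(f"    {source}")
    return out


def java_exc_handler(self, etype, value, tb, tb_offset=None):
    """Custom handler in the shape IPython's set_custom_exc expects."""
    # IPython binds this to the shell, so `self` is the InteractiveShell.
    self.showtraceback((etype, value, tb), tb_offset=tb_offset)
    print("Java frames (innermost first):")
    print("\n".join(format_java_frames(value)))
    return None  # None is accepted in place of a structured traceback


def install_java_handler():
    """Register the handler for every Java exception in the running IPython."""
    import _jpype  # imported lazily; base class of all Java exceptions per this thread
    from IPython import get_ipython

    ip = get_ipython()
    if ip is not None:
        ip.set_custom_exc((_jpype._JException,), java_exc_handler)
```

IPython's documentation warns that a misbehaving custom handler is unregistered automatically, so a bug here degrades to the normal traceback rather than breaking the shell.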

petrushy commented 4 years ago

Hi, this sounds quite interesting; just getting a printout of the Java source code would be useful. Could you elaborate a bit on how this is used? Is it enough to place the source jar on the classpath, or which path is this referring to?

Thrameos commented 4 years ago

When Python prints an exception, it has the file name and line number information. If the file is found on the path, Python caches the file's lines and uses that cache to display the source line at that line number. Normally this prints Python code, but if you run from within the JPype code directory with C++ exception information turned on, you will see that it actually prints the C++ code lines.
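The pure-Python side of this lookup can be reproduced with the standard library alone: the `traceback` module reads source lines through `linecache`, and for a relative file name it cannot open directly, `linecache` tries each `sys.path` entry. A minimal sketch with a throwaway, made-up source file, showing that a package-qualified Java path resolves once its root directory is on `sys.path`:

```python
import linecache
import os
import sys
import tempfile
import traceback

# Build a throwaway source tree: <root>/jpype/exc/ExceptionTest.java
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "jpype", "exc")
os.makedirs(pkg_dir)
with open(os.path.join(pkg_dir, "ExceptionTest.java"), "w") as f:
    f.write("package jpype.exc;\n"
            "class ExceptionTest {\n"
            "    void throwChain() { method1(); }\n"
            "}\n")

# linecache searches sys.path for relative file names it cannot open directly.
sys.path.append(root)

# A frame whose file name is relative to a sys.path entry, as a Java frame
# would be if the stack trace carried the package path.
frame = traceback.FrameSummary(
    "jpype/exc/ExceptionTest.java", 3,
    "jpype.exc.ExceptionTest.throwChain", lookup_line=False)
text = "".join(traceback.StackSummary.from_list([frame]).format())
print(text)
```

The printed frame includes the Java line `void throwChain() { method1(); }` beneath the `File ... line 3` header.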

Assuming we put the right __file__ information into the stack trace and tap into the code that creates the file cache, we can have Python print the Java lines as well. In terms of how it would work, it is no different from javadoc: you would need to supply the source code as a jar or zip on the classpath. The __file__ would need to point to something recognizable, so that when the IPython line-display module sees it, we call an unpack-from-jar routine to supply Python with the source code for the method. Assuming the code is compiled with line-number debugging information, you will get Java output. Once or twice I have accidentally triggered this behavior when I had a bare class and Java source in the top-level directory during testing (which is where Python was normally searching).
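The unpack-from-jar routine described above can be sketched with the standard library, since a source jar is just a zip file. This minimal version, with a hypothetical function name, reads one `.java` entry and seeds `linecache.cache` (an undocumented but long-stable dict of `(size, mtime, lines, fullname)` tuples) under the in-jar path, so anything that later asks `linecache` for that path gets the Java lines. It assumes the frames report exactly that package path:

```python
import linecache
import zipfile


def load_java_source_from_jar(jar_path, member):
    """Copy one .java entry from a source jar into linecache.

    member is the in-jar path, e.g. "jpype/exc/ExceptionTest.java"; it is also
    used as the cache key, so frames must report that same package path.
    Returns the number of source lines cached.
    """
    with zipfile.ZipFile(jar_path) as jar:
        text = jar.read(member).decode("utf-8")
    lines = text.splitlines(keepends=True)
    # mtime=None marks the entry as not backed by a real file, so
    # linecache.checkcache() leaves it alone instead of evicting it.
    linecache.cache[member] = (len(text), None, lines, member)
    return len(lines)
```

This version unzips eagerly; for many jars it would be cheaper to do the same work only when a Java frame is about to be printed.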

Thrameos commented 4 years ago

Here is a demo that shows that it could work in theory.

The following code (testExc.py) triggers the harness to print a series of exceptions.

import jpype
jpype.startJVM(classpath="test/classes")
ex=jpype.JClass('jpype.exc.ExceptionTest')()
ex.throwChain()

If I run this at the top level, it currently prints:

Traceback (most recent call last):
  File "ExceptionTest.java", line 57, in jpype.exc.ExceptionTest.throwChain
  File "ExceptionTest.java", line 62, in jpype.exc.ExceptionTest.method1
  File "ExceptionTest.java", line 67, in jpype.exc.ExceptionTest.method2
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "testExc.py", line 4, in <module>
    ex.throwChain()
java.lang.RuntimeException: java.lang.RuntimeException: Inner

But if I add test/harness/jpype/exc to PYTHONPATH, it can find the source files and prints this instead:

Traceback (most recent call last):
  File "ExceptionTest.java", line 57, in jpype.exc.ExceptionTest.throwChain
    method1();
  File "ExceptionTest.java", line 62, in jpype.exc.ExceptionTest.method1
    method2();
  File "ExceptionTest.java", line 67, in jpype.exc.ExceptionTest.method2
    throw new RuntimeException("Inner");
Exception: Java Exception

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "testExc.py", line 4, in <module>
    ex.throwChain()
java.lang.RuntimeException: java.lang.RuntimeException: Inner

As you can see, it prints the Java source code. So two changes to the code would be needed to make this work:

1) The exception handling routine would need to add the module path information, so that we get "jpype/exc/ExceptionTest.java".
2) We would need to patch the IPython search routine so that, if it tries and fails to find a file with the .java extension, it checks whether the JVM is running, requests the file using ClassLoader.getResource, and uses that as the source.
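Both changes can be sketched in Python. Change (1) maps a Java frame's class name and file name to a package-qualified path; change (2) tries the normal lookup first and, for `.java` files only, falls back to a resource fetcher. `fetch_resource` is a hypothetical callable (path to source text, or `None`); in JPype it might wrap `ClassLoader.getResource`, but that wiring is an assumption and is not shown. Function names are illustrative:

```python
import linecache


def java_source_path(class_name, file_name):
    """Change (1): turn a Java frame into a package-qualified path.

    ("jpype.exc.ExceptionTest", "ExceptionTest.java")
        -> "jpype/exc/ExceptionTest.java"
    Nested classes such as "a.b.Outer$Inner" keep the package "a.b".
    """
    package, _, _ = class_name.rpartition(".")
    if not package:
        return file_name  # class in the default package
    return package.replace(".", "/") + "/" + file_name


def getline_with_fallback(path, lineno, fetch_resource):
    """Change (2): try the normal search, then ask the (hypothetical) JVM fetcher."""
    line = linecache.getline(path, lineno)
    if line or not path.endswith(".java"):
        return line
    text = fetch_resource(path)
    if text is None:
        return ""
    lines = text.splitlines(keepends=True)
    # Cache the fetched source so later frames in the same file are free.
    linecache.cache[path] = (len(text), None, lines, path)
    return lines[lineno - 1] if 1 <= lineno <= len(lines) else ""
```

For example, with a plain dict standing in for the class loader, `getline_with_fallback(java_source_path("pkg.Foo", "Foo.java"), 2, sources.get)` returns line 2 of `sources["pkg/Foo.java"]`.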

My plate is currently full, so it will be a while before I can get to it.

Thrameos commented 4 years ago

One last note on this before drifting back to sleep....

The specific routines that produce the stacktrace info are native/java/org/jpype/JPypeContext.java:568:getStackTrace and native/common/jp_exception.cpp:534:PyTrace_FromJavaException. I believe the required change is only on the Java side, as we just need to alter line 595 in JPypeContext.java so that the stacktrace output includes paths.

I am not sure where the IPython processing routine is. The ones for normal Python are Python/traceback.c:_Py_DisplaySourceLine and Python/traceback.c:_Py_FindSourceFile, which, as far as I am aware, have no customization hooks.