unicorn-engine / unicorn

Unicorn CPU emulator framework (ARM, AArch64, M68K, Mips, Sparc, PowerPC, RiscV, S390x, TriCore, X86)
http://www.unicorn-engine.org
GNU General Public License v2.0
7.34k stars 1.31k forks source link

Rework the Java bindings #1833

Closed nneonneo closed 6 months ago

nneonneo commented 1 year ago

This is a complete rewrite of the Java bindings for Unicorn, focused on speed, correctness and feature parity.

The existing bindings have some significant shortcomings. An incomplete list:

This PR contributes a ground-up rewrite of the bindings. It brings the Java API up to par with Python feature-wise and substantially simplifies the hook implementation, enabling proper bounds-checked hooks.

The new implementation is much faster than the old one. As a point of comparison, the following code hook, executed 10,000,000 times on an x86 loop instruction, takes 9.9s with the old bindings, but just 1.6s in the new bindings - a 6x performance improvement. A similar hook takes 38s to run with the Python bindings.

    public static class MyCodeHook implements CodeHook {
        public long accum;
        public void hook(Unicorn uc, long address, int size, Object user_data) {
            // need to cast to int here b/c old bindings don't properly clear upper bits of value
            accum += (int)(long)uc.reg_read(Unicorn.UC_X86_REG_ECX);
        }
    }

The rewrite strives for compatibility with the previous API, but there are some breaking changes. It is possible to push closer to full backwards compatibility if required, at the cost of reintroducing some of the suboptimal designs. Here are the main points of breakage:

A lot of bugs are fixed with this implementation:

Several features are now enabled in the Java implementation:

Detailed list of backwards incompatible changes

nneonneo commented 1 year ago

By the way, if backwards-compatibility will be an issue, it should be possible to restore most of the old APIs. However, the resulting design may be less clean.

wtdcode commented 1 year ago

By the way, if backwards-compatibility will be an issue, it should be possible to restore most of the old APIs. However, the resulting design may be less clean.

Current breaking changes look good to me at least. I was not responding to this PR because of sickness recently. Will do a thorough check after a week or so.

nneonneo commented 1 year ago

OK, at this point I'm fairly happy with the code. It's decently well-tested, the samples match the C code (and the output matches too), and the performance is good. It's ready to be reviewed.

wtdcode commented 1 year ago

OK, at this point I'm fairly happy with the code. It's decently well-tested, the samples match the C code (and the output matches too), and the performance is good. It's ready to be reviewed.

Thanks for your work and I will review it these days as finally I feel a bit better! By the way, could you illustrate your use case?

nneonneo commented 1 year ago

I am planning to integrate Unicorn into some Ghidra-based analysis tools. As Ghidra is written entirely in Java, it is much more efficient to use Java bindings than trying to go through Jython-ctypes.

wtdcode commented 1 year ago

I am planning to integrate Unicorn into some Ghidra-based analysis tools. As Ghidra is written entirely in Java, it is much more efficient to use Java bindings than trying to go through Jython-ctypes.

That's cool! At this moment I don't have too much to say about this PR because I haven't written java for a few months. I need to give it a try before giving more review but thanks again for your brilliant work!

nneonneo commented 1 year ago

Any news on this?

wtdcode commented 1 year ago

Any news on this?

Rushing on a conference deadline ;(

Will have a look once getting it done.

wtdcode commented 1 year ago

All break changes look good to me as Java binding is too old and buggy.

I have no big questions on this PR and thanks for this brilliant work!

wtdcode commented 1 year ago

I will be soon on traveling so please expect another a few days absent. I will get back asap. Thanks for your patience. ;)

nneonneo commented 1 year ago

No worries. I'll aim to get something working with Maven this week, and also look at integrating the (newly merged) reg2 API :)

nneonneo commented 1 year ago

OK, the bindings are migrated to use Maven. The last thing on my TODO list is to switch to the reg2 API, which should be pretty straightforward.

nneonneo commented 1 year ago

OK, that last commit (763d041) adds the reg2 API, which I think is the last issue that needed to be resolved. Should be good for a final review and hopefully merge :)

wtdcode commented 1 year ago

Some last things which probably need your help:

  1. I read through this: https://stackoverflow.com/questions/2937406/how-to-bundle-a-native-library-and-a-jni-library-inside-a-jar and https://github.com/adamheinrich/native-utils. Since we plan to push the package to maven central, we would like to bundle dynamic libraries into jar and load it in the static code block of the Unicorn class. We prefer this approach for all bindings for ease of usage.
  2. javah seems outdated and could you replace it with javac -h? I'm not 100% sure on this so I'm glad if you could help.
  3. I'm thinking of removing Makefile totally so that we don't rely on make, which should enable it to compile on Windows. I see exec plugin allows us to execute any command and thus we can just compile unicorn libraries in pom.xml like other bindings did. This actually relates to my point 1 I believe.

At this moment, I'm experimenting with 1 and 3 to build dynamic libraries within pom.xml and bundle it to jar and load it in unicorn class. Could you have a look at 2?

nneonneo commented 1 year ago

@wtdcode OK, it was easy to switch to javac -h so I have done so. I also made a fix to const_generator to only generate new data if there are changes, which prevents mvn from having to rebuild everything every time.

nneonneo commented 1 year ago

Tests are failing, but I'm not sure it's my fault? The new test test_x86_0xff_lcall fails due to invalid instruction, which makes sense since the patch marks certain instructions as being invalid. Should the test itself be fixed?

wtdcode commented 1 year ago

Ah sorry you are correct, the test should be fixed. xd

wtdcode commented 1 year ago

I have mostly addressed the problem of building the binaries and now I'm investigating how to package everything together.

Meanwhile, I sent a request to Maven Central here:

https://issues.sonatype.org/browse/OSSRH-93144

nneonneo commented 6 months ago

@wtdcode its been a while! Just checking: is there any plan to get this merged? Maybe we can merge it first and then worry about the build issues? It sounded like the build system itself was basically sorted out and we just wanted to get the package on Maven (which I fully support!).

Merry Christmas and a happy new year!

wtdcode commented 6 months ago

@wtdcode its been a while! Just checking: is there any plan to get this merged? Maybe we can merge it first and then worry about the build issues? It sounded like the build system itself was basically sorted out and we just wanted to get the package on Maven (which I fully support!).

Merry Christmas and a happy new year!

Sure, I think that's a good start.

Regarding the build system, the reason why I failed to get it to work is that I lost all my progress (including some other work related to other issues) when I migrated my data, sorry.