Status: Closed. Marcono1234 closed this pull request 2 years ago.
@leibnitz27, what do you think about this approach? Do you have any requests for improvements or changes? The current test failure is 'intentional', to showcase what unexpected decompilation results would look like.
Before I address any of the open points in the description it would be good to get your general opinion on this idea.
Hey - sorry not been able to focus much on this project for a bit, trying to get back into it now; will go over this at the weekend - cheers!
first blush - this seems very cool \o/ - I will probably be able to answer these questions myself after reading your changes/commentary more thoroughly
why is expected output in the main repo - doesn't that naturally belong better in the submodule?
expected output seems to only have 1 version - it's pretty important to compile the increasing subsets of the tests on increasing versions of java (so you handle both the stringbuilder and the j9+ way of doing strings, for example).
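As a concrete illustration of the point above, the same source compiles to `StringBuilder` calls under javac 8 but to an `invokedynamic` on `StringConcatFactory` under javac 9+ (JEP 280), so decompilers legitimately see different bytecode per compiler version. A minimal class of the kind under discussion (the class name echoes the `StringConcat.class` mentioned later in the thread):

```java
// StringConcat.java -- the concatenation below compiles to StringBuilder
// calls with javac 8, but to an invokedynamic on StringConcatFactory with
// javac 9+ (JEP 280), so the decompiled output differs per compiler version.
public class StringConcat {
    public static String greet(String name, int count) {
        return "Hello " + name + " (" + count + ")";
    }

    public static void main(String[] args) {
        System.out.println(greet("world", 3)); // prints: Hello world (3)
    }
}
```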
Most environment data can be elided - for example, the version number is currently dropped from the output in the existing crappy way of doing tests.. ;)
Mostly happy with the idea of checking binary class files in to avoid making tests more painful, but it does mean that a lot of spurious changes will occur frequently - I may have missed if there's a step to create these as part of a workflow? If so, how to do it once for each version of javac? (also not generally feasible I suspect as github is unlikely to expose old/crazy versions of javac).
more anon....
> why is expected output in the main repo - doesn't that naturally belong better in the submodule?
There are the following reasons:
> expected output seems to only have 1 version - it's pretty important to compile the increasing subsets of the tests on increasing versions of java
With this pull request you would probably implement this by having version-specific subdirectories in the test resources repository, e.g. `classes/9/StringConcat.class`. However, these `.class` files would have to be compiled manually, and should probably only be created where they are interesting. E.g. when the way `javac` compiles specific code has changed little or not at all between versions, there is probably no point in having `.class` files for multiple versions, because that would only slow down the tests without adding much value.
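The version-specific layout suggested above might look like this (a hypothetical tree; only `classes/9/StringConcat.class` is a path taken from the discussion):

```
test-resources/
└── classes/
    ├── 8/
    │   └── StringConcat.class   (StringBuilder-based concatenation)
    └── 9/
        └── StringConcat.class   (invokedynamic-based concatenation)
```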
> I may have missed if there's a step to create these as part of a workflow?
No, currently I have no plans for creating the `.class` files in an automated way. However, the `.class` and `.jar` files would probably only be a subset of https://github.com/leibnitz27/cfr_tests (because there is some duplication in cfr_tests). This decompilation test approach is mainly intended to give some confidence that one's changes aren't breaking major parts of CFR, and to show their effects. It won't be a complete replacement for cfr_tests.
BTW I'm definitely sold on this, it's great.
> - Changes to CFR might cause test failures due to different decompiled output. Having to first adapt it in the separate test resources repository, wait for it to be merged and then update the CFR pull request would slow down integration of changes.
> - The expected output is specific to CFR. If the separate test resources repository is intended to be decompiler agnostic, then that specific output should probably not be contained there.
Yeah, both really good reasons.
> However, these .class files would have to be compiled manually
At this point it feels like the cfr-tests repo should just get a compiled version of all the tests (pretty much the result of running the batch file in there at the moment); that leaves the minor downside of churn, as I said, but... eh ;)
> No, currently I have no plans for creating the .class files in an automated way. ... It won't be a complete replacement for cfr_tests.
I'm a bit nervous about having a curated set of tests, as the 'real' tests tend to end up being these. Perhaps a curated exclusion list to ignore known (semi-)pointless ones, or the horrifically expensive ones (the horrible exception test is nice, but always annoys me). (Or maybe an inclusion list to start with, and flip to an exclusion list later.)
I'll shove a full set of the compiled stuff onto cfr-tests master, then we can decide where we want to go - either you tweak this PR, or I cherry pick and tweak - up to you since you've done all the work ;)
Thanks for the feedback! I have implemented my TODO features listed in the description of this pull request and extended the documentation. It takes up the majority of the README now, so maybe it should be moved to a separate Markdown file instead?
For `StructuredFakeDecompFailure` I added a CFR option specifying whether to dump exception stack traces (`dumpexceptionstacktrace`), which is enabled by default but disabled for these tests. Is that alright with you?
Side note: This option might also be useful for other use cases, e.g. for projects which compile multiple versions of a JAR and diff the changes. For such projects it would also be useful to avoid diffs in the stack trace for decompilation failures when they change the way they invoke the CFR API (or when the call stack within CFR itself changes between versions).
`@ClassFileTestDataSource` and `@JarTestDataSource`, which are used in `DecompilationTest`, now both support specifying glob patterns for class files or JARs to ignore. Hopefully that works for your intended usage. If not, I can also try implementing the 'inclusion list' you proposed, though in the end an 'exclusion list' like the current one might be a bit less error-prone.
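For reference, glob matching of the kind such an exclusion list needs is available in the JDK itself via `FileSystem.getPathMatcher`; a minimal sketch (the pattern and paths are made-up examples, not the ones used in this PR):

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

public class GlobIgnoreSketch {
    public static void main(String[] args) {
        // "glob:" syntax; "**" crosses directory boundaries, "*" does not
        PathMatcher ignored = FileSystems.getDefault()
                .getPathMatcher("glob:**/slow/*.class");

        System.out.println(ignored.matches(Paths.get("tests/slow/Exceptions.class")));     // true
        System.out.println(ignored.matches(Paths.get("tests/simple/StringConcat.class"))); // false
    }
}
```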
It looks like the Maven Surefire Plugin does not use the `name` value for the tests in the console output, which makes it difficult (or impossible) to tell for which class file or JAR a test failed. I am not sure whether that is a bug in the plugin, or whether it has to be configured in some way; IDEs (at least Eclipse) show the test name just fine. Alternatively, JUnit dynamic tests could be used, but I am not sure changing the implementation is worth it at the moment.
Also note that the tests are compiled with Java 8. For Maven that seems to work without issues, but IDEs might show spurious compilation errors because they assume the test sources are also compiled with Java 6.
Hey - I've cherry picked your stuff here onto another branch and merged it with some small (and some less small) tweaks.
Because (you're right) ideally cfr-tests shouldn't be tied to CFR specifically (in a wonderful ideal world, hah), rather than driving which things get tested from the presence of a file, I've done it backwards, from a bit of (pretty badly implemented ;)) XML config. (Couldn't bring myself to add YAML here.)
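The actual schema isn't shown in this thread, so purely as a hypothetical sketch, such an XML test specification might select classes and per-test options along these lines (every element and attribute name here is invented, only the `dumpexceptionstacktrace` option name comes from the discussion):

```xml
<!-- hypothetical shape only; not the real cfr-tests schema -->
<decompilation-tests>
  <test class="StringConcat.class" expected="StringConcat.java">
    <option name="dumpexceptionstacktrace" value="false"/>
  </test>
</decompilation-tests>
```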
I've removed the jar stuff (for now) but will add back in, want to think about that a bit more in the presence of testing different module configurations.
Sorry for hacking your lovely code with grossness, at least with cherry picks it's clear which bits I mauled ;)
> Sorry for hacking your lovely code with grossness, at least with cherry picks it's clear which bits I mauled ;)
It is fine; thank you for including it in the first place. Your XML test specification approach definitely works better for picking only specific classes to decompile. You might have to update the README though; it currently still refers to the separate `.options` file and does not mention the XML test specification file yet.
Proof of concept for performing automatic decompilation tests. This should hopefully help with performing regression tests and with getting some (indirect) test coverage. The code for the tests could probably be improved and cleaned up a bit more, but it seems to work for this proof of concept.

Overview:

- … (instead of a `cfr_tests` clone). This has the following advantages: …
- … `.java` file. Additionally, the summary and exceptions output from CFR can be compared (if not empty), and custom CFR options can be provided through an `.options` file for the corresponding decompiled file.
- … `.options` file to enable renaming, but did not update the expected output file.

There are, however, still a few open questions / issues:

- `cfr_tests` currently contains mostly only source files. Additionally, some of the test cases exist multiple times in different class files, and some test data contains a large number of classes. Maybe that is not so problematic, but ideally the test data should be as specific as possible, so that the tests do not run too long and impede the developer.
- `StructuredFakeDecompFailure` depends highly on the JDK and IDE. Maybe for now the solution is to not include test code which causes `StructuredFakeDecompFailure` to be dumped. Edit: Have added a CFR option for whether to dump exception stack traces; enabled by default.
- … `//#` … and could be stripped away before the code is compared with the actual CFR output. This could cause line number mismatches for the generated diff files, but maybe that is acceptable.
- … (`#`) in the `.options` files.

Any feedback is appreciated! (Even if you don't want to integrate this or something similar.)