JQF + Zest: Semantic Fuzzing for Java

JQF is a feedback-directed fuzz testing platform for Java (think: AFL/libFuzzer but for JVM bytecode). JQF uses the abstraction of property-based testing, which makes it convenient to write fuzz drivers as parametric JUnit test methods. JQF is built on top of junit-quickcheck and enables running junit-quickcheck style parameterized unit tests with the power of coverage-guided fuzzing algorithms such as Zest.

Zest is an algorithm that biases coverage-guided fuzzing towards producing semantically valid inputs; that is, inputs that satisfy structural and semantic properties while maximizing code coverage. Zest's goal is to find deep semantic bugs that cannot be found by conventional fuzzing tools, which mostly stress error-handling logic only. By default, JQF runs Zest via the simple command: mvn jqf:fuzz.

JQF is a modular framework, supporting the following pluggable fuzzing front-ends called guidances:

Binary fuzzing with AFL (tutorial)
Semantic fuzzing with Zest [ISSTA'19 paper] (tutorial 1) (tutorial 2)
Complexity fuzzing with PerfFuzz [ISSTA'18 paper]
Reinforcement learning with RLCheck (based on a fork of JQF) [ICSE'20 paper]
Mutation-analysis-guided fuzzing with Mu2 [ISSTA'23 paper]

JQF has been successful in discovering a number of bugs in widely used open-source software such as OpenJDK, Apache Maven and the Google Closure Compiler.

Zest Research Paper

To reference Zest in your research, we request you to cite our ISSTA'19 paper:

Rohan Padhye, Caroline Lemieux, Koushik Sen, Mike Papadakis, and Yves Le Traon. 2019. Semantic Fuzzing with Zest. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’19), July 15–19, 2019, Beijing, China. ACM, New York, NY, USA, 12 pages. https://doi.org/10.1145/3293882.3330576

JQF Tool Paper

If you are using the JQF framework to build new fuzzers, we request you to cite our ISSTA'19 tool paper as follows:

Rohan Padhye, Caroline Lemieux, and Koushik Sen. 2019. JQF: Coverage-Guided Property-Based Testing in Java. In Proceedings of the 28th ACM SIGSOFT International Symposium on Software Testing and Analysis (ISSTA ’19), July 15–19, 2019, Beijing, China. ACM, New York, NY, USA, 4 pages. https://doi.org/10.1145/3293882.3339002

Overview

What is structure-aware fuzzing?

Binary fuzzing tools like AFL and libFuzzer treat the input as a sequence of bytes. If the test program expects highly structured inputs, such as XML documents or JavaScript programs, then mutating byte-arrays often results in syntactically invalid inputs; the core of the test program remains untested.
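The effect described above can be illustrated with a small standalone sketch. It is not part of JQF: the class `ByteMutationDemo` and the rough tag-nesting check are assumptions made purely for illustration, and the check is far weaker than a real XML parser. The sketch replaces one character of a tiny XML-like string at random and counts how many mutants still pass the check.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Random;

/** Toy demo (hypothetical): single-character mutations usually break a structured input. */
final class ByteMutationDemo {

    /** Rough tag-nesting check for strings like "<a><b>x</b></a>"; not a real XML parser. */
    static boolean isWellNested(String s) {
        Deque<String> open = new ArrayDeque<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '>') return false;            // stray closing bracket
            if (c != '<') { i++; continue; }       // ordinary text content
            int end = s.indexOf('>', i);
            if (end < 0) return false;             // unterminated tag
            String tag = s.substring(i + 1, end);
            if (tag.isEmpty() || tag.contains("<")) return false;
            if (tag.startsWith("/")) {
                // A closing tag must match the most recently opened tag.
                if (open.isEmpty() || !open.pop().equals(tag.substring(1))) return false;
            } else {
                open.push(tag);
            }
            i = end + 1;
        }
        return open.isEmpty();
    }

    /** Returns a copy of s with the character at index replaced by c. */
    static String replaceChar(String s, int index, char c) {
        StringBuilder sb = new StringBuilder(s);
        sb.setCharAt(index, c);
        return sb.toString();
    }

    /** Applies one random printable-ASCII replacement per trial; counts mutants that still pass. */
    static int countSurvivors(String seedInput, int trials, long rngSeed) {
        Random rnd = new Random(rngSeed);
        int survivors = 0;
        for (int t = 0; t < trials; t++) {
            int index = rnd.nextInt(seedInput.length());
            char c = (char) (32 + rnd.nextInt(95)); // printable ASCII range
            if (isWellNested(replaceChar(seedInput, index, c))) survivors++;
        }
        return survivors;
    }

    public static void main(String[] args) {
        String seedInput = "<a><b>x</b></a>";
        int survivors = countSurvivors(seedInput, 1000, 42L);
        System.out.println(survivors + " of 1000 mutants are still well nested");
    }
}
```

In this toy setup, a mutation that hits a bracket or a tag name breaks the nesting, so most mutants would be rejected before reaching any logic that sits behind the parser.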

Structure-aware fuzzing tools leverage domain-specific knowledge of the input format to produce inputs that are syntactically valid by construction. There are some nice articles on structure-aware fuzzing of C++ and Rust programs using libFuzzer.

What is generator-based fuzzing (QuickCheck)?

Structure-aware fuzzing tools need a way to understand the input structure. Some other tools use declarative specifications of the input format such as context-free grammars or protocol buffers. JQF uses QuickCheck's imperative approach for specifying the space of inputs: arbitrary generator programs whose job is to generate a single random input.

A Generator<T> provides a method for producing random instances of type T. For example, a generator for type Calendar returns randomly-generated Calendar objects. One can easily write generators for more complex types, such as XML documents, JavaScript programs, JVM class files, SQL queries, HTTP requests, and many more -- this is generator-based fuzzing. However, simply sampling random inputs of type T is not usually very effective, since the generator does not know if the inputs that it produces are any good.
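As a rough illustration of the generator idea, here is a self-contained sketch that uses only the JDK. The `SimpleGenerator` interface, `CalendarGenerator`, and `GeneratorDemo` are hypothetical names invented for this example; junit-quickcheck's real generator API has a different signature, so treat this as a picture of the concept rather than code to paste into a JQF test.

```java
import java.util.ArrayList;
import java.util.Calendar;
import java.util.GregorianCalendar;
import java.util.List;
import java.util.Random;
import java.util.TimeZone;

/** Hypothetical, simplified generator interface (not junit-quickcheck's API). */
interface SimpleGenerator<T> {
    T generate(Random random);
}

/** Produces random but always well-formed calendar dates. */
final class CalendarGenerator implements SimpleGenerator<Calendar> {
    @Override
    public Calendar generate(Random random) {
        // Fixed time zone and cleared fields keep the result independent of the host clock.
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear();
        int year = 1970 + random.nextInt(100);
        int month = random.nextInt(12);            // Calendar months are 0-based
        cal.set(year, month, 1);
        // Pick a day that exists in the chosen month (handles February and leap years).
        int maxDay = cal.getActualMaximum(Calendar.DAY_OF_MONTH);
        cal.set(Calendar.DAY_OF_MONTH, 1 + random.nextInt(maxDay));
        cal.set(Calendar.HOUR_OF_DAY, random.nextInt(24));
        cal.set(Calendar.MINUTE, random.nextInt(60));
        cal.set(Calendar.SECOND, random.nextInt(60));
        return cal;
    }
}

final class GeneratorDemo {
    /** Draws count independent samples from gen using a seeded source of randomness. */
    static <T> List<T> sample(SimpleGenerator<T> gen, int count, long seed) {
        Random random = new Random(seed);
        List<T> out = new ArrayList<>();
        for (int i = 0; i < count; i++) {
            out.add(gen.generate(random));
        }
        return out;
    }

    public static void main(String[] args) {
        for (Calendar c : sample(new CalendarGenerator(), 3, 42L)) {
            System.out.println(c.getTime());
        }
    }
}
```

Every sample is a well-formed date by construction, but nothing steers the sampling towards dates that exercise new code in the program under test. That gap is what feedback-directed approaches aim to close.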

What is semantic fuzzing (Zest)?

JQF supports the Zest algorithm, which uses code-coverage and input-validity feedback to bias a QuickCheck-style generator towards generating structured inputs that can reveal deep semantic bugs. JQF extracts code coverage using bytecode instrumentation, and input validity using JUnit's Assume API. An input is valid if no assumptions are violated.
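The feedback idea can be sketched in a few dozen lines of plain Java. This is a toy, not JQF's or Zest's implementation: the class `ToyFeedbackLoop`, the hand-placed coverage probes, the `valid` flag (standing in for an assumption check), and the rule for saving inputs are all assumptions made for illustration. A generator decodes a short byte sequence into a date, and the loop keeps mutated byte sequences whose runs add coverage, either overall or among valid inputs.

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;
import java.util.Random;

/** Toy coverage- and validity-guided loop; all names here are hypothetical. */
final class ToyFeedbackLoop {

    /** What one execution reports back to the fuzzer. */
    static final class Result {
        final BitSet coverage = new BitSet(); // hand-placed "branch" probes
        boolean valid = true;                 // false if an assumption failed
    }

    /** Saved parameter sequences plus the coverage accumulated so far. */
    static final class Outcome {
        final List<byte[]> saved = new ArrayList<>();
        final BitSet total = new BitSet();      // coverage from all inputs
        final BitSet validTotal = new BitSet(); // coverage from valid inputs only
    }

    /** Generator: decodes parameter bytes into {year, month, day}; month may be out of range. */
    static int[] generate(byte[] params) {
        int month = 1 + (params[0] & 0xFF) % 14;
        int day = 1 + (params[1] & 0xFF) % 31;
        int year = 1900 + (params[2] & 0xFF);
        return new int[] {year, month, day};
    }

    /** Program under test, instrumented by hand. */
    static Result run(int[] date) {
        Result r = new Result();
        int year = date[0], month = date[1], day = date[2];
        r.coverage.set(0);
        if (month > 12) { r.valid = false; return r; } // plays the role of a violated assumption
        r.coverage.set(1);
        boolean leap = (year % 4 == 0 && year % 100 != 0) || year % 400 == 0;
        if (leap) r.coverage.set(2);
        if (month == 2) {
            r.coverage.set(3);
            if (day == 29 && leap) r.coverage.set(4); // the "deep" branch
        }
        return r;
    }

    /** True if every bit of a is already set in b. */
    static boolean isSubset(BitSet a, BitSet b) {
        BitSet extra = (BitSet) a.clone();
        extra.andNot(b);
        return extra.isEmpty();
    }

    /** Records a result; returns true if it added coverage overall or among valid inputs. */
    static boolean record(Outcome o, Result r) {
        boolean interesting = !isSubset(r.coverage, o.total)
                || (r.valid && !isSubset(r.coverage, o.validTotal));
        o.total.or(r.coverage);
        if (r.valid) o.validTotal.or(r.coverage);
        return interesting;
    }

    /** Mutates saved parameter sequences and keeps those whose runs are interesting. */
    static Outcome fuzz(int iterations, long seed) {
        Random rnd = new Random(seed);
        Outcome o = new Outcome();
        byte[] initial = new byte[3];               // all-zero parameters decode to 1900-01-01
        o.saved.add(initial);
        record(o, run(generate(initial)));
        for (int i = 0; i < iterations; i++) {
            byte[] child = o.saved.get(rnd.nextInt(o.saved.size())).clone();
            child[rnd.nextInt(child.length)] = (byte) rnd.nextInt(256); // mutate one parameter byte
            if (record(o, run(generate(child)))) o.saved.add(child);
        }
        return o;
    }

    public static void main(String[] args) {
        Outcome o = fuzz(10_000, 42L);
        System.out.println("saved inputs: " + o.saved.size() + ", probes covered: " + o.total);
    }
}
```

Because mutation happens on the bytes that drive the generator rather than on the generated date itself, every candidate is still a structured input; the feedback only decides which parameter sequences are worth building on.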

Example

Here is a JUnit-Quickcheck test for checking a property of the PatriciaTrie class from Apache Commons Collections. The property tests that if a PatriciaTrie is initialized with an input JDK Map, and if the input map already contains a key, then that key should also exist in the newly constructed PatriciaTrie.

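import static org.junit.Assert.assertTrue;
import static org.junit.Assume.assumeTrue;

import java.util.Map;
import org.apache.commons.collections4.Trie;
import org.apache.commons.collections4.trie.PatriciaTrie;
import org.junit.runner.RunWith;
import edu.berkeley.cs.jqf.fuzz.Fuzz;
import edu.berkeley.cs.jqf.fuzz.JQF;

@RunWith(JQF.class)
public class PatriciaTrieTest {

    @Fuzz  /* The args to this method will be generated automatically by JQF */
    public void testMap2Trie(Map<String, Integer> map, String key) {
        // Key should exist in map
        assumeTrue(map.containsKey(key));   // the test is invalid if this predicate is not true

        // Create new trie with input `map`
        Trie<String, Integer> trie = new PatriciaTrie<>(map);

        // The key should exist in the trie as well
        assertTrue(trie.containsKey(key));  // fails when map = {"x": 1, "x\0": 2} and key = "x"
    }
}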

Running mvn jqf:fuzz causes JQF to invoke the testMap2Trie() method repeatedly with automatically generated values for map and key. After about 5 seconds on average (~5,000 inputs), JQF will report an assertion violation. It finds a bug in the implementation of PatriciaTrie that is unresolved as of v4.4. Random sampling of map and key values is unlikely to find the failing test case, which is a very special corner case (see the comment next to the assertion in the code above). JQF finds this violation easily using a coverage-guided algorithm called Zest. To run this example as a standalone Maven project, check out the jqf-zest-example repository.

In the above example, the generators for Map and String were synthesized automatically by junit-quickcheck. It is also possible to specify generators for structured inputs manually. See the tutorials below.

Documentation

The JQF Maven Plugin documentation shows how to run mvn jqf:fuzz and mvn jqf:repro.
Writing a JQF Test demonstrates the creation of a JUnit-based parameterized test method for JQF.
The Guidance interface docs show how JQF works internally, which is useful for researchers wishing to build custom guidance algorithms on top of JQF.
API docs are published at every major release, which is again useful for researchers wishing to extend JQF.

Tutorials

Zest 101: A basic tutorial for fuzzing a standalone toy program using command-line scripts. Walks through the process of writing a test driver and structured input generator for Calendar objects.
Fuzzing a compiler with Zest: A tutorial for fuzzing a non-trivial program -- the Google Closure Compiler -- using a generator for JavaScript programs. This tutorial makes use of the JQF Maven plugin.
Fuzzing with AFL: A tutorial for fuzzing a Java program that parses binary data, such as PNG image files, using the AFL binary fuzzing engine.
Fuzzing with ZestCLI: A tutorial on fuzzing a Java program with ZestCLI.

Continuous Fuzzing

GitLab supports running JQF in CI/CD (tutorial), though they have recently rolled out their own custom Java fuzzer for this purpose.

Research and Tools based on JQF

Zest 🍝 [ISSTA'19 paper] - Semantic Fuzzing
BigFuzz 🍝 [ASE'20 paper] - Spark Fuzzing
MoFuzz [ASE'20 paper] - Model-driven software
RLCheck 🍝 [ICSE'20 paper] - Reinforcement learning
Bonsai 🍝 [ICSE'21 paper] - Concise test generation
Confetti [ICSE'22 paper] - Concolic / taint tracking with global hinting
BeDivFuzz [ICSE'22 paper] - Behavioral diversity
ODDFuzz [IEEE S&P'23 paper] - Deserialization vulnerabilities
GCMiner [ICSE'23 paper] - Gadget chain mining
Intender [USENIX Security'23 paper] - Intent-based networking
Mu2 🍝 [ISSTA'23 paper] - Mutation testing as guidance
TOAST [JCST'22 paper] - Testing dynamic software updates
Poracle [TOSEM'23 paper] - Patch testing using differential fuzzing
SPIDER 🍝 [arXiv preprint] - Stateful performance issues in SDN
FuzzDiff [Dissertation] - Dynamic program equivalence checking

🍝 = Involves at least one of the original JQF authors.

Contact the developers

If you've found a bug in JQF or are having trouble getting JQF to work, please open an issue on the issue tracker. You can also use this platform to post feature requests.

If it's some sort of fuzzing emergency you can always send an email to the main developer: Rohan Padhye.

Trophies

If you find bugs with JQF and are comfortable with sharing, we would be happy to add them to this list. Please send a PR for README.md with a link to the bug/CVE you found.

google/closure-compiler#2842: IllegalStateException in VarCheck: Unexpected variable
google/closure-compiler#2843: NullPointerException when using Arrow Functions in dead code
google/closure-compiler#3173: Algorithmic complexity / performance issue on fuzzed input
google/closure-compiler#3220: ExpressionDecomposer throws IllegalStateException: Object method calls can not be decomposed
JDK-8190332: PngReader throws NegativeArraySizeException when width is too large
JDK-8190511: PngReader throws OutOfMemoryError for very small malformed PNGs
JDK-8190512: PngReader throws undocumented IllegalArgumentException: "Empty Region" instead of IOException for malformed images with negative dimensions
JDK-8190997: PngReader throws NullPointerException when PLTE section is missing
JDK-8191023: PngReader throws NegativeArraySizeException in parse_tEXt_chunk when keyword length exceeds chunk size
JDK-8191076: PngReader throws NegativeArraySizeException in parse_zTXt_chunk when keyword length exceeds chunk size
JDK-8191109: PngReader throws NegativeArraySizeException in parse_iCCP_chunk when keyword length exceeds chunk size
JDK-8191174: PngReader throws undocumented IllegalArgumentException with message "Pixel stride times width must be <= scanline stride"
JDK-8191073: JpegImageReader throws IndexOutOfBoundsException when reading malformed header
JDK-8193444: SimpleDateFormat throws ArrayIndexOutOfBoundsException when format contains long sequences of unicode characters
JDK-8193877: DateTimeFormatterBuilder throws ClassCastException when using padding
mozilla/rhino#405: FAILED ASSERTION due to malformed destructuring syntax
mozilla/rhino#406: ClassCastException when compiling malformed destructuring expression
mozilla/rhino#407: java.lang.VerifyError in bytecode produced by CodeGen
mozilla/rhino#409: ArrayIndexOutOfBoundsException when parsing '<!-'
mozilla/rhino#410: NullPointerException in BodyCodeGen
COLLECTIONS-714: PatriciaTrie ignores trailing null characters in keys
COMPRESS-424: BZip2CompressorInputStream throws ArrayIndexOutOfBoundsException(s) when decompressing malformed input
LANG-1385: StringIndexOutOfBoundsException in NumberUtils.createNumber
CVE-2018-11771: Infinite Loop in Commons-Compress ZipArchiveInputStream (found by Tobias Ospelt)
MNG-6375 / plexus-utils#34: NullPointerException when pom.xml has incomplete XML tag
MNG-6374 / plexus-utils#35: ModelBuilder hangs with malformed pom.xml
MNG-6577 / plexus-utils#57: Uncaught IllegalArgumentException when parsing unicode entity ref
Bug 62655: Augment task: IllegalStateException when "id" attribute is missing
BCEL-303: AssertionViolatedException in Pass 3A Verification of invoke instructions
BCEL-307: ClassFormatException thrown in Pass 3A verification
BCEL-308: NullPointerException in Verifier Pass 3A
BCEL-309: NegativeArraySizeException when Code attribute length is negative
BCEL-310: ArrayIndexOutOfBounds in Verifier Pass 3A
BCEL-311: ClassCastException in Verifier Pass 2
BCEL-312: AssertionViolation: INTERNAL ERROR Please adapt StringRepresentation to deal with ConstantPackage in Verifier Pass 2
BCEL-313: ClassFormatException: Invalid signature: Ljava/lang/String)V in Verifier Pass 3A
CVE-2018-8036: Infinite Loop leading to OOM in PDFBox's AFMParser (found by Tobias Ospelt)
PDFBOX-4333: ClassCastException when loading PDF (found by Robin Schimpf)
PDFBOX-4338: ArrayIndexOutOfBoundsException in COSParser (found by Robin Schimpf)
PDFBOX-4339: NullPointerException in COSParser (found by Robin Schimpf)
CVE-2018-8017: Infinite Loop in IptcAnpaParser
CVE-2018-12418: Infinite Loop in junrar (found by Tobias Ospelt)
CVE-2019-17359: Attempt to trigger a large allocation leads to OOM in Bouncycastle ASN.1 parser (found by Tobias Ospelt)
