rohanpadhye / JQF

JQF + Zest: Coverage-guided semantic fuzzing for Java.
BSD 2-Clause "Simplified" License
666 stars 112 forks source link

ZestGuidance does not correctly udpate total Coverage when test fails under every inputs #234

Open shuaiwang516 opened 1 year ago

shuaiwang516 commented 1 year ago

What happened:

When fuzzing a test that will fail under every input, ZestGuidance does not correctly compute the Total Coverage. To be more specific, the total coverage always shows as 0. Another issue caused by this is that valid coverage and total coverage are all the same value at any time. Because these two metrics are only updated under valid inputs.

How to reproduce:

Try to fuzz the following example test with Zest:

    @Fuzz
    public void exampleTest(String str) {
        int a = 0;
        a = a + 1;
        assertTrue(str.length() == -1);
    }

This test always fails because str.length() will not be -1. But the coverage should not be 0.

Example output:

Elapsed time:         3s (no time limit)
Number of executions: 44 (no trial limit)
Valid inputs:         0 (0.00%)
Cycles completed:     0
Unique failures:      2
Queue size:           0 (0 favored last cycle)
Current parent input: <seed>
Execution speed:      14/sec now | 11/sec overall
Total coverage:       0 branches (0.00% of map)
Valid coverage:       0 branches (0.00% of map)

How to improve

In ZestGuidance.java, there should have some totalCoverage update in this else if branch of handleResult() method: for example:

  } else if (result == Result.FAILURE || result == Result.TIMEOUT) {
      String msg = error.getMessage();

      // We should also increase the totalCoverage but not validCoverage
      totalCoverage.updateBits(runCoverage);
      int nonZeroAfter = totalCoverage.getNonZeroCount();
      if (nonZeroAfter > maxCoverage) {
          maxCoverage = nonZeroAfter;
      }
   ...
  }

Also because there is a coverage responsibility check in completeCycle(), so we need to check the saved input responsibility is the same as valid coverage there instead of the total coverage (because Zest does not save invalid input)

        if (sumResponsibilities != validCoverageCount) {
            if (multiThreaded) {
                infoLog("Warning: other threads are adding coverage between test executions");
            } else {
                throw new AssertionError("Responsibilty mismatch");
            }
        }
rohanpadhye commented 1 year ago

Thanks for the report! You are right that the total coverage numbers don't take into account failing tests. This is by design. If we update totalCoverage with new coverage X discovered during failing tests, then we won't have inputs that cover X that can also be used for mutation in the fuzzing queue. So, we only save non-failing inputs for new coverage.

I understand that this makes reporting a bit weird. We usually don't expect targets to have only failing inputs. One thing that we can possibly add is another map that tracks coverage of failing inputs only, and reports that to the status screen in another line.

moogician commented 1 year ago

Hi @rohanpadhye. I think there could be an elegant solution to this while make the reporting not that weird. The current coverage report can be very misleading when, e.g., all branch of the test in which all path would ultimately result in failures. In my understandings for this case, the total coverage should be at least non-zero if all the failures are triggered. However, the currently implementation of JQF would result in a very small coverage, which is counterintuitive.

As for the input selections, I assume that we could deal with the coverages triggered by failure inputs differently. For example, we could mark the coverages as "failure-induced" and have valid inputs "steal" them later if they also cover these coverages. Fixing this could give a more accurate coverage number and also prevent improper comparisons if we want to, e.g., compare between different tools & guidances.

rohanpadhye commented 1 year ago

Thanks for the input @hwang-pku. Marking specific parts of the coverage as "failure-induced" sounds a lot like keeping track of "coverage achieved by failure" and "coverage achieved by normal" separately, which I think is acceptable for reporting purposes. Feel free to open a PR for this, otherwise I will try to work on this in my free time.