better heuristics for detecting un-instrumented code than raw edge count

inferno-chromium commented 5 years ago

See https://github.com/WizardMac/ReadStat/issues/181#issuecomment-492603172 also, we have a bunch of hacks accumulating in https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-runner/test_all#L121

evanmiller commented 5 years ago

Can you use sancov to count the number of covered functions?

https://clang.llvm.org/docs/SanitizerCoverage.html#default-implementation

jonathanmetzman commented 5 years ago

Can you use sancov to count the number of covered functions?

Wouldn't this suffer from a similar problem? That if we pick an arbitrary number, some legitimate cases will have less than that number? Maybe we can be smarter about the number by looking at binary size?

evanmiller commented 5 years ago

I was thinking in the case of counting functions, the threshold count would be 1 (or 2). But I don't understand coverage internals very well, and am grasping at straws here.

Dor1s commented 5 years ago

The real problem here with small fuzz targets is resource allocation. To give you an example:

#include <string>
#include <vector>

// Do some computations with 'str', return the result.
// This function contains a bug. Can you spot it?
size_t DoStuff(const std::string &str) {
  std::vector<int> Vec({0, 1, 2, 3, 4});
  size_t Idx = 0;
  if (str.size() > 5)
    Idx++;
  if (str.find("foo") != std::string::npos)
    Idx++;
  if (str.find("bar") != std::string::npos)
    Idx++;
  if (str.find("ouch") != std::string::npos)
    Idx++;
  if (str.find("omg") != std::string::npos)
    Idx++;
  return Vec[Idx];
}

// Simple fuzz target for DoStuff().
// See http://libfuzzer.info for details.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  std::string str(reinterpret_cast<const char *>(data), size);
  DoStuff(str);  // Disregard the output.
  return 0;
}

This fuzz target has ~73 coverage edges (no matter whether we grep or obtain that number by any other means, as @jonathanmetzman pointed out). The current threshold is 100.

I don't think we should allocate even one CPU-day for such a trivial fuzz target. If we could allocate fractional CPUs (e.g. run such a small fuzz target for just 1 hour a day), we might let them go through, but anyway that would be fuzzing done wrong.

Regarding hacky white listing in https://github.com/google/oss-fuzz/blob/master/infra/base-images/base-runner/test_all#L121, totally agree that we should get rid of it. I'm fine with removing / disabling those fuzz targets at all. At current scale of OSS-Fuzz, I think those minor and almost useless targets shouldn't distract us.

evanmiller commented 5 years ago

I agree that the marginal value of running a small fuzzing target for the 24th hour is very small. However, in terms of catching regressions, the marginal value of running the target on a new commit for even the first 60 seconds is quite high. These small targets may be sub-grammars buried deep in a complicated file format, and otherwise awkward to fuzz. Perhaps the scheduler should be more intelligent about prioritizing targets based on the marginal likelihood of finding a bug.

Dor1s commented 5 years ago

Absolutely. That's one of the key elements of our Ideal Integration guidance. The regression testing should happen upstream and/or be embedded into developer's workflow. From fuzzing you just need a good corpus. For a trivial fuzz target, a good corpus can be generated very quickly even without an infrastructure like ClusterFuzz.

evanmiller commented 5 years ago

That makes sense.

It would help me to have more clarity on what belongs in ClusterFuzz vs what belongs upstream. I was happy to include only the large, "full format" fuzzers until @Google-Autofuzz opened https://github.com/WizardMac/ReadStat/issues/181 requesting the smaller fuzzers be included. Now that I have spent time on those smaller fuzzers, I am being told they are a potential waste of resources.

As a project owner, it would help me if Google-Autofuzz and OSS-Fuzz came up with a coherent policy on acceptable fuzzing targets before making these sorts of requests of project owners. I don't really care which fuzzers are ultimately included, but I do find the current lack of clarity (and the unexplained urgency for including more fuzzing targets) a bit frustrating.

Dor1s commented 5 years ago

@evanmiller, I'm really sorry for that confusion. We're still adjusting some elements of collaboration with @Google-Autofuzz, so this situation we've put you in is definitely unfortunate.

Also, note that the point regarding a potential waste of resources is just my opinion, we still need to discuss it more with the team, and maybe we'll end up accepting small fuzz targets and lowering the requirements. I'd say stay tuned :)

google / oss-fuzz

better heuristics for detecting un-instrumented code than raw edge count #2469