Can mutating the graph produce call sequences that cause false positives?

Thank you very much for open-sourcing this tool! While learning GraphFuzz, I came up with a question I would like to ask you.

Is it possible for the call sequences generated after mutating a graph to produce false positives?

For example, suppose the original call sequence is A->B->C. If mutation removes node B, the mutated call sequence becomes A->C. In real usage, however, B must be called before C; otherwise, an error occurs. That error comes from improper API usage introduced by the graph mutation, not from a logic bug inside the function under test. In other words, it is a false positive.

Has GraphFuzz taken this situation into account, and if so, how does it handle it?

Thank you again, and I look forward to our future discussions.

Hi there @aT0ngMu, I'm not @hgarrereyn, but maybe I can help. If your API returns a specific error because it was not called correctly, you can check for that error and handle it. See the example here, where the function graphfuzz_bail() is used to mark a false positive. At least, that is how I understood and used that part of the GraphFuzz API.
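For illustration, here is a minimal C++ sketch of that pattern under stated assumptions. The Decoder type, its open()/decode() methods, and the Status codes are hypothetical. stub_graphfuzz_bail() is a local stand-in for GraphFuzz's graphfuzz_bail() so the sketch compiles on its own; in a real harness, the check would sit inside the method's exec: block and call graphfuzz_bail() instead.

```cpp
#include <cassert>

// Hypothetical API under test: decode() must only be called after open().
// Misuse is reported through a status code rather than a crash.
enum class Status { Ok, NotOpened };

struct Decoder {
    bool opened = false;
    void open() { opened = true; }
    Status decode() { return opened ? Status::Ok : Status::NotOpened; }
};

// Local stand-in for graphfuzz_bail(): it only records that the current
// test case should be discarded, so this sketch runs outside GraphFuzz.
static bool bailed = false;
void stub_graphfuzz_bail() { bailed = true; }

// Roughly what the body of an exec: block for decode() might contain.
void exec_decode(Decoder *d) {
    if (d->decode() == Status::NotOpened) {
        // Known API-misuse error, not a library bug: discard this input.
        stub_graphfuzz_bail();
        return;
    }
    // Past this point, any failure would be a genuine finding.
    assert(d->opened);
}
```

The check separates a recognizable misuse error from a real crash; anything not caught this way is still reported.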

Hi @aT0ngMu, thanks for reaching out! As @NikLeberg mentioned, you can use graphfuzz_bail to suppress false positives that are detected during execution. This feature is largely undocumented, but you can use it if you are able to detect at runtime that a fatal error has occurred and you want to ignore it. However, for the case you are describing, there are some nicer ways to handle it, see below:

It is possible for GraphFuzz to generate false positives like the ones you described. Consider this example API:

```cpp
#include <cassert>

struct Foo {
    bool initialized;

    Foo() { initialized = false; }
    void setup() { initialized = true; }
    void bar() {
        assert(initialized);
        // do thing...
    }
};
```

Here, setup() must be called before anything else.

So this is allowed:

```cpp
Foo *f = new Foo();
f->setup();
f->bar();
delete f;
```

But this is not allowed:

```cpp
Foo *f = new Foo();
f->bar();
delete f;
```

By default, GraphFuzz does not know that setup() is required. However, there are at least two ways to enforce this and prevent false positives:

1. Define a custom initialization routine:

If the object you are trying to use always requires invoking some setup method, you can directly enforce that by defining a custom constructor like the following:

```yaml
Foo:
  type: struct
  name: Foo
  headers: [demo.h]
  methods:
  - Foo():
      outputs: ['Foo']
      exec: |
        $o0 = new Foo();
        $o0->setup();
  - void bar()
```

Here, the only valid way to construct a Foo also requires invoking Foo::setup.
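In plain C++ terms, the exec: block above behaves like a factory function that never hands out a Foo on which setup() has not been called. A minimal sketch follows; make_foo is a hypothetical name used only for illustration, not something GraphFuzz generates.

```cpp
#include <cassert>

struct Foo {
    bool initialized;

    Foo() { initialized = false; }
    void setup() { initialized = true; }
    void bar() {
        assert(initialized);  // precondition: setup() was called
        // do thing...
    }
};

// Hypothetical factory mirroring the custom Foo() exec block:
// construction and setup() are fused, so callers cannot skip setup().
Foo *make_foo() {
    Foo *f = new Foo();
    f->setup();
    return f;
}
```

Because the schema exposes only this combined constructor, every generated or mutated graph that contains a Foo has already called setup().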

2. Define pseudo-types:

An alternative is to define new pseudo-types that represent a Foo object in different states. In this case, for example, you could define Foo_uninitialized and Foo_initialized to mark the two states.

```yaml
Foo_uninitialized:
  type: struct
  name: Foo_uninitialized
  headers: [demo.h]
  default_destructor: true
  methods:
  - Foo():
      outputs: ['Foo_uninitialized']
      exec: |
        $o0 = new Foo();
  - setup():
      inputs: ['Foo_uninitialized']
      outputs: ['Foo_initialized']
      exec: |
        $i0->setup();
        $o0 = $i0;

Foo_initialized:
  type: struct
  name: Foo_initialized
  headers: [demo.h]
  default_destructor: true
  methods:
  - void bar()
```

And include these typedefs in a header:

```cpp
typedef Foo Foo_initialized;
typedef Foo Foo_uninitialized;
```

Here, bar can only be called on a Foo_initialized object, and that object can only be obtained by first creating a Foo_uninitialized object and calling setup on it.
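The same idea is known in C++ as the typestate pattern. The sketch below is an analogy, not part of the GraphFuzz schema: FooUninitialized, FooInitialized, and the free functions make_foo, setup, and bar are hypothetical names. It uses distinct wrapper structs so the compiler, rather than the schema, rejects a call to bar() on an object that has not gone through setup(). With the plain typedefs above, C++ itself treats both names as Foo; the distinction exists only in the schema.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Foo {
    bool initialized;

    Foo() { initialized = false; }
    void setup() { initialized = true; }
    void bar() {
        assert(initialized);
        // do thing...
    }
};

// Distinct types for the two states of the same underlying Foo.
struct FooUninitialized { std::unique_ptr<Foo> foo; };
struct FooInitialized { std::unique_ptr<Foo> foo; };

// Only way to create a Foo: it starts out uninitialized.
FooUninitialized make_foo() { return FooUninitialized{std::make_unique<Foo>()}; }

// setup() consumes the uninitialized state and yields the initialized one.
FooInitialized setup(FooUninitialized u) {
    u.foo->setup();
    return FooInitialized{std::move(u.foo)};
}

// bar() accepts only the initialized state.
void bar(FooInitialized &f) { f.foo->bar(); }
```

A call such as bar(make_foo()) does not compile, which is the compile-time counterpart of GraphFuzz never generating a graph where bar() consumes a Foo_uninitialized.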

Thank you very much for your detailed response!

However, I still have a question regarding the call sequences generated after graph mutation mentioned in the paper.

The method in the paper generates call sequences from mutated graphs. I'd like to discuss the effectiveness of this approach with you, and whether the call sequences it generates are prone to a high false-positive rate.

Graph mutation involves a certain degree of randomness. Could this randomness make the generated call sequences invalid during fuzzing?

For example, libmpeg2 requires an allocated context that holds the current encoder/decoder configuration and buffer information, and this context is passed to every subsequent library function. GraphFuzz can generate data-flow graphs from this usage pattern, but when a graph is mutated, could the node that creates this context be removed, leading to false positives in the generated call sequences?

Thank you once again for your detailed response, and I am looking forward to your reply.

It is worth clarifying that both mutation and generation obey the graph schema. So the graphs you obtain by mutation are the same types of graphs you could obtain via generation directly. The engine will never take a valid graph and turn it into an invalid graph by mutation.

Regarding false positives, it is possible for GraphFuzz to generate test cases that cause false positive crashes. However, this only happens when the schema is under-defined or under-constrained. In other words, GraphFuzz always obeys the schema during generation and mutation, so if it generates an "invalid graph" according to the API logic, this is a problem with the schema.
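To make the invariant concrete, here is a simplified C++ model. It is an illustration only, not GraphFuzz's actual data structures or mutation algorithm; Call, is_valid, and try_delete are hypothetical names. Each call lists the object types it consumes and produces, a sequence is valid if every input is supplied by an earlier output that has not been consumed yet, and a node deletion is kept only if the result is still valid. Under the pseudo-type schema from the previous comment, deleting setup() is rejected because bar() would have no Foo_initialized to consume, which is the A->B->C to A->C case from the original question. A required context object, such as the libmpeg2 one, is protected the same way as long as the schema models it as an input of every call that needs it.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// One API call (graph node): the object types it consumes and produces.
struct Call {
    std::string name;
    std::vector<std::string> inputs;
    std::vector<std::string> outputs;
};

// True if every input of every call is supplied by an object produced
// earlier and not yet consumed (each object flows along one edge).
bool is_valid(const std::vector<Call> &seq) {
    std::map<std::string, int> live;  // type -> number of unconsumed objects
    for (const Call &c : seq) {
        for (const std::string &t : c.inputs) {
            if (live[t] == 0) return false;  // required object is missing
            --live[t];
        }
        for (const std::string &t : c.outputs) ++live[t];
    }
    return true;
}

// Simplified node-deletion mutation: drop the call at `index` and keep
// the result only if it still satisfies the schema.
bool try_delete(std::vector<Call> &seq, std::size_t index) {
    assert(index < seq.size());
    std::vector<Call> candidate = seq;
    candidate.erase(candidate.begin() + static_cast<std::ptrdiff_t>(index));
    if (!is_valid(candidate)) return false;  // would break the schema: reject
    seq = candidate;
    return true;
}
```

With the sequence Foo() -> setup() -> bar(), try_delete(seq, 1) returns false and leaves the sequence unchanged, while deleting bar() succeeds.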

In many cases, it is possible to specify all the constraints you need in the schema (for example using the two strategies outlined in the previous comment). However, not every constraint can be specified easily in the schema. We talk a bit about some of these difficulties in section 6.2 of the paper.

If you have an example target you are trying to fuzz, I'd be happy to help discuss how to set up a GraphFuzz harness.

Thank you for your patient explanations. I'm going to test some other target libraries now. GraphFuzz is a highly effective tool for fuzz testing libraries. Thanks again.

hgarrereyn / GraphFuzz