Automatic signalling for LibAFL

We are interested in using LibAFL together with SymRustC in a generic way, i.e. having a framework that takes an arbitrary Rust program as input and runs the whole simulation as automatically as possible.

At first sight, the following setting seems to solve the problem: https://github.com/sfu-rsl/symrustc/blob/33b425d357fce8dc274e58cb89f38fb1335b4145/Dockerfile#L463 because there we specify our Rust source source_0_original_1c_rs as input at a single location in the build phase.

However, for an arbitrary Rust program, this turns out to be unsatisfactory: during its main simulation loop, LibAFL apparently needs to know how far the Rust program has progressed while that program is executing. In LibAFL, this progress information can be implemented either:

using some explicit signalling, as in: https://github.com/sfu-rsl/LibAFL/blob/59bb8e61856b22047f8e6e2787a3f6d90ae99006/fuzzers/baby_fuzzer/src/main.rs#L39
or using libafl_targets, as in: https://github.com/sfu-rsl/LibAFL/blob/59bb8e61856b22047f8e6e2787a3f6d90ae99006/fuzzers/libfuzzer_stb_image_concolic/fuzzer/build.rs#L38

In particular, although the above Rust source source_0_original_1c_rs is not duplicated elsewhere (thus satisfying our genericity constraint), that code currently uses neither explicit signalling nor libafl_targets. It is compiled by libafl_solving_build.sh: https://github.com/sfu-rsl/symrustc/blob/33b425d357fce8dc274e58cb89f38fb1335b4145/Dockerfile#L467 and the resulting binary is then invoked dynamically by LibAFL: https://github.com/sfu-rsl/LibAFL/blob/59bb8e61856b22047f8e6e2787a3f6d90ae99006/fuzzers/libfuzzer_rust_concolic/fuzzer/src/main.rs#L189 https://github.com/sfu-rsl/LibAFL/blob/59bb8e61856b22047f8e6e2787a3f6d90ae99006/fuzzers/libfuzzer_rust_concolic/fuzzer/src/main.rs#L91 Note that, since it is instrumented by SymRustC, this binary may be executed with the concolic run disabled, whereas in another situation the same binary is executed with the concolic run enabled: https://github.com/sfu-rsl/LibAFL/blob/59bb8e61856b22047f8e6e2787a3f6d90ae99006/fuzzers/libfuzzer_rust_concolic/fuzzer/src/main.rs#L321 (The content of target_symcc0.out is exactly: https://github.com/sfu-rsl/symrustc/blob/33b425d357fce8dc274e58cb89f38fb1335b4145/src/rs/libafl_solving_bin.sh#L12 In particular, it internally calls target_symcc.out.)
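
To make the two execution modes concrete, here is a minimal sketch of how a fuzzer might prepare the command for either mode. It assumes (hypothetically) that the input path is passed as the first argument and that the mode is selected through SymCC's environment variables SYMCC_INPUT_FILE and SYMCC_NO_SYMBOLIC_INPUT; the mechanism actually used by the linked LibAFL fuzzer (e.g. through the target_symcc0.out wrapper) may differ. `ConcolicMode` and `target_command` are illustrative names.

```rust
use std::path::Path;
use std::process::Command;

/// Whether the SymRustC-instrumented target should collect path constraints.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ConcolicMode {
    Enabled,
    Disabled,
}

/// Prepares, without spawning it, a command running `binary` on `input_file`.
/// Assumptions (hypothetical): the input path is the first argument, and the
/// mode is selected through SymCC's environment variables.
pub fn target_command(binary: &Path, input_file: &Path, mode: ConcolicMode) -> Command {
    let mut cmd = Command::new(binary);
    cmd.arg(input_file);
    match mode {
        ConcolicMode::Enabled => {
            // Bytes read from this file are treated as symbolic by the runtime.
            cmd.env("SYMCC_INPUT_FILE", input_file);
            cmd.env_remove("SYMCC_NO_SYMBOLIC_INPUT");
        }
        ConcolicMode::Disabled => {
            // Run fully concretely: no symbolic input, hence no constraints.
            cmd.env("SYMCC_NO_SYMBOLIC_INPUT", "1");
        }
    }
    cmd
}
```

The point of the sketch is that the same instrumented binary is reused for both modes, so any automatic transformation of the source must keep it working in both.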

To show that the explicit signalling solution can be straightforward to put in place (i.e. that it does not require significant knowledge of low-level Rust, C, or LLVM programming), we provide another example, source_0_original_1c0_rs, where we manually insert several signalling calls near several if-then-else branches of interest: https://github.com/sfu-rsl/LibAFL/blob/59bb8e61856b22047f8e6e2787a3f6d90ae99006/fuzzers/libfuzzer_rust_concolic_instance/fuzzer/src/main.rs#L217 Obviously, this solution breaks our genericity requirement, since we had to duplicate that Rust code from: https://github.com/sfu-rsl/symrustc/blob/33b425d357fce8dc274e58cb89f38fb1335b4145/Dockerfile#L488 https://github.com/sfu-rsl/symrustc/blob/33b425d357fce8dc274e58cb89f38fb1335b4145/examples/source_0_original_1c0_rs/src/main.rs#L10
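
For reference, explicit signalling in the style of LibAFL's baby_fuzzer boils down to a static map plus one call per branch of interest. The stdlib-only sketch below mirrors that pattern. The map size, the `harness` body, and the helpers other than `signals_set` are hypothetical, and, unlike baby_fuzzer (which panics to signal a crash), this harness simply returns `true` on the deepest branch.

```rust
use std::sync::atomic::{AtomicU8, Ordering};

/// Number of signal slots, one per instrumented branch (hypothetical size).
pub const SIGNALS_LEN: usize = 16;

// A `const` item lets us build a non-`Copy` array with `[X; N]`.
#[allow(clippy::declare_interior_mutable_const)]
const UNSET: AtomicU8 = AtomicU8::new(0);

/// Map that the fuzzer's observer would read after each execution.
pub static SIGNALS: [AtomicU8; SIGNALS_LEN] = [UNSET; SIGNALS_LEN];

/// Marks branch `idx` as reached.
pub fn signals_set(idx: usize) {
    SIGNALS[idx].store(1, Ordering::Relaxed);
}

/// Clears all signals before the next execution.
pub fn signals_reset() {
    for s in SIGNALS.iter() {
        s.store(0, Ordering::Relaxed);
    }
}

/// Indices of the signals set since the last reset.
pub fn signals_hit() -> Vec<usize> {
    SIGNALS
        .iter()
        .enumerate()
        .filter(|(_, s)| s.load(Ordering::Relaxed) != 0)
        .map(|(i, _)| i)
        .collect()
}

/// Hypothetical harness: every `if` of interest gets its own signal, so the
/// fuzzer can see how deep into the nested condition an input gets.
pub fn harness(buf: &[u8]) -> bool {
    signals_set(0);
    if !buf.is_empty() && buf[0] == b'a' {
        signals_set(1);
        if buf.len() > 1 && buf[1] == b'b' {
            signals_set(2);
            if buf.len() > 2 && buf[2] == b'c' {
                signals_set(3);
                return true;
            }
        }
    }
    false
}
```

For example, after `signals_reset()`, running `harness(b"ab")` leaves signals 0, 1 and 2 set, which is exactly the progress information LibAFL's observer needs. The manual `signals_set` calls are what makes this approach source-specific.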

In this issue, we are interested in modifying the way our original Rust example is automatically compiled in libafl_solving_build.sh: https://github.com/sfu-rsl/symrustc/blob/33b425d357fce8dc274e58cb89f38fb1335b4145/Dockerfile#L492 so that the Rust source given as input is automatically annotated with explicit signalling calls, or is automatically linked against libafl_targets to take advantage of it. (Any solution is fine as long as the code is ultimately generated automatically.) In both cases, one has to make sure that the automatic transformation does not alter the original concolic capability of the binary, because LibAFL may ultimately invoke the binary in different concolic settings (with the concolic mode on and off, respectively).

> so that the Rust source in input is automatically annotated with explicit signalling calls, or is automatically linked to take advantage of libafl_targets

I believe that the latter approach is better, because the library already provides sophisticated handling for many different kinds of coverage. If we switch to explicit signalling, we will probably end up with limited, basic coverage reporting that may not work as well as the existing mechanisms.

In this sense, I think we can take advantage of SanitizerCoverage, which libafl_targets already supports. An alternative would be the code coverage instrumentation provided by the Rust compiler, which I don't think LibAFL currently supports. To use SanitizerCoverage, we need to run the SanitizerCoverage pass during compilation, either before or after the SymCC symbolizer pass. Example (not in the context of SymRustC):

$ rustc -C llvm-args=--sanitizer-coverage-level=3 -C passes=sancov-module --emit=llvm-ir -o ./main.ll ./src/main.rs

This should give us a version of the binary that is instrumented both by SymRustC and with the sanitizer coverage calls.

What remains is how to get the coverage report out of the binary.

One thing we inevitably need to do is link libafl_targets with the program, as the implementations of the sanitizer coverage functions are provided there (see sancov_pcguard.rs). In the LibAFL examples, the program (harness.c) is added as a dependency of the fuzzer program, so the two are linked together and work fine. In our case, however, we have to take the reverse approach and put the libafl_targets library into the program. A promising solution could be to add a dependency on libafl_targets in the runtime library project (somewhere like here).
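
To give an idea of what linking libafl_targets provides: with trace-pc-guard instrumentation (which, to our understanding, LLVM's sancov pass uses by default when no other sancov mode is requested), each module registers its guards through `__sanitizer_cov_trace_pc_guard_init`, and every instrumented edge calls `__sanitizer_cov_trace_pc_guard`. The sketch below is a simplified stand-in for what sancov_pcguard.rs does, not its actual code; the map size and id scheme follow the example in the Clang SanitizerCoverage documentation. In a real runtime both functions must be exported unmangled (`#[no_mangle]`); that attribute is omitted here so the sketch can be compiled and called directly.

```rust
use std::sync::atomic::{AtomicU32, AtomicU8, Ordering};

/// Size of the edge map (hypothetical; libafl_targets defines its own).
pub const MAP_SIZE: usize = 1 << 16;

#[allow(clippy::declare_interior_mutable_const)]
const ZERO: AtomicU8 = AtomicU8::new(0);

/// Hit counters indexed by guard id.
pub static EDGES_MAP: [AtomicU8; MAP_SIZE] = [ZERO; MAP_SIZE];

/// Next guard id to hand out; 0 is reserved for "not initialised".
static NEXT_ID: AtomicU32 = AtomicU32::new(1);

/// Called once per instrumented module with the range of its guards.
/// A real runtime must export this with `#[no_mangle]`.
pub unsafe extern "C" fn __sanitizer_cov_trace_pc_guard_init(start: *mut u32, stop: *mut u32) {
    unsafe {
        // Skip empty ranges and modules whose guards are already numbered.
        if start == stop || *start != 0 {
            return;
        }
        let mut guard = start;
        while guard < stop {
            *guard = NEXT_ID.fetch_add(1, Ordering::Relaxed);
            guard = guard.add(1);
        }
    }
}

/// Called on every instrumented edge with a pointer to that edge's guard.
/// A real runtime must export this with `#[no_mangle]`.
pub unsafe extern "C" fn __sanitizer_cov_trace_pc_guard(guard: *mut u32) {
    let id = unsafe { *guard } as usize;
    if id == 0 {
        return; // guard not initialised (or disabled)
    }
    // Hit counters wrap around on overflow, as in LibAFL-style maps.
    EDGES_MAP[id % MAP_SIZE].fetch_add(1, Ordering::Relaxed);
}
```

Since these symbols are expected by the instrumented object files at link time, whichever crate provides them (libafl_targets, or the runtime library depending on it) must end up in the final binary.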

The second, and more important, challenge is to get the coverage report (the edge map) out of the execution and hand it to the observer. Again, in the LibAFL example this is done simply by using the statically allocated array in the library directly, since the target program is compiled into the fuzzer program and both share the same memory space. In our case, by contrast, the binary runs as a separate process, so the edge map is not directly accessible. I guess the solution is to use the shared memory facilities. Inspired by the concolic observer in their example (see this and this), we need to create a shared memory region in the fuzzer program and later, in the custom SymCC runtime, overwrite EDGES_MAP_PTR with it.
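
The EDGES_MAP_PTR idea can be sketched as a global pointer through which coverage is recorded, initially unset and later redirected to a buffer that the fuzzer can see. This is a simplified, stdlib-only illustration: `redirect_edges_map` and `record_edge` are hypothetical names, a `Vec<u8>` stands in for the shared memory segment, and how the runtime would obtain that segment (e.g. an id passed by the fuzzer through an environment variable) is left out. It also assumes the map is installed once, before the instrumented code runs, and that recording is single-threaded.

```rust
use std::sync::atomic::{AtomicPtr, AtomicUsize, Ordering};

/// Where coverage is written; null until a map is installed.
pub static EDGES_MAP_PTR: AtomicPtr<u8> = AtomicPtr::new(std::ptr::null_mut());
/// Length in bytes of the map behind `EDGES_MAP_PTR`.
pub static EDGES_MAP_LEN: AtomicUsize = AtomicUsize::new(0);

/// Redirects coverage to `len` bytes starting at `ptr` (in the real setting,
/// memory shared with the fuzzer process).
///
/// # Safety
/// `ptr` must stay valid for writes of `len` bytes for as long as edges are
/// recorded, and this must not race with `record_edge`.
pub unsafe fn redirect_edges_map(ptr: *mut u8, len: usize) {
    EDGES_MAP_LEN.store(len, Ordering::SeqCst);
    EDGES_MAP_PTR.store(ptr, Ordering::SeqCst);
}

/// Bumps the hit counter of edge `id` in whatever map is currently installed.
pub fn record_edge(id: usize) {
    let ptr = EDGES_MAP_PTR.load(Ordering::SeqCst);
    let len = EDGES_MAP_LEN.load(Ordering::SeqCst);
    if ptr.is_null() || len == 0 {
        return; // no map installed yet: drop the event
    }
    // SAFETY: the contract of `redirect_edges_map` keeps `ptr[..len]` writable.
    unsafe {
        let slot = ptr.add(id % len);
        *slot = (*slot).wrapping_add(1);
    }
}
```

With a shared segment in place of the `Vec`, the fuzzer's observer would read the same bytes the target process wrote, which is what the in-process LibAFL examples get for free.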

> Example (not in the context of SymRustC):
> $ rustc -C llvm-args=--sanitizer-coverage-level=3 -C passes=sancov-module --emit=llvm-ir -o ./main.ll ./src/main.rs

Thanks! After some experimentation, I can confirm that these options were indeed among the missing pieces of the puzzle. At least, they allowed me to instrument the necessary if-then-else branches in a Rust source.

> One inevitable thing we need to do is linking libafl_target with the program, as the implementations of sanitizer coverage functions are provided there (see sancov_pcguard.rs). In the examples provided by them, the program (harness.c) is added as a dependency to the fuzzer program so they are linked together and work fine.

The overall compilation architecture of the Rust examples we fuzz will indeed depend on that of LibAFL. If LibAFL had originally been built as a direct modification of the Rust compiler (or of the SymRustC compiler), it would in principle have been possible to compile our examples in a standalone fashion (e.g. imagining LibAFL embedding part of its own libafl_targets code into the resulting binary every time it is given an example to compile). Unfortunately (or fortunately, for other good reasons), LibAFL is implemented as a library, so here we are forced to create such an inverse dependency.

Thinking about the input space of Rust programs we were initially targeting here, it may be somewhat ambitious to attempt automatic signalling for an arbitrary Rust binary.

In the meantime, instead of binaries, we could first support Rust rlib libraries: https://github.com/sfu-rsl/symrustc/commit/340905f89a30c2d1db56b5e197b7771e88b4eed3 This has the advantage of letting our Rust examples be compiled as rlib objects before the LibAFL main loop, and be loaded into the same signalling memory space as the LibAFL loop that ultimately runs them.
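
To illustrate the benefit of sharing the memory space: once the harness is an rlib linked into the fuzzer, the fuzzer can call it directly and read its coverage map between executions, without shared memory or IPC. In this sketch, the `harness` module stands in for the harness rlib, and its hand-written `MAP` stands in for the edge map that the sancov callbacks would fill; `keep_interesting` is a hypothetical, much-simplified version of the corpus feedback LibAFL implements.

```rust
use std::sync::atomic::Ordering;

/// Stand-in for the harness rlib crate (hypothetical).
mod harness {
    use std::sync::atomic::{AtomicU8, Ordering};

    pub const MAP_LEN: usize = 4;

    #[allow(clippy::declare_interior_mutable_const)]
    const ZERO: AtomicU8 = AtomicU8::new(0);

    /// In the real setup, filled by the sancov callbacks from libafl_targets.
    pub static MAP: [AtomicU8; MAP_LEN] = [ZERO; MAP_LEN];

    pub fn harness(data: &[u8]) {
        MAP[0].store(1, Ordering::Relaxed);
        if data.first() == Some(&b'F') {
            MAP[1].store(1, Ordering::Relaxed);
            if data.get(1) == Some(&b'U') {
                MAP[2].store(1, Ordering::Relaxed);
            }
        }
    }
}

/// Fuzzer side (hypothetical): runs the harness in-process and keeps the
/// inputs that reach map entries no earlier input reached.
pub fn keep_interesting(inputs: &[&[u8]]) -> Vec<Vec<u8>> {
    let mut seen = [false; harness::MAP_LEN];
    let mut corpus = Vec::new();
    for input in inputs {
        // Same address space: the fuzzer resets and reads the map directly.
        for slot in harness::MAP.iter() {
            slot.store(0, Ordering::Relaxed);
        }
        harness::harness(input);
        let mut new_coverage = false;
        for (i, slot) in harness::MAP.iter().enumerate() {
            if slot.load(Ordering::Relaxed) != 0 && !seen[i] {
                seen[i] = true;
                new_coverage = true;
            }
        }
        if new_coverage {
            corpus.push(input.to_vec());
        }
    }
    corpus
}
```

Given the inputs `x`, `y`, `Fa`, `FU`, `FUZ`, only `x`, `Fa` and `FU` are kept, since each of them is the first to reach a new map entry.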

Unfortunately, this implies slightly redesigning the Rust concolic examples we provide as input. Arguably, however, the input space of SymRustC is already restricted by the set of programs that SymCC supports. For instance, a SymRustC program can only be executed concolically when it follows the SymCC convention of the SYMCC_INPUT_FILE protocol. So requiring an additional rlib-architecture protocol for a library/program to work with LibAFL should not be too constraining (and is perhaps unavoidable in our case, given how LibAFL is designed).
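
As an illustration of what following that convention may look like on the Rust side, the hypothetical helper below reads the program input from the file named by SYMCC_INPUT_FILE when the variable is set, and from standard input otherwise (SymCC treats standard input as symbolic by default). Whether the runtime actually marks these bytes as symbolic depends on which libc calls it intercepts for Rust's `std::fs`, so this is only a sketch under that assumption.

```rust
use std::ffi::OsString;
use std::fs;
use std::io::{self, Read};

/// Reads the program input following the SymCC convention (assumed here):
/// from the file named by SYMCC_INPUT_FILE if set, otherwise from stdin.
pub fn read_input() -> io::Result<Vec<u8>> {
    read_input_from(std::env::var_os("SYMCC_INPUT_FILE"))
}

/// Same as `read_input`, with the variable's value passed explicitly so the
/// logic can be exercised without touching the process environment.
pub fn read_input_from(path: Option<OsString>) -> io::Result<Vec<u8>> {
    match path {
        Some(p) => fs::read(p),
        None => {
            let mut buf = Vec::new();
            io::stdin().read_to_end(&mut buf)?;
            Ok(buf)
        }
    }
}
```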

In this example, we deactivate all explicit signalling: https://github.com/sfu-rsl/symrustc/blob/e7eae0a93706b113d3334584e74d168c9992854d/libfuzzer_rust_concolic_instance/fuzzer/src/main.rs#L206 While the LibAFL loop and all its dependencies are first compiled with this regular command: https://github.com/sfu-rsl/symrustc/blob/e7eae0a93706b113d3334584e74d168c9992854d/Dockerfile#L407 we then focus explicitly on the harness function and compile it again so that it is sanitized with libafl_targets: https://github.com/sfu-rsl/symrustc/blob/e7eae0a93706b113d3334584e74d168c9992854d/Dockerfile#L415 (Without loss of functionality, this is actually an over-approximation, as it is the full LibAFL loop that gets sanitized.)

Regarding automation, a next step would be to see how we can embed the appropriate sanitizing information: https://github.com/sfu-rsl/symrustc/blob/e7eae0a93706b113d3334584e74d168c9992854d/Dockerfile#L421 in the corresponding libfuzzer_rust_concolic_instance/fuzzer/build.rs...

> this is actually an over approximation as it is the full LibAFL loop that gets sanitized.

I'm not sure about cargo's internal dependency management, but maybe we can pass the flags only to our harness library so that libafl_targets won't be sanitized. Furthermore, there may be other LLVM flags that control which modules get sanitized.
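
One possible way to restrict the flags to a single package is `cargo rustc`, which passes the arguments after `--` only to the final compiler invocation for the selected package, not to its dependencies. The hypothetical helper below only assembles such a command; the package name and target directory are placeholders. As observed later in this thread, reusing the main target directory may make cargo rebuild the harness without the flags, hence the separate `--target-dir`.

```rust
use std::path::Path;
use std::process::Command;

/// Assembles (without running) a `cargo rustc` invocation that applies the
/// SanitizerCoverage flags to the library target of `package` only.
pub fn sancov_build_command(package: &str, target_dir: &Path) -> Command {
    let mut cmd = Command::new("cargo");
    cmd.args(["rustc", "--lib", "--release", "-p", package])
        .arg("--target-dir")
        .arg(target_dir)
        // Everything after `--` goes to rustc for `package` only,
        // so dependencies such as libafl_targets are not sanitized.
        .args([
            "--",
            "-C",
            "passes=sancov-module",
            "-C",
            "llvm-args=--sanitizer-coverage-level=3",
        ]);
    cmd
}
```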

> Regarding automation, a next step would be to see how we can embed the appropriate sanitizing information:
>
> https://github.com/sfu-rsl/symrustc/blob/e7eae0a93706b113d3334584e74d168c9992854d/Dockerfile#L421
>
> inside a respective libfuzzer_rust_concolic_instance/fuzzer/build.rs...

One can solve this problem by taking advantage of the incremental recompilation offered by cargo: https://github.com/sfu-rsl/symrustc/commit/734ef35019cd62a77943982059e1467a4332dedb

> maybe we can give the flags only to our harness library so libafl_targets won't be sanitized

At the time of writing, we are using this trick to give the flags to the harness: https://github.com/sfu-rsl/symrustc/blob/653042a497cca4a8b5be5e0bed675779fc2de77c/Dockerfile#L440

In particular, this does not work if we add --target-dir ../target to that command: in that case, some generated metadata forces cargo to recompile the harness when building the LibAFL fuzzing binary (because the harness was originally configured to be compiled without those flags). However, building the harness rlib elsewhere allows us to copy the rlibs manually afterwards: https://github.com/sfu-rsl/symrustc/blob/653042a497cca4a8b5be5e0bed675779fc2de77c/Dockerfile#L441 Fortunately, cargo apparently does not track the timestamps and content of those rlibs when detecting which packages may need to be recompiled. Consequently, the harness does not get recompiled here (if it were, it would by default be recompiled without the flags): https://github.com/sfu-rsl/symrustc/blob/653042a497cca4a8b5be5e0bed675779fc2de77c/Dockerfile#L442
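
The manual copy step can be sketched as follows: copy every `.rlib` from the `deps` directory of the side target directory into that of the main one. The function and directory names are hypothetical, and depending on the setup the matching `.rmeta` files may need to be copied as well.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Copies every `.rlib` found directly in `from` into `to` (created if
/// missing) and returns the destination paths, sorted.
pub fn copy_rlibs(from: &Path, to: &Path) -> io::Result<Vec<PathBuf>> {
    fs::create_dir_all(to)?;
    let mut copied = Vec::new();
    for entry in fs::read_dir(from)? {
        let path = entry?.path();
        // Only compiled libraries; skip `.d`, `.rmeta` and other files.
        if path.extension().map_or(false, |e| e == "rlib") {
            let dest = to.join(path.file_name().expect("entry has a file name"));
            fs::copy(&path, &dest)?;
            copied.push(dest);
        }
    }
    copied.sort();
    Ok(copied)
}
```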

For the future, we can consider direct Rust instrumentation through -C instrument-coverage, which emits StatementKind::Coverage statements in MIR.

(sfu-rsl/symrustc, issue #5)