google / silifuzz

Apache License 2.0

Questions Size of Snapshot #5

Closed ChrisLaspias closed 1 year ago

ChrisLaspias commented 1 year ago

Hi silifuzz,

I use Silifuzz as part of my research and after walking a while through the codebase I wanted to ask you some questions.

  1. Is there a way to set the number of instructions that a snapshot will contain (e.g., the number of x86 instructions in a snapshot produced by fuzzing Unicorn)? Is it possible to create bigger snapshots (e.g., containing 1000 instructions)? The paper mentions: "Our typical snapshot contains less than 100 bytes of code and runs in microseconds, but it can be arbitrarily large."
  2. Is there a way to check the size of a snapshot and, more specifically, get the number of instructions executed?

Thank you in advance.

ksteuck commented 1 year ago

Hi @ChrisLaspias

  1. The sample proxies SiliFuzz provides do not have an option to define the exact number of instructions. Instead, there's an upper limit that is currently set to 100 for x86 and 4096 for aarch64. Both numbers are "more of a guideline", i.e. there's no guarantee this limit will ever be reached. Generally, snapshots can be arbitrarily large, both in terms of the number of dynamic instructions and in terms of how much memory they map. We have experimented with both larger and smaller snapshots, and found that smaller snapshots work better for our use case.
  2. You can use the snap_tool trace command to produce the trace (currently x86-only). This will execute the snapshot and print every instruction.
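
Since the trace prints every executed instruction, an approximate dynamic instruction count can be derived from its output. The following is a minimal sketch, not part of SiliFuzz: it assumes the trace output has been saved as text and that each executed instruction occupies exactly one non-blank line. The helper name `count_traced_instructions` is hypothetical, and any header or summary lines the tool prints would have to be filtered out first.

```python
def count_traced_instructions(trace_output: str) -> int:
    """Count the non-blank lines in saved trace output.

    Assumption (not confirmed by the thread): every executed instruction
    is printed on its own line and no other non-blank lines are present.
    """
    return sum(1 for line in trace_output.splitlines() if line.strip())


# Hypothetical sample text, used only to illustrate the counting logic.
sample = "nop\n\nmov rax, rbx\nret\n"
instruction_count = count_traced_instructions(sample)
```

With real trace output, the text would be read from a file or captured from the tool's standard output before being passed to the function.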

ChrisLaspias commented 1 year ago

Hi @ksteuck, thank you for the quick answer. One last question: is there a way to statically compile a snapshot into an executable (possibly with the help of snap_tool)? I want to be able to run a snapshot without depending on dynamic linking.

Thank you for your time.

ksteuck commented 1 year ago

The answer depends on what you mean by "dynamic linking". The runner binary is always linked fully statically.

$ file bazel-bin/runner/reading_runner_main_nolibc
bazel-bin/runner/reading_runner_main_nolibc: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), statically linked, BuildID[xxHash]=394d49c23eedd37a, not stripped
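
The same property can be checked programmatically. The sketch below is only a rough heuristic and is not how `file` or SiliFuzz decide this: it assumes a little-endian ELF64 image and treats the absence of a PT_INTERP program header as "no dynamic loader requested". The function names are hypothetical.

```python
import struct
from typing import List

PT_INTERP = 3  # program header type that names the dynamic loader


def program_header_types(elf: bytes) -> List[int]:
    """Return the p_type of every program header in a little-endian ELF64 image."""
    if len(elf) < 64 or elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)          # program header table offset
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)  # entry size and count
    return [
        struct.unpack_from("<I", elf, e_phoff + i * e_phentsize)[0]
        for i in range(e_phnum)
    ]


def requests_dynamic_loader(elf: bytes) -> bool:
    """True if the image has a PT_INTERP segment, i.e. asks for a dynamic loader."""
    return PT_INTERP in program_header_types(elf)
```

To use it, read the binary with `open(path, "rb").read()` and pass the bytes to `requests_dynamic_loader`; a fully statically linked runner is expected to yield `False`.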

There are then two ways to load the corpus into the runner. The default is to read the corpus from a file (a.k.a. the reading runner).

There's also an option to codegen Snapshots => C++ that can then be compiled into a self-contained (baked-in) runner binary. This latter option doesn't have good tooling around it and may be deprecated in the future. You can see an example of how this is done here. The TL;DR is genrule() => C++ => cc_library() => cc_binary()

HTH

ChrisLaspias commented 1 year ago

@ksteuck Thank you very much for helping. That was exactly what I was looking for.