stability problem - Githubissues

zhouxucs commented 6 years ago

When TNT packets are decoded, the stability reported by AFL is far from 100% (lower than 20%). If we decode only TIP packets, the stability is better but still not 100%.

zhanggenex commented 6 years ago

What stability means and the possible causes for low stability in AFL

That last bit is actually fairly interesting: it measures the consistency of observed traces. If a program always behaves the same for the same input data, it will earn a score of 100%. When the value is lower but still shown in purple, the fuzzing process is unlikely to be negatively affected. If it goes into red, you may be in trouble, since AFL will have difficulty discerning between meaningful and "phantom" effects of tweaking the input file.

Now, most targets will just get a 100% score, but when you see lower figures, there are several things to look at:

The use of uninitialized memory in conjunction with some intrinsic sources of entropy in the tested binary. Harmless to AFL, but could be indicative of a security bug.
Attempts to manipulate persistent resources, such as left over temporary files or shared memory objects. This is usually harmless, but you may want to double-check to make sure the program isn't bailing out prematurely. Running out of disk space, SHM handles, or other global resources can trigger this, too.
Hitting some functionality that is actually designed to behave randomly. Generally harmless. For example, when fuzzing sqlite, an input like 'select random();' will trigger a variable execution path.
Multiple threads executing at once in semi-random order. This is harmless when the 'stability' metric stays over 90% or so, but can become an issue if not. Here's what to try:
- Use afl-clang-fast from llvm_mode/ - it uses a thread-local tracking model that is less prone to concurrency issues,
- See if the target can be compiled or run without threads. Common ./configure options include --without-threads, --disable-pthreads, or --disable-openmp.
- Replace pthreads with GNU Pth (https://www.gnu.org/software/pth/), which allows you to use a deterministic scheduler.
In persistent mode, minor drops in the "stability" metric can be normal, because not all the code behaves identically when re-entered; but major dips may signify that the code within __AFL_LOOP() is not behaving correctly on subsequent iterations (e.g., due to incomplete clean-up or reinitialization of the state) and that most of the fuzzing effort goes to waste.

The paths where variable behavior is detected are marked with a matching entry in the <out_dir>/queue/.state/variable_behavior/ directory, so you can look them up easily.

zhanggenex commented 6 years ago

How is stability calculated in the code?

In function calibrate_case(), run_target() is called.
first_trace[] and trace_bits[] are compared to update var_bytes[] and var_detected.
var_byte_count is calculated by count_bytes(var_bytes), which counts the non-zero bytes in var_bytes.
t_bytes = count_non_255_bytes(virgin_bits); this function counts the non-255 bytes in virgin_bits.
If t_bytes is not zero, stab_ratio = 100 - ((double)var_byte_count) * 100 / t_bytes; otherwise, stab_ratio is set to 100.
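
The calculation described above can be modeled in a few lines. This is only a Python sketch of the logic, not AFL's actual C code: the function and variable names mirror the ones mentioned in this comment, while mark_variable_bytes is a hypothetical helper standing in for the comparison loop inside calibrate_case().

```python
def count_bytes(buf):
    """Count the non-zero bytes in buf."""
    return sum(1 for b in buf if b != 0)


def count_non_255_bytes(buf):
    """Count the bytes in buf that are not 255."""
    return sum(1 for b in buf if b != 255)


def mark_variable_bytes(first_trace, trace_bits, var_bytes):
    """Hypothetical helper: flag map positions that differ between two runs.

    Returns True if any difference was seen (the role of var_detected).
    """
    var_detected = False
    for i, (a, b) in enumerate(zip(first_trace, trace_bits)):
        if a != b:
            var_bytes[i] = 1
            var_detected = True
    return var_detected


def stab_ratio(var_bytes, virgin_bits):
    """Stability in percent, following the formula quoted in this thread."""
    var_byte_count = count_bytes(var_bytes)
    t_bytes = count_non_255_bytes(virgin_bits)
    if t_bytes == 0:
        return 100.0
    return 100.0 - var_byte_count * 100.0 / t_bytes
```

With this model, two variable positions out of four touched map entries give a stability of 50%, which shows why even a handful of spurious packets can pull the number down quickly on a small map.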

zhanggenex commented 6 years ago

The first known cause of low stability in ptfuzzer.

The packets recorded by PT contain tip_fup packets. A tip_fup packet, along with several of the packets that follow it, makes the recorded trace vary between runs, which results in low stability.

We found two patterns of tip_fup packets that can be deleted from the raw PT packets:

- tip_fup: xxx (a single tip_fup on its own)
- tip_fup: xxx, tip_pgd: 0, tip_pge: xxx (three packets in total)

When these two patterns are deleted, ptfuzzer may achieve higher stability.
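
A minimal sketch of such a filter, assuming packets have already been parsed into (kind, value) tuples. That representation and the function name filter_fup are illustrative assumptions and are not taken from the ptfuzzer source.

```python
def filter_fup(packets):
    """Drop the two tip_fup patterns described above.

    packets: list of (kind, value) tuples, e.g. ("tip", 0x40910c).
    """
    out = []
    i = 0
    while i < len(packets):
        kind, _ = packets[i]
        if kind == "tip_fup":
            following = packets[i + 1:i + 3]
            if (len(following) == 2
                    and following[0] == ("tip_pgd", 0)
                    and following[1][0] == "tip_pge"):
                # Three-packet pattern: tip_fup, tip_pgd: 0, tip_pge
                i += 3
            else:
                # A tip_fup on its own
                i += 1
            continue
        out.append(packets[i])
        i += 1
    return out
```

Note that a tip_pgd with a non-zero address after a tip_fup is deliberately kept, since only the `tip_pgd: 0` form is part of the second pattern.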

zhouxucs commented 6 years ago

After applying the aforementioned patterns to filter fup packets, my tests show that the traces still differ between runs for two command lines.

The first one is:

./ptest/readelf -a ./ptest/readelf

However, the traces differ in file length only, so we assumed that the buffer we use for storing PT packets was full.

This is now confirmed: when we enlarge the perf aux buffer (_HF_PERF_AUX_SZ in pt.h) from 1M to 16M, the problem is gone.
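
One way to recognize this situation automatically is to check whether the shorter trace is a strict prefix of the longer one, which is what a full buffer would be expected to produce. The sketch below assumes each trace is a list of decoded packets; is_truncation is a hypothetical helper name, not part of ptfuzzer.

```python
def is_truncation(trace_a, trace_b):
    """Return True if one trace is a strict prefix of the other."""
    shorter, longer = sorted((trace_a, trace_b), key=len)
    return len(shorter) < len(longer) and longer[:len(shorter)] == shorter
```

If this returns True for two runs of the same command line, a buffer that is too small is a likely suspect; if the traces differ in content, the cause lies elsewhere.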

The second one is:

/bin/ls

Here the traces differ, but not in the fup packets.

We suspect the cause is that we write log files to the current directory while ls is running, which changes what ls has to list. This still needs to be confirmed.

zhanggenex commented 6 years ago

Stability after filtering the two fup packet patterns in ptfuzzer

We filtered the two fup patterns above in the decoding process of ptfuzzer and got these two results:

When we exclude tnt packets from decoding, stability still goes down, but much more slowly than before, decreasing by around 1-2% per minute.
When we include tnt packets in decoding, stability drops sharply in the first few minutes and then keeps decreasing by around 1-2% per minute.

In either case, stability still goes down after filtering the two fup patterns.

One observation is interesting, however. In tip mode, when we exclude tip_pge and tip_pgd packets from the decoding process, stability is always 100%, and ptfuzzer found several crashes in this configuration. The cause of this result still needs to be discussed.

zhanggenex commented 6 years ago

Decoding psb packets

By examining the debug information, we found that some tnt packets are still undecoded after tip_pgd has been decoded, which should not happen if the decoding is correct. Usually the packets look like this:

tip_pge: 11111 tnt NN tnt TN tip_pgd: 22222

But when there are tip_fup packets generated, they look like this:

tip_pge: 11111 tnt NN psb tip_fup: 33333 tnt TN tip_pgd: 22222

The psb packet resets some decoder values to 0, so the tnt packets between tip_pge and psb are never processed, which causes low stability in ptfuzzer. After fixing this problem, stability stays at around 95%. According to AFL's documentation, stability at this level, even though it is not 100%, should not have much impact on fuzzing.
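
The effect can be illustrated with a toy decoder. This is a simplified model under stated assumptions and not the ptfuzzer decoder: packets are (kind, value) tuples, tnt bits are collected in a pending list until a tip or tip_pgd consumes them, and the flush_on_psb flag is a hypothetical switch that contrasts the behavior described above with one possible fix (processing pending tnt bits before the reset).

```python
def decode(packets, flush_on_psb=True):
    """Return the TNT bits that the toy decoder actually processes."""
    processed = []  # TNT bits that were consumed
    pending = []    # TNT bits seen but not yet consumed
    for kind, value in packets:
        if kind == "tnt":
            pending.extend(value)  # value is a string such as "NN"
        elif kind == "psb":
            if flush_on_psb:
                processed.extend(pending)  # consume before the reset
            pending.clear()  # psb resets the decoder state
        elif kind in ("tip", "tip_pgd"):
            processed.extend(pending)
            pending.clear()
    return processed


# The packet sequence from the example above
trace = [
    ("tip_pge", 0x11111),
    ("tnt", "NN"),
    ("psb", None),
    ("tip_fup", 0x33333),
    ("tnt", "TN"),
    ("tip_pgd", 0x22222),
]
```

In this model, decode(trace, flush_on_psb=False) loses the first two bits and yields only T and N, while the default yields all four bits. Because psb packets do not necessarily appear at the same point in every run, dropping bits at a psb would make the result vary between runs.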

zhouxucs commented 6 years ago

While testing pandoc, we found a new perturbation pattern:

last equal:  tip: 40910c
tip: 40910c             tip: 40910c
tip: 2b15354                tip_pgd: 0
tip: 40910c             tip_pge: 2afd270
tip: 2b15354                tip: 2afd279
tip: 40910c             tip: 2afd33b
tip: 2b15354                tip_pgd: 7ffff6a6e150
tip: 40910c             tip_pge: 407c41
tip: 2b15429                tip: 2b15354
tip: 4090c5             tip: 40910c
tip: 2b15429                tip: 2b15354
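
A comparison like the one in this listing can be automated by searching for the first position at which two traces disagree. The sketch below assumes that each column of the listing is one run and that each run is a list of packet strings; first_divergence is a hypothetical helper name.

```python
def first_divergence(run_a, run_b):
    """Return the index of the first differing packet, or None if equal."""
    for i, (a, b) in enumerate(zip(run_a, run_b)):
        if a != b:
            return i
    if len(run_a) != len(run_b):
        # One run is a prefix of the other
        return min(len(run_a), len(run_b))
    return None


# First rows of the two columns in the listing
left = ["tip: 40910c", "tip: 2b15354", "tip: 40910c"]
right = ["tip: 40910c", "tip_pgd: 0", "tip_pge: 2afd270"]
```

Here first_divergence(left, right) returns 1, so the packet at index 0 is the last one on which both runs agree.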

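This new pattern is still caused by fup. The fup packet has already been removed from the listing above; the original fup packet should be just before tip_pgd. Unlike the earlier fup cases, the fup packet here leads to a valid pge, and this pge contains two tip packets, namely tip: 2afd279 and tip: 2afd33b. Under the previous filtering rules these two tip packets cannot be ignored, which causes the instability.

As the addresses show, the two extra tip packets are still within the program's valid address range. We therefore suspect that this is the program's handling code for some signal, so that execution passes from the system's interrupt handler into a signal handler defined by the program.

By setting the upper limit of the address filter so that this signal-handling code is not recorded, stability reached 100% in the early phase of fuzzing but dropped to 69% after half a minute, probably for some other reason.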

zhouxucs commented 6 years ago

Pandoc is written in Haskell.

hunter-ht-2018 / ptfuzzer

stability problem #2