googleprojectzero / Jackalope

Binary, coverage-guided fuzzer for Windows, macOS, Linux and Android
Apache License 2.0
1.1k stars 128 forks source link

[Bugfix] Removing redundant and problematic sample saving code #48

Closed tsytsarkin closed 1 year ago

tsytsarkin commented 1 year ago

There is a bug in how downloading samples from a server is implemented.

First of all, saving the sample in client.cpp is redundant. The new samples received from the server are being returned back to fuzzer.cpp in server_samples. Then process sample jobs are created here: https://github.com/googleprojectzero/Jackalope/blob/69c4eee961102593cdf75bd81a7c99da805266e3/fuzzer.cpp#L666-L674

Then, Fuzzer::ProcessSample calls RunSample which analyzes the coverage, and saves the sample if it produces new coverage. (Fuzzer::ProcessSample saves the sample if it does not produce new coverage but add_all_inputs is specified) https://github.com/googleprojectzero/Jackalope/blob/69c4eee961102593cdf75bd81a7c99da805266e3/fuzzer.cpp#L766-L777

RunSample saves the sample here: https://github.com/googleprojectzero/Jackalope/blob/69c4eee961102593cdf75bd81a7c99da805266e3/fuzzer.cpp#L455-L469

Therefore, we don't need to save samples in client.cpp.

Not only this code is redundant, but there is also a serious problem which leads to samples being overwritten on disk. In both cases, num_samples is being used to generate the file name. But those are 2 different private variables in CoverageClient and Fuzzer classes. Those values are not synchronized. Fuzzer increments that variable when a new sample that improves the coverage is found, while CoverageClient increments it when it receives new coverage from the server. In both cases, the files are saved to the same directory.

Imagine the fuzzer finds a sample generating new coverage and saves it as sample_00001. Next, the server reports new coverage, and the client downloads the new sample from the server, which it also saves as sample_00001, overwriting the initial file. Then the client processes the new sample and (if it generates new coverage) saves the same sample as sample_00002.

Note, this change will make previous saved client states incompatible, due to num_samples being removed from the save file. It is probably not a big deal, but I can add code to handle that case if we want to keep it.

ifratric commented 1 year ago

Thanks for looking into Jackalope's code.

However, I believe in this case this is working as intended.

Client class does not save samples into the sample queue but rather to a different location (server_cache directory, see https://github.com/googleprojectzero/Jackalope/blob/main/client.cpp#L31) and for a different purpose. This only happens when keep_samples_in_memory is unset (typically used when samples are large and you don't want to keep them in memory). In that case, when the Client gets samples from Server, rather than keeping them in memory until they get picked up by a worker thread, Client drops the samples to disk to server_cache directory to free the memory. Note also that num_samples from the Client class is not the same as num_samples in the Fuzzer class. Thus sample saving in Client and in Fuzzer should not interfere in any way since these are separate mechanisms with separate directories and variables.

tsytsarkin commented 1 year ago

You are correct! Thank you for the detailed explanation.