Timeout/watchdog potential improvement

rgerman commented 7 years ago

Hi @ivanfratric!

I have the following issue: the app I'm trying to fuzz takes a lot of time to initialize, but then the actual parsing (every iteration) takes way less (say parsing is 20 times faster than init).

If I set the timeout value as the time it takes to parse the input, then the dry-run fails since the testcase results in a hang. But if I use the initialization time, once the fuzzer finds a path that takes a lot of time, it won't be killed since the timeout is way too high (I hope the explanation of the problem makes sense).

Now, since we're running in persistent mode, it would be correct to assume that, most of the time, the initialization routine will only be executed once in a while. So, do you think it makes sense to setup the watchdog after receiving some kind of message from the pipe during the pre_fuzz_handler or something like that? That way, the timeout would apply only to the actual parsing.

Thanks for this great tool, btw! :)

0vercl0k commented 7 years ago

One easy way to work-around your issue is to comment this line: https://github.com/ivanfratric/winafl/blob/master/afl-fuzz.c#L2130 - this will basically disable the timeout between the process creation & the connection to the pipe server.

Cheers

2017-08-30 12:10 GMT-07:00 Germán Riera notifications@github.com:

Hi @ivanfratric https://github.com/ivanfratric!

I have the following issue: the app I'm trying to fuzz takes a lot of time to initialize, but then the actual parsing (every iteration) takes way less (say parsing is 20 times faster than init).

If I set the timeout value as the time it takes to parse the input, then the dry-run fails since the testcase results in a hang. But if I use the initialization time, once the fuzzer finds a path that takes a lot of time, it won't be killed since the timeout is way too high (I hope the explanation of the problem makes sense).

Now, since we're running in persistent mode, it would be correct to assume that, most of the time, the initialization routine will only be executed once in a while. So, do you think it makes sense to setup the watchdog after receiving some kind of message from the pipe during the pre_fuzz_handler or something like that? That way, the timeout would apply only to the actual parsing.

Thanks for this great tool, btw! :)

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/ivanfratric/winafl/issues/71, or mute the thread https://github.com/notifications/unsubscribe-auth/ABaHRdRH6bQIZnf0mahe_CpxFBTAQKM8ks5sdbO2gaJpZM4PH183 .

ivanfratric commented 7 years ago

@rgerman , what you're saying makes sense except note that in that case you wouldn't get information on timeouts for the initialization part and WinAFL would just hang if there ever is one (not sure what's the likelihood of that actually happening in practice assuming the user already tested everything under debug mode).

Not sure @0vercl0k's suggestion will work because the target process connects to the pipe as soon as it is successfully created (https://github.com/ivanfratric/winafl/blob/master/winafl.c#L787). The second timeout counter starts in https://github.com/ivanfratric/winafl/blob/master/afl-fuzz.c#L2313 right after afl-fuzz sends the message over pipe to the target process and I think WriteFile will return even before the target process actually starts reading from the pipe (before entering the target function).

Since we're discussing timeouts, there is also another issue here: Note that the first couple of fuzz iterations under DynamoRIO will be an order of magnitude slower than the iterations after DynamoRIO had the chance to cache all the relevant data. This is the reason why WinAFL uses persistence so heavily. However it also makes setting timeout value consistently difficult.

rgerman commented 7 years ago

Since we're discussing timeouts, there is also another issue here: Note that the first couple of fuzz iterations under DynamoRIO will be an order of magnitude slower than the iterations after DynamoRIO had the chance to cache all the relevant data. This is the reason why WinAFL uses persistence so heavily. However it also makes setting timeout value consistently difficult.

That's an interesting point I didn't think about.

For now, I just added a WriteFile in the pre_fuzz_handler, and a ReadFile just right before setting up the watchdog (https://github.com/ivanfratric/winafl/blob/master/afl-fuzz.c#L2312) and it seems to be working well for now.

But yeah, I understand that timeouts are a difficult issue here.

Anyway, thanks to both of you! WinAFL is awesome.

rgerman commented 7 years ago

@ivanfratric I guess I spoke too soon... I commented the line @0vercl0k mentioned and added the following code to afl-fuzz (after the WriteFile line and before the setting the watchdog_timeout_time variable):

if (ReadFile(pipe_handle, &result, 1, &num_read, NULL) == 0) PFATAL("Cannot read pre fuzz handler."); result = 0;

It was working well at first, but after a while, the ReadFile returned call returned a "no such file or directory" error. I notice the issue was easier to reproduce when the timeout was smaller, so I tried to debug it a little bit. What I believe it's happening is this:

1 - The run_target function creates the process and connect to the pipe. 2 - The watchdog detects a timeout and ends up calling the destroy_target_process function, but almost at the same time the target function returns and the post_fuzz_handler writes "K" in the pipe. So, the run_target function returns normally while the destroy_target_function is still being executed. 3 - The calibration process calls run_target again. At this point, if the application hasn't being killed yet (maybe because the nudge takes time to kill the process? I'm not a DynamoRio expert), then the is_child_running function will return true, and the flow goes directly to the WriteFile and ReadFile statements.

Then, I imagine that if destroy_target_function closes the pipe's handle and sets it to null, then both the WriteFile or ReadFile statements will fail (and the application won't be running either).

Do you think that's possible? If so, maybe I could block the execution of run_target before https://github.com/ivanfratric/winafl/blob/master/afl-fuzz.c#L2294 or something like that, to make sure that destroy_target_function already finished its execution.

Also, as a side note, if I want to set a timeout smaller than 1000 ms, I should change the Sleep call (https://github.com/ivanfratric/winafl/blob/master/afl-fuzz.c#L2251), right?

ivanfratric commented 7 years ago

If that is the case, one thing you can do is lock is_child_running with a critical_section (destroy_target_process already uses critical_section to prevent two instances of destroy_target_process triggering at once). This will prevent is_child_running executing at the same time as destroy_target_process. Hmm, actually it would probably be a good idea to do that anyway.

rgerman commented 7 years ago

Your suggestion seems to work well so far :)

Thank you!

ivanfratric commented 6 years ago

is_child_running in now protected by the critical_section

yoava333 commented 6 years ago

I'm having issues lately with very slow starting targets (30 seconds to start) but really fast iterations that have a very long timeout. @ivanfratric what do you think about a new switch -E that would basically be the default E xecution timeout after the first X (magic number) of iterations? A new execution would look like: afl-fuzz.exe ... -t 30000 -E 1000 ...

ivanfratric commented 6 years ago

I'm a bit on the fence about this, I can see the merits but, on the other hand, DR is slow whenever it encounters new coverage (because it needs to translate & cache it which happens at the time the code is first executed in the current process), so when setting timeouts too aggressively and the timeout occurs, you won't know if it's an actual timeout or you just hit a lot of previously unseen code.

yoava333 commented 6 years ago

I see your point, I'll revise my suggestion. My problem is that the load time until the first execution of target_function is part of the timeout, I would like a separate timeout for the loading until we reached the target_function for the first time & a timeout for the in-memory fuzzing part. What do you think?

ivanfratric commented 6 years ago

Sounds reasonable.

googleprojectzero / winafl

Timeout/watchdog potential improvement #71