Experiment with libFuzzer and fuzzing in general

kholia commented 7 years ago

I would like to fuzz the various hash extraction programs we have, to improve their robustness.

https://github.com/kholia/JohnTheRipper/tree/libFuzzer is a start. It has already found some crashes in keepass2john.c 👍

roycewilliams commented 7 years ago

If there was a +5 button, I would use it. This is great news!

solardiz commented 7 years ago

Thanks, Dhiru!

We should also get JtR into OSS-Fuzz. Google would even pay us for the integration effort: https://opensource.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html

frank-dittrich commented 7 years ago

@solardiz When fuzzing JtR, I would expect timeout failures (and maybe even out-of-memory failures) when fuzzing tunable cost parameters of various formats: https://github.com/google/oss-fuzz/blob/master/docs/faq.md#how-do-you-handle-timeouts-and-ooms

This may be hard to avoid. Similar problems occur if you try to fuzz JtR using afl: http://lcamtuf.coredump.cx/afl/
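To illustrate the timeout concern, here is a minimal sketch of a fuzz target whose run time is driven by a cost field taken from the input, together with one possible mitigation (skipping inputs above a cap). Everything in it is hypothetical: `derive_key`, `FUZZ_MAX_ITERATIONS` and the four-byte cost layout are invented for illustration and are not JtR code.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed cap for fuzzing runs only; the value is arbitrary. */
#define FUZZ_MAX_ITERATIONS 4096u

/* Records the last cost that was actually exercised (for illustration). */
static uint32_t last_iterations;

/* Hypothetical stand-in for a key derivation whose run time grows
 * linearly with the input-controlled iteration count. */
static uint32_t derive_key(uint32_t iterations)
{
    uint32_t state = 0x9e3779b9u;
    uint32_t i;

    for (i = 0; i < iterations; i++)
        state = state * 1664525u + 1013904223u;
    last_iterations = iterations;
    return state;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    uint32_t iterations;

    if (size < 4)
        return 0;
    /* The first four bytes play the role of a tunable cost field. */
    iterations = (uint32_t)data[0] | ((uint32_t)data[1] << 8) |
                 ((uint32_t)data[2] << 16) | ((uint32_t)data[3] << 24);
    /* Without this check, a value near 2^32 would look like a hang. */
    if (iterations > FUZZ_MAX_ITERATIONS)
        return 0;
    (void)derive_key(iterations);
    return 0;
}
```

The trade-off is that a cap like this hides any bug that only shows up at high cost settings, so it is a sketch of one option rather than a recommendation.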

kholia commented 7 years ago

Thanks @roycewilliams and @solardiz!

I will look into how to get JtR into OSS-Fuzz very soon.

At the moment I am trying to fit libFuzzer into our build system. I believe this task is mostly done (+ documented) and I will send a pull request soon.

I am also writing the glue code required to fuzz various hash extraction utilities.

@magnumripper To fuzz the various targets, it seems that we would need to convert the various hash extraction utilities into standalone programs (some of them were originally standalone, I think!). Is there a good reason not to do this?

magnumripper commented 7 years ago

> we would need to convert the various hash extraction utilities into standalone programs

As in not symlinked? How would that matter?

kholia commented 7 years ago

Please take a look at http://llvm.org/docs/LibFuzzer.html for details. In short, libFuzzer is an in-process fuzzer, and there can be only one LLVMFuzzerTestOneInput (fuzzer's entry point) function per executable (aka fuzzing target).

Making the various hash extraction utilities into standalone programs allows for their fuzzing with libFuzzer.
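For readers unfamiliar with libFuzzer, a minimal sketch of a fuzzing target is shown here. `LLVMFuzzerTestOneInput` and its signature come from the libFuzzer documentation linked above; `parse_header` and the "DEMO" magic are made-up stand-ins for an extraction routine and are not taken from keepass2john or any other JtR utility.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for an extraction routine; not a JtR function.
 * Accepts a buffer that starts with the made-up magic "DEMO" followed by
 * a 4-byte little-endian payload length that fits in the remaining data.
 * Returns 0 on success, -1 on malformed input. */
static int parse_header(const uint8_t *buf, size_t len)
{
    uint32_t payload_len;

    if (len < 8 || memcmp(buf, "DEMO", 4) != 0)
        return -1;
    payload_len = (uint32_t)buf[4] | ((uint32_t)buf[5] << 8) |
                  ((uint32_t)buf[6] << 16) | ((uint32_t)buf[7] << 24);
    if (payload_len > len - 8)
        return -1;
    return 0;
}

/* The single entry point libFuzzer calls, in-process, once per generated
 * input. Only one definition can exist per executable. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    (void)parse_header(data, size);
    return 0;
}
```

With a recent clang, a file like this can typically be built with `clang -g -fsanitize=fuzzer,address`; libFuzzer supplies `main` itself, which is why the target does not define one.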

solardiz commented 7 years ago

Why can't we have only one LLVMFuzzerTestOneInput in the main john executable, yet also be able to fuzz the extraction utilities via their usual symlinks?

kholia commented 7 years ago

It can be done but doing it will require some "multiplexing" hacks in the LLVMFuzzerTestOneInput function. Such a function will have to decide whether to exercise keepass2john or dmg2john functionality, for example. I don't want to add such hacks to LLVMFuzzerTestOneInput if there is a cleaner solution available (i.e. make helpers standalone).

Another disadvantage of this approach is that LLVMFuzzerTestOneInput won't have access to the internal (static) functions of keepass2john and dmg2john, for example.

Overall, I see no disadvantage in making the helper utilities standalone. Making them standalone even improves their portability and re-use.
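A rough sketch of what such a "multiplexing" entry point could look like follows. It is only an illustration: `keepass_process` and `dmg_process` are hypothetical stubs that count calls, and using the first input byte as a selector is an assumption, not something proposed in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Call counters, present only so the sketch has observable behavior. */
static unsigned keepass_calls, dmg_calls;

/* Hypothetical public entry points of two helpers. In a multiplexed
 * build, only functions exported like this would be reachable; the
 * helpers' own static functions would stay hidden in their files. */
static void keepass_process(const uint8_t *data, size_t size)
{
    (void)data;
    (void)size;
    keepass_calls++;
}

static void dmg_process(const uint8_t *data, size_t size)
{
    (void)data;
    (void)size;
    dmg_calls++;
}

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    if (size < 1)
        return 0;
    /* The "hack": spend the first byte on choosing a helper and pass
     * the rest of the input to it. */
    if ((data[0] & 1) == 0)
        keepass_process(data + 1, size - 1);
    else
        dmg_process(data + 1, size - 1);
    return 0;
}
```

One consequence visible here is that a single corpus ends up mixing inputs for unrelated file formats, whereas one target per helper keeps them separate.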

solardiz commented 7 years ago

> "multiplexing" hacks in the LLVMFuzzerTestOneInput function. Such a function will have to decide whether to exercise keepass2john or dmg2john functionality, for example.

Why not use multiplexing code shared with what we already have in john.c? Move it into a new function and call that from two places (for normal vs. fuzzing invocations) if necessary.
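As a sketch of this idea, the selection logic could live in one function that picks a helper by the basename of the invoked path, and both the normal startup code and the fuzzing entry point could call it. The names `select_tool`, `keepass_tool`, `dmg_tool` and `fuzz_target_name` are hypothetical, and this is not how john.c is actually organized.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef int (*tool_fn)(const uint8_t *data, size_t size);

/* Hypothetical stubs standing in for two helpers. */
static int keepass_tool(const uint8_t *data, size_t size)
{
    (void)data;
    (void)size;
    return 1;
}

static int dmg_tool(const uint8_t *data, size_t size)
{
    (void)data;
    (void)size;
    return 2;
}

static const struct {
    const char *name;
    tool_fn fn;
} tools[] = {
    { "keepass2john", keepass_tool },
    { "dmg2john", dmg_tool },
};

/* Shared multiplexer: map the basename of the invoked path (for example
 * a symlink name taken from argv[0]) to a helper, or NULL if unknown. */
static tool_fn select_tool(const char *invoked_as)
{
    const char *base = strrchr(invoked_as, '/');
    size_t i;

    base = base ? base + 1 : invoked_as;
    for (i = 0; i < sizeof tools / sizeof tools[0]; i++)
        if (strcmp(base, tools[i].name) == 0)
            return tools[i].fn;
    return NULL;
}

/* Assumption: the fuzzing build fixes the helper name up front. A normal
 * invocation would instead call select_tool(argv[0]) from main(). */
static const char *fuzz_target_name = "keepass2john";

int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size)
{
    tool_fn fn = select_tool(fuzz_target_name);

    if (fn != NULL)
        (void)fn(data, size);
    return 0;
}
```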

> Overall, I see no disadvantage in making the helper utilities standalone. Making them standalone even improves their portability and re-use.

I agree that our current approach might be out of date. It used to save space, especially with static linking, but this is rarely relevant now since jumbo became so large for other reasons anyway and since static linking is not that common. That said, if someone does produce a statically-linked build of jumbo, there will probably be a major increase in total size from the use of standalone utilities instead of symlinks.

Also, we always do our own static linking for our code shared between the programs (our object files that would need to be linked into the multiple standalone utilities).

kholia commented 7 years ago

It can be done (regarding multiplexing) but there is not much value in it anymore (like you said). There is much to gain by making the helpers standalone (portability, re-use, access to internal functions for fuzzing).

Overall, this seems like a perfect opportunity to remove this weird coupling mode for selected hash extraction programs.

solardiz commented 7 years ago

I'd like magnum's comments and decision on this. I still see some value in smaller static linking, and thus in the symlinks.

magnumripper commented 7 years ago

I think the symlink stuff is mostly outdated and obsolete but I never thought of moving away from it as it doesn't harm anything. Perhaps it does now.

Actually I'm more concerned with investing time and implementing lots of "useless" code for supporting some use-once-and-then-forget fuzzing lib into the main tree. The HAVE_FUZZ stuff is even worse, I regret its existence. I do understand they both can (and did) improve things but I hate it being there permanently. Same goes with Jim's MEMDBG stuff. It's good at what it does but it also breaks things every so often. As far as I can tell, most (if not all) of you disagree with me in this case so I'll just let you do whatever you like 😎

solardiz commented 7 years ago

I think it's valuable for us to have the fuzzing support code, but only as long as it's used regularly, not "use-once-and-then-forget". So that's what we should be doing. Dhiru, are you using the current builtin fuzzing functionality regularly? Please do. Are you going to use the libfuzzer support regularly rather than just once? Please do.

kholia commented 7 years ago

I intend to use the fuzzing functionality on a regular basis.

After seeing so many "good" results with libFuzzer recently, I am really motivated to improve the robustness of the code base.

roycewilliams commented 7 years ago

If fuzzing could be folded into rolling builds / Jenkins / CI, could that increase its value?

claudioandre-br commented 7 years ago

I was expecting a passionate discussion.

> Same goes with Jim's MEMDBG stuff

I expected ASAN to deprecate all the memdbg stuff sooner or later (but, at this moment, they are complementary: memdbg finds bugs that ASAN misses).

> If fuzzing could be folded into rolling builds / Jenkins / CI

I already run fuzzing (afl and zzuf) inside CI on Travis[1]. Of course, it handles only a few formats ('mine', basically). However, it also stresses shared code, e.g., the loader. We can't do real fuzzing on CI (it takes too much time). Maybe Solar can install Docker and Jenkins (or whatever) on one of his servers. Then it would be possible to run a real fuzzing session once a week.

BTW: Not on rolling builds, because my rolling builds do not have debug stuff. Mine are meant to be able to run real stuff.

[1] At least once a week.

kholia commented 7 years ago

Support for libFuzzer has been merged into the JtR jumbo code base. More than a dozen correctness and robustness issues were found and fixed in various hash extraction utilities.

kholia commented 6 years ago

This experiment also found some security bugs in upstream software which we use in JtR jumbo,

openwall / john

Experiment with libFuzzer and fuzzing in general #2621