google / oss-fuzz

OSS-Fuzz - continuous fuzzing for open source software.
https://google.github.io/oss-fuzz
Apache License 2.0
10.32k stars 2.2k forks

Make JavaScript corpora public #3972

Closed guidovranken closed 4 years ago

guidovranken commented 4 years ago

From what I've been reading, I understand that you use a custom fuzzing setup for JavaScript engines, such as the spidermonkey and jsc projects. However, I cannot seem to access the corpora for these fuzzers. They should normally be accessible at these URLs:

https://console.cloud.google.com/storage/browser/_details/spidermonkey-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/spidermonkey_js_fuzzer/public.zip
https://console.cloud.google.com/storage/browser/_details/jsc-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/jsc_js_fuzzer/public.zip

But there is nothing there. Would it be possible for you to publish the (3-month-old) corpora for these fuzzers? They would be invaluable as seed corpora for separate JavaScript engine fuzzing efforts.

Thanks!

inferno-chromium commented 4 years ago

@guidovranken - the blackbox fuzzer corpora for JS fuzzing are not reduced, and do not take code coverage into account. For blackbox fuzzers, we don't store a corpus in GCS. Basically, there is a tests archive file (created from `*.js` files in the Chrome, Firefox, Safari, etc. repos) and the fuzzer just mutates those files.
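For intuition, the general idea of blackbox mutation (a rough sketch only; `MutateTestCase` is a hypothetical illustration, not ClusterFuzz's actual mutator) is to take an existing test case and randomly perturb a few bytes, with no coverage feedback involved:

```cpp
#include <random>
#include <string>

// Illustrative blackbox mutation: overwrite a handful of random byte
// positions in an existing seed test case with random byte values.
// No coverage signal is consulted; interesting inputs are found by volume.
std::string MutateTestCase(const std::string& seed, unsigned num_flips,
                           std::mt19937& rng) {
    std::string mutated = seed;
    if (mutated.empty()) return mutated;
    std::uniform_int_distribution<size_t> pos(0, mutated.size() - 1);
    std::uniform_int_distribution<int> byte(0, 255);
    for (unsigned i = 0; i < num_flips; ++i)
        mutated[pos(rng)] = static_cast<char>(byte(rng));
    return mutated;
}
```

In practice the mutated files are then fed to the JS shell as ordinary script files, so syntactically broken mutants are cheap but mostly rejected by the parser.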

jonathanmetzman commented 4 years ago

I think this is a common source of seeds for v8: https://source.chromium.org/chromium/chromium/src/+/master:v8/test/mjsunit/

guidovranken commented 4 years ago

Thank you both. Tangential, but are you running any project that fuzzes v8 with a default libFuzzer setup (with no constraint against generating incorrect grammar)?

I see the value in grammar-based fuzzers for JS, but a generic libFuzzer approach is not entirely without merit, I think.

In fact, I've been fuzzing Spidermonkey based on a seed corpus of hundreds of thousands of JS and WASM files. This found a few minor bugs (including the DoS bug in V8 that I reported to Jonathan).

Is integrating this Spidermonkey fuzzer eligible for the integration bounty? It is libFuzzer based (using a modified version of https://github.com/mozilla/gecko-dev/blob/master/js/src/fuzz-tests/parsing-evaluate.js) and I will include the (minimized) corpus that I've built so far.
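For reference, the general shape of such a libFuzzer eval() harness can be sketched as follows. This is an assumption-laden sketch, not the actual harness: `EvaluateScript` here is a hypothetical stand-in for the real engine call (in SpiderMonkey that would be something like `JS::Evaluate` on a persistently initialized `JSContext`).

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical stand-in for handing source text to the JS engine; a real
// harness would initialize the engine context once, evaluate the script,
// and swallow uncaught JS exceptions so only memory errors surface.
static bool EvaluateScript(const std::string& source) {
    return !source.empty();  // stub only; real code returns evaluation status
}

// libFuzzer entry point: treat the raw fuzzer input bytes as JS source text.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
    // Engines generally expect a length-delimited or null-terminated buffer,
    // so copy the input into an owned string first.
    std::string source(reinterpret_cast<const char*>(data), size);
    EvaluateScript(source);
    return 0;  // libFuzzer expects 0; crashes are caught by the sanitizers
}
```

Such a harness would be built with `clang++ -fsanitize=fuzzer,address` against the engine's static library; libFuzzer supplies its own `main`.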

The current Spidermonkey project on OSS-Fuzz uses your blackbox fuzzer as far as I can tell.

There is also spidermonkey-ufi but as far as I can tell it does not perform eval() fuzzing.

Is the QuickJS project based on the blackbox fuzzer or not? If not, we could cross-pollinate corpora, provided both parties (Mozilla/Bellard) consent, to possibly give both projects a coverage boost.

Having a Spidermonkey eval() fuzzer not only benefits Spidermonkey but all JS engines, as we can periodically run each interpreter on the corpus. This is how I found the V8 bug.

My Spidermonkey corpus currently reaches a libFuzzer `cov:` value of 18435 (built with `-fsanitize=fuzzer-no-link`).

guidovranken commented 4 years ago

It has found a Spidermonkey memory bug now.

jonathanmetzman commented 4 years ago

> Thank you both. Tangential, but are you running any project that fuzzes v8 with a default libFuzzer setup (with no constraint against generating incorrect grammar)?

Yes. The fuzzer is called v8_fully_instrumented_fuzzer (source, bugs)

It is not very good.

> I see the value in grammar-based fuzzers for JS, but a generic libFuzzer approach is not entirely without merit, I think.

Maybe. But if I were to do this I'd probably start with Fuzzilli or something that has already done some work on coverage-guided fuzzing of a JS engine. I think there are a lot of reasons why generic dumb-mutation coverage-guided fuzzing won't work well on these engines (e.g. garbage collection, non-deterministic JITing, etc.).

> In fact, I've been fuzzing Spidermonkey based on a seed corpus of hundreds of thousands of JS and WASM files. This found a few minor bugs (including the DoS bug in V8 that I reported to Jonathan).

There are some v8 fuzzers in chromium that fuzz WASM as well.

> Is integrating this Spidermonkey fuzzer eligible for the integration bounty? It is libFuzzer based (using a modified version of https://github.com/mozilla/gecko-dev/blob/master/js/src/fuzz-tests/parsing-evaluate.js) and I will include the (minimized) corpus that I've built so far.

I'm not sure, @inferno-chromium?

> The current Spidermonkey project on OSS-Fuzz uses your blackbox fuzzer as far as I can tell.
>
> There is also spidermonkey-ufi but as far as I can tell it does not perform eval() fuzzing.
>
> Is the QuickJS project based on the blackbox fuzzer or not? If not, we could cross-pollinate corpora, provided both parties (Mozilla/Bellard) consent, to possibly give both projects a coverage boost.

I don't believe it is. @mbarbella-chromium, can different fuzzers easily opt in to pollination now?

> Having a Spidermonkey eval() fuzzer not only benefits Spidermonkey but all JS engines, as we can periodically run each interpreter on the corpus. This is how I found the V8 bug.
>
> My Spidermonkey corpus currently reaches a libFuzzer `cov:` value of 18435 (built with `-fsanitize=fuzzer-no-link`).

Right, but how much of this is initialization, garbage collection, or other non-deterministic code?

guidovranken commented 4 years ago

Thank you for your insightful response, Jonathan.

> > Thank you both. Tangential, but are you running any project that fuzzes v8 with a default libFuzzer setup (with no constraint against generating incorrect grammar)?
>
> Yes. The fuzzer is called v8_fully_instrumented_fuzzer (source, bugs)

Is this corpus public? 3 months old is fine. (Sorry, I'm new to browser fuzzing.)

> > Having a Spidermonkey eval() fuzzer not only benefits Spidermonkey but all JS engines, as we can periodically run each interpreter on the corpus. This is how I found the V8 bug. My Spidermonkey corpus currently reaches a libFuzzer `cov:` value of 18435 (built with `-fsanitize=fuzzer-no-link`).
>
> Right, but how much of this is initialization, garbage collection, or other non-deterministic code?

Difficult to tell precisely, but I've observed only about 50 cov points of variance across runs on the same corpus, and the coverage durably increases over consecutive runs.

It has now found a few bugs that the existing internal and external testing efforts (apparently) did not find. But if you don't want to do this, that's fine and you can close this issue.