Closed alexcrichton closed 1 year ago
I don't know if it's related to your issue, but the wasmtime fuzzers are very slow. To get a corpus with much better coverage (https://github.com/bytecodealliance/wasmtime-libfuzzer-corpus/pull/5) I had to fuzz the `instantiate` fuzzer on 256 cores for weeks. There are also some particularly slow inputs in the current public OSS-Fuzz corpus for the `instantiate` fuzzer (they take around 20 seconds to complete; I didn't report this to you as I assumed you were already notified by OSS-Fuzz). Additionally, I think the fuzzer running on a large corpus requires more memory than the default 2 GB; if you don't set libFuzzer's `-rss_limit_mb` to something higher than that, you will get out-of-memory crashes. At the least, the `instantiate` fuzzer would benefit greatly from lower resource consumption.
I submitted https://github.com/bytecodealliance/wasmtime/pull/6330 which should fix the long test case issue you're seeing. That's a good point about `-rss_limit_mb` though, thanks for the reminder! I'll try rerunning the OSS-Fuzz-built fuzzers over the OSS-Fuzz-generated corpora with that option and see if anything runs away with too much memory.
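For concreteness, libFuzzer's default RSS limit is 2048 MB, and a corpus replay that needs more memory is reported as an OOM crash rather than a real bug; raising the limit is just a flag on the command line. Below is a rough illustration of the same effect using a plain `ulimit` address-space cap (the fuzzer invocation in the comment is a hypothetical sketch, and the 100 MB / 200 MB figures are arbitrary):

```shell
# Hypothetical rerun with a raised libFuzzer memory limit:
#   ./instantiate -rss_limit_mb=4096 -runs=0 corpus/
#
# Analogous demonstration with a ulimit: a 200 MB allocation fails under a
# 100 MB virtual-memory cap, and succeeds without one.
( ulimit -v 102400; python3 -c "x = bytearray(200 * 1024 * 1024)" ) 2>/dev/null \
  && echo "alloc ok under 100MB cap" || echo "alloc failed under 100MB cap"
python3 -c "x = bytearray(200 * 1024 * 1024)" \
  && echo "alloc ok without cap"
```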
I'm unfortunately still perplexed as to what's going on here. One example is:
```
gs://wasmtime-logs.clusterfuzz-external.appspot.com/libFuzzer_wasmtime_instantiate/libfuzzer_asan_wasmtime/2023-05-03/15:08:34:611774.log
gs://wasmtime-logs.clusterfuzz-external.appspot.com/libFuzzer_wasmtime_instantiate/libfuzzer_asan_wasmtime/2023-05-03/15:08:34:611774.testcase
```
The log there says:
```
Component revisions (build r202305030624):
Wasmtime: 60ce6f5d5c5b7defc6f7a8bfaa6bfb8db82b0aa1
Regalloc2: 1b4287b6ad0e756a833a7ed5f62c3b475f0b1af0
Wasm-tools: 34216db2b321d38a67b42300034cdbdeaf780fa0
Bot name: oss-fuzz-linux-zone6-host-xd76-9
Return code: 1
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_wasmtime_9d7f296cb3c934976ab46f0ee760a3a07ef3344a/revisions/instantiate -timeout=25 -rss_limit_mb=2560 -fork=2 -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=6200 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-383/new /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/wasmtime_instantiate
Time ran: 0.03678083419799805
cf::fuzzing_strategies: fork:2
```
and the testcase listed is an empty file. This seems like the fuzzer might be crashing while it's loading the corpus, perhaps. As above, though, I keep coming back to the text-parser fuzzer, which is quite simple and runs quite fast but still shows bad performance.
I also just remembered to check a different project I have on OSS-Fuzz, flate2-rs. That one has had no code changes in months, yet the April 3, 2023 report had it at 99.5 `fuzzing_time_percent`, whereas on April 4, 2023 it dropped to 48.6. Similarly, it has a number of blank startup crashes.
Is there a way I could inspect the build logs from these builds from a month or so ago? I'd like to compare versions of software/compilers/etc to try to see what changed and might be causing this.
@DavidKorczynski IIRC you've integrated a number of Rust projects into OSS-Fuzz historically, so I was wondering if you'd be able to check to see if they're experiencing similar nonzero-`startup_crash` issues as well? For Wasmtime and flate2 I was seeing `startup_crash` percentages of 30-40. Given that these are entirely unrelated projects, and that flate2 has had no changes at all recently, I'm curious whether other Rust projects are similarly plagued.
I've tried to narrow this down to something that changed, for example something `cargo-fuzz`-related or rustc-related, but I've so far been unsuccessful. I'm hoping, though, that if more unrelated projects are experiencing something similar, it will be possible to draw connections and see what's going on.
@alexcrichton I'll take a look at this tomorrow!
@alexcrichton I went over three projects (`serde_json`, `image_png`, `json5format`) and saw this behaviour on all three:
```
Component revisions (build r202305070613):
Image-png: 4579a7cdb568c6bf60c252a5c488a242cf713954
Bot name: oss-fuzz-linux-zone8-host-bhqx-14
Return code: 1
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/buf_independent -timeout=25 -rss_limit_mb=2560 -max_len=2358 -fork=2 -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=6200 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-383/new /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/image-png_buf_independent
Time ran: 0.014983415603637695
cf::fuzzing_strategies: fork:2,random_max_len:1
```
Similar to @alexcrichton's report, nonzero `startup_crash_percent` started appearing on April 4 for all of the projects. Here's a screenshot of the `startup_crash_percent` column when looking at the history of individual fuzzers:
![Screenshot 2023-05-08 112035](https://user-images.githubusercontent.com/657617/236799844-d11f6054-571f-4ed1-83fd-24908ae4a44d.png)
I see `startup_crash_percent` is 0 on April 3, but ~20 on April 4, and then it goes up to ~35. This is consistent amongst the three projects.
Most logs do not have much info; however, I got this one, which reports a Segmentation fault:
```
Component revisions (build r202305070613):
Image-png: 4579a7cdb568c6bf60c252a5c488a242cf713954
Bot name: oss-fuzz-linux-zone8-host-bhqx-14
Return code: 1
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode -timeout=25 -rss_limit_mb=2560 -max_len=407 -fork=2 -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=6200 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-383/new /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/image-png_decode
Time ran: 12.281909227371216
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 2880567337
INFO: Loaded 1 modules (38781 inline 8-bit counters): 38781 [0x563d871e3440, 0x563d871ecbbd),
INFO: Loaded 1 PC tables (38781 PCs): 38781 [0x563d871ecbc0,0x563d87284390),
INFO: -fork=2: fuzzing in separate process(s)
INFO: -fork=2: 2765 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.FuzzWithFork19243.dir
#0: cov: 16371 ft: 16371 corp: 2765 exec/s 0 oom/timeout/crash: 0/0/0 time: 12s job: 1 dft_time: 0
INFO: log from the inner process:
Segmentation fault
INFO: exiting: 139 time: 12s
cf::fuzzing_strategies: fork:2,random_max_len:1
```
I see other similar logs:
```
Component revisions (build r202305070613):
Image-png: 4579a7cdb568c6bf60c252a5c488a242cf713954
Bot name: oss-fuzz-linux-zone6-host-c3cl-14
Return code: 1
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode -timeout=25 -rss_limit_mb=2560 -fork=2 -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=6200 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-383/new /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/image-png_decode
Time ran: 43.253541231155396
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 1848597903
INFO: Loaded 1 modules (38781 inline 8-bit counters): 38781 [0x5fc9ea4ed440, 0x5fc9ea4f6bbd),
INFO: Loaded 1 PC tables (38781 PCs): 38781 [0x5fc9ea4f6bc0,0x5fc9ea58e390),
INFO: -fork=2: fuzzing in separate process(s)
INFO: -fork=2: 3350 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.FuzzWithFork12249.dir
#610: cov: 20096 ft: 20096 corp: 3350 exec/s 305 oom/timeout/crash: 0/0/0 time: 42s job: 1 dft_time: 0
NEW_FUNC: 0x5fc9ea0cb5e0 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x2105e0) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea0cbfe0 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x210fe0) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea0d3f00 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x218f00) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea0dbd10 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x220d10) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea144ab0 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x289ab0) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea144af0 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x289af0) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea145e00 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x28ae00) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea146420 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x28b420) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea15fc50 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x2a4c50) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea230a10 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x375a10) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea2600c0 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x3a50c0) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea26ce80 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x3b1e80) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea270380 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x3b5380) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea2713b0 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x3b63b0) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
NEW_FUNC: 0x5fc9ea271d10 (/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode+0x3b6d10) (BuildId: f8e2ac9f3c1044d9909c1257f286bf95b49cfc97)
#1293: cov: 20201 ft: 20097 corp: 3351 exec/s 227 oom/timeout/crash: 0/0/0 time: 43s job: 2 dft_time: 0
#1293: cov: 20201 ft: 20097 corp: 3351 exec/s 0 oom/timeout/crash: 0/0/0 time: 43s job: 4 dft_time: 0
INFO: log from the inner process:
Segmentation fault
INFO: exiting: 139 time: 43s
cf::fuzzing_strategies: fork:2
```
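For reference, the `exiting: 139` status in these logs is the standard Linux encoding of death-by-signal: 128 + 11, i.e. the inner process was killed by SIGSEGV, so these are genuine segfaults rather than timeouts or OOMs. This is easy to confirm in a shell:

```shell
# Deliberately segfault a subshell and observe the parent's view of its exit
# status: 128 + SIGSEGV(11) = 139, matching libFuzzer's "exiting: 139".
sh -c 'kill -SEGV $$'
echo "exit status: $?"
```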
Another strange error I came across with some more logging, which may be related since it happens early, is this one from image_png (2023-05-08 02:22:29:7930734):
```
Component revisions (build r202305070613):
Image-png: 4579a7cdb568c6bf60c252a5c488a242cf713954
Bot name: oss-fuzz-linux-zone8-host-bhqx-15
Return code: 1
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode -timeout=25 -rss_limit_mb=2560 -dict=/mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_image-png_b0e58386fac39f658accbcc17c96e929fd304259/revisions/decode.dict.merged -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=5700 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-385/new /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-385/mutations /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/image-png_decode
Time ran: 0.019207000732421875
ParseDictionaryFile: error in line 3
"\001\322"
cf::fuzzing_strategies: corpus_mutations_radamsa:1,recommended_dict:1
```
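For context (my reading of libFuzzer's AFL-style dictionary format, so treat the specifics as hedged): entries are quoted strings, optionally preceded by a name, and the only escapes the parser accepts are `\\`, `\"`, and `\xNN` hex escapes. An octal escape like `\001` in the merged `recommended_dict` would therefore produce exactly this `ParseDictionaryFile` error:

```text
# Valid AFL/libFuzzer dictionary entries:
kw1="png"
png_header="\x89PNG\x0d\x0a\x1a\x0a"

# Invalid: octal escapes are not part of the format and fail to parse:
bad="\001\322"
```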
Is there a way I could inspect the build logs from these builds from a month or so ago? I'd like to compare versions of software/compilers/etc to try to see what changed and might be causing this.
@jonathanmetzman @Alan32Liu do you know how to do this?
Thanks for confirming @DavidKorczynski! Sounds like Wasmtime itself is unlikely the culprit here and possibly something Rust-specific so far. Do you happen to have any non-Rust projects under OSS-Fuzz? If so, do they also show this startup crash phenomenon on April 4? If so then it may not be Rust-specific, and if not that'd be a good confirmation it's Rust-specific.
> If so, do they also show this startup crash phenomenon on April 4?
I checked and non-Rust projects run fine -- as far as I can tell this is a Rust issue.
I think the easiest would be to get some of the build logs from those dates as you suggest above, and then investigate the differences.
Out of curiosity have you been able to reproduce any crashes locally? I've so far been unsuccessful unfortunately.
> Out of curiosity have you been able to reproduce any crashes locally? I've so far been unsuccessful unfortunately.
I'm unable to reproduce as well.
Just an idea: OSS-Fuzz runs the fuzzers on a variety of CPUs with different capabilities (e.g. instructions like AVX-512 are not supported by all CPUs). If the Rust compiler detects AVX-512 on the build system at compile time, it might insert these instructions into the binaries, leading to crashes on a percentage of CPUs. Similarly, the `MOVBE` instruction will crash on certain CPUs. Maybe something changed in the Rust compiler with regard to this before and after April 4th.
Perhaps! Given that there's only one build a day, I'm assuming that the same binary runs everywhere, meaning that compile-time feature detection would not work, since the binary would immediately crash on machines without those features. Runtime detection is of course still possible, though.
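For reference, the runtime-detection path in Rust looks like the sketch below (the feature names are just illustrative; note also that executing an unsupported instruction normally raises SIGILL rather than the SIGSEGV seen in these logs):

```rust
// Runtime CPU feature detection via std's `is_x86_feature_detected!`: the
// check happens on the machine actually running the binary, so one build can
// take an AVX path on new CPUs and a fallback everywhere else. Compile-time
// selection (e.g. `-C target-cpu=native`) instead bakes the instructions in
// unconditionally, which would crash older hardware outright.
fn feature_report() -> Vec<String> {
    let mut lines = Vec::new();
    #[cfg(target_arch = "x86_64")]
    {
        lines.push(format!("avx2 detected: {}", is_x86_feature_detected!("avx2")));
        lines.push(format!("avx512f detected: {}", is_x86_feature_detected!("avx512f")));
    }
    lines
}

fn main() {
    for line in feature_report() {
        println!("{line}");
    }
}
```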
It'd be helpful to see the exact compiler version used to produce the April 3 and April 4 binaries. I don't have that off-hand, and the changes between `nightly-2023-04-{03,04}` do not look suspicious to me. If https://github.com/google/oss-fuzz/pull/10270 doesn't fix this, I can try a bisection, perhaps with a once-a-day round-trip time :)
I don't have any easy way to get you old build logs
Ok, https://github.com/google/oss-fuzz/pull/10270 didn't fix things for Wasmtime, as it still has a nonzero startup crash percentage recorded for yesterday's May 10 fuzzing. An example log is:
```
Component revisions (build r202305090625):
Wasmtime: 2300ef6894605e89bd94f6b2da75ba55e5ab91ac
Regalloc2: 1b4287b6ad0e756a833a7ed5f62c3b475f0b1af0
Wasm-tools: 16b82b2db210df9b6a7c0c354c0dc8528f700b43
Bot name: oss-fuzz-linux-zone6-host-xd76-11
Return code: 1
Command: /mnt/scratch0/clusterfuzz/resources/platform/linux/unshare -c -n /mnt/scratch0/clusterfuzz/bot/builds/clusterfuzz-builds_wasmtime_9d7f296cb3c934976ab46f0ee760a3a07ef3344a/revisions/wasm-tools-text-parser -timeout=25 -rss_limit_mb=2560 -fork=2 -artifact_prefix=/mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases/ -max_total_time=6500 -print_final_stats=1 /mnt/scratch0/clusterfuzz/bot/inputs/fuzzer-testcases-disk/temp-386/new /mnt/scratch0/clusterfuzz/bot/inputs/data-bundles/wasmtime_wasm-tools-text-parser
Time ran: 44.596296548843384
INFO: Running with entropic power schedule (0xFF, 100).
INFO: Seed: 223596020
INFO: Loaded 1 modules (52668 inline 8-bit counters): 52668 [0x5d2becb92a00, 0x5d2becb9f7bc),
INFO: Loaded 1 PC tables (52668 PCs): 52668 [0x5d2becb9f7c0,0x5d2becc6d380),
INFO: -fork=2: fuzzing in separate process(s)
INFO: -fork=2: 11516 seed inputs, starting to fuzz in /tmp/libFuzzerTemp.FuzzWithFork16003.dir
#0: cov: 83277 ft: 83277 corp: 11516 exec/s 0 oom/timeout/crash: 0/0/0 time: 44s job: 1 dft_time: 0
INFO: log from the inner process:
Segmentation fault
INFO: exiting: 139 time: 44s
cf::fuzzing_strategies: fork:2
```
where the reproduction case is empty. Given all this, it seems this probably wasn't caused by the Rust nightly being updated; and in the case of flate2-rs there were no other project changes in this time period either.
@jonathanmetzman do you know if it's possible to see perhaps deployment records or logs or such for what changed between April 3 and April 4? Alternatively do you know of something else that can be done here to capture something like a core dump or get more logging as to what's happening? My guess now is that something about how Rust fuzzers are built needs to be updated to account for a recent change, but all the changes for projects between April 3 and April 4 I've looked at (here, clusterfuzz, llvm, etc) look unrelated.
I'm not sure what happened, but as of the May 11 fuzzing data both Wasmtime and flate2-rs now have a 0% startup crash percentage. I suppose I'll close this if that persists for a day or so.
May 11 and onward fuzzing has a 0% startup crash rate for both flate2-rs and wasmtime. I'm not sure what changed, but I'll close this as it seems resolved.
Recently we've found a few bugs locally in Wasmtime and its related projects through fuzzing which felt like they should have been found on OSS-Fuzz much longer ago. I've been looking at some of the coverage reports, and it looks like the `fuzzing_time_percent` of Wasmtime's fuzzers, which used to be 99+, is now <50 for almost all of them. Taking a look at a simple fuzzer, `libFuzzer_wasmtime_wasm-tools-text-parser`, its `perf_report` links say that 43% of runs are affected by a `startup_crash`, but the corresponding logs appear to be blank; for example 07:49:29:852659.log shows nothing at all. Is there a way for us to dig into what the crash is and figure out what's going on here?
One wild guess is that Rust in the past few months upgraded from LLVM 15 to LLVM 16 which may be causing issues, but that's just a guess.