evverx closed this issue 4 years ago
You are roughly getting 15K hours per day of fuzzing for systemd https://oss-fuzz.com/fuzzer-stats?group_by=by-day&date_start=2020-05-09&date_end=2020-05-21&fuzzer=afl&project=systemd https://oss-fuzz.com/fuzzer-stats?group_by=by-day&date_start=2020-05-09&date_end=2020-05-21&fuzzer=libFuzzer&project=systemd Has that changed? Did you add new fuzz targets? CPUs are allocated per project, but maybe there is some new bug; it will be investigated after the holidays.
According to those reports, yesterday the systemd project got "51.4" (AFL) + "155.5" (libFuzzer) = 206.9 hours of fuzzing. I'm not sure where 15K comes from. Could it be that I'm looking at the wrong column?
As far as I can see, something happened on April 1.
AFL:
libFuzzer:
Did you add new fuzz targets?
No, to the best of my knowledge nothing has changed on the systemd side.
FWIW, I usually open issues like this when I end up discovering bugs like https://github.com/systemd/systemd/issues/15885 on my laptop much faster than OSS-Fuzz, which hasn't found those issues yet.
We would definitely want to investigate this further, but first we want to understand the total hours. Here is what I see:
I've never seen anything like it. Here's what I see:
Forgot to attach the libFuzzer stats:
@oliverchang - thoughts? How can the stats be different depending on the user?
Project filters don't actually affect the result for admins -- they filter the UI components next to them to indicate which fuzzers and job types can be selected. For external users, this automatically restricts results to the set of jobs they have access to.
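(For illustration only: a minimal sketch of that distinction, with invented names rather than ClusterFuzz's actual code.)

```python
from dataclasses import dataclass, field

# Minimal sketch (invented names, not ClusterFuzz's actual code) of why a
# project filter can be purely cosmetic for admins while genuinely
# restricting what external users can query.

@dataclass(frozen=True)
class Job:
    name: str
    project: str

@dataclass
class User:
    is_admin: bool
    projects: set = field(default_factory=set)

def visible_jobs(all_jobs, user, selected_project=None):
    """Return the jobs whose stats this user may query."""
    if user.is_admin:
        # Admins always query the full set; the project dropdown only
        # narrows the fuzzer/job pickers shown next to it.
        return list(all_jobs)
    # External users are implicitly limited to projects they have access to.
    jobs = [j for j in all_jobs if j.project in user.projects]
    if selected_project is not None:
        jobs = [j for j in jobs if j.project == selected_project]
    return jobs

jobs = [Job("libFuzzer_systemd_fuzz-unit-file", "systemd"),
        Job("afl_systemd_fuzz-unit-file", "systemd"),
        Job("libFuzzer_other_target", "other")]
print(visible_jobs(jobs, User(is_admin=True)))                         # everything
print(visible_jobs(jobs, User(is_admin=False, projects={"systemd"})))  # systemd only
```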
It looks like something happened to the systemd bots on Mar 31 that caused them to hang due to an issue with the GCE metadata servers. + @mbarbella-chromium who is working on better monitoring here. In the meantime I'll restart these bots.
Thanks to @oliverchang, who figured out the root cause and fixed it in https://github.com/google/clusterfuzz/pull/1799; this was a pretty bad bug :(
Really appreciate @evverx, as always, for letting us know about these issues. You have been a very valuable contributor to OSS-Fuzz, thank you! The stats are starting to get fixed; see https://oss-fuzz.com/fuzzer-stats?group_by=by-day&date_start=2020-05-09&date_end=2020-05-31&fuzzer=libFuzzer_systemd_fuzz-unit-file
Will leave it open for a day or two before closing.
Seems all ok now.
Thanks @evverx. Please let us know if you notice this again. We've fixed a possible root cause but there may be another issue that has not been fully resolved yet.
Thank you! Judging by the two bugs OSS-Fuzz has found since those jobs were restarted, it seems to be back to normal. Though I'm curious as to why https://oss-fuzz.com/testcase-detail/5162542791131136 hasn't been reported on Monorail.
Other than that, OSS-Fuzz opened https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22706 (which is essentially a duplicate of https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22547). I wonder how the backtrace got lost there (assuming that's why the bug wasn't deduplicated).
Other than that, OSS-Fuzz opened https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22706
I'm not sure why OSS-Fuzz keeps reporting the same bug over and over again. Today it opened https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22726 (which is a duplicate of https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=22547). As far as I can tell, the backtrace wasn't lost there. @inferno-chromium I wonder why it wasn't deduplicated.
https://oss-fuzz.com/testcase-detail/5162542791131136 was reported after 4.5 hours. I remember we keep at least a 3-hour window to make sure we get similar stacks and bundle them together; this is important for deduplication.
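(For illustration, a rough sketch of that bundling idea; the 3-hour window and the stack-hash grouping key are assumptions on my part, not the actual ClusterFuzz logic.)

```python
from datetime import datetime, timedelta

# Rough sketch of the bundling idea described above. The 3-hour window and
# the use of a stack hash as the grouping key are assumptions, not the
# actual ClusterFuzz implementation.

WINDOW = timedelta(hours=3)

def bundle_crashes(crashes):
    """crashes: iterable of (timestamp, stack_hash).
    Returns the crashes that would become new reports; anything with the
    same stack seen within WINDOW of a previous report is bundled into it."""
    last_report = {}  # stack_hash -> timestamp of the most recent report
    reports = []
    for ts, stack_hash in sorted(crashes):
        previous = last_report.get(stack_hash)
        if previous is not None and ts - previous <= WINDOW:
            continue  # treated as a duplicate of the existing report
        last_report[stack_hash] = ts
        reports.append((ts, stack_hash))
    return reports

crashes = [
    (datetime(2020, 5, 27, 0, 0), "stack-A"),
    (datetime(2020, 5, 27, 1, 30), "stack-A"),  # bundled into the first report
    (datetime(2020, 5, 27, 4, 30), "stack-A"),  # outside the window: new report
]
print(bundle_crashes(crashes))
```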
For #3864 (comment), this is the fix - google/clusterfuzz#1803
Thank you!
https://oss-fuzz.com/testcase-detail/5162542791131136 was reported after 4.5 hours
Looks like I didn't notice "UTC" in the logs. At the time I thought that more than 7 hours had passed.
Regarding comment https://github.com/google/oss-fuzz/issues/3864#issuecomment-635442330, I have no clue. Basically, you have to try that testcase in an MSan build of systemd and see whether it reproduces that way (i.e. do you get a proper stack or an empty one). Are you getting a good MSan stack for the testcase, or is it empty all the time? Can you also try running that fuzzer in an MSan build (without the testcase) and see if it always crashes within a few minutes? Maybe it is some OOM. This could be a bug in the MSan setup, an OOM, or something in the MSan build; I have no clue and it would need more debugging.
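(A minimal sketch of that kind of check, assuming the fuzz target has already been built locally with MSan and the testcase has been downloaded; the paths and target name below are placeholders.)

```python
import subprocess

# Minimal sketch of the check suggested above, assuming the fuzz target has
# already been built locally with MSan and the testcase has been downloaded.
# The binary path, testcase path, and target name are placeholders, not
# anything OSS-Fuzz provides by default.

TARGET = "./out/fuzz-unit-file"   # local MSan build of the fuzz target
TESTCASE = "./testcase"           # testcase downloaded from ClusterFuzz

def run_once(args):
    proc = subprocess.run(args, capture_output=True, text=True)
    output = proc.stdout + proc.stderr
    crashed = proc.returncode != 0 or "MemorySanitizer" in output
    # A usable report should contain at least one resolved stack frame.
    has_stack = "#0 0x" in output
    return crashed, has_stack

# 1. Does the testcase reproduce, and does the report come with a real stack?
print("with testcase:", run_once([TARGET, TESTCASE]))

# 2. Does the target fall over on its own within a few minutes (e.g. an OOM
#    or a broken MSan setup), even without the testcase?
print("plain fuzzing:", run_once([TARGET, "-max_total_time=300"]))
```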
I tried to build that fuzz target with MSan, and it always crashes with the same backtrace. That's basically why I was surprised that on ClusterFuzz the backtrace was lost somewhere and that, judging by the bug report, it wasn't always reproducible.
The fuzzer itself is pretty solid in the sense that it's been running with the seed corpus for about 10 minutes with no crashes.
This backtrace loss happened during fuzzing, not during reproduction. Unsure what is going on.
Since it doesn't seem to happen very often, I think it's safe to assume it was just a glitch. If some other fuzz target crashes similarly, it will probably make sense to take a closer look at it.
FWIW there was another weird crash with no backtrace (with UBSan this time): https://oss-fuzz.com/testcase-detail/5141092482940928
```
UndefinedBehaviorSanitizer:DEADLYSIGNAL
==1==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x0000023c69a8 (pc 0x0000023c69a8 bp 0x7ffcb9e787e0 sp 0x7ffcb9e78778 T1)
==1==The signal is caused by a READ memory access.
==1==Hint: PC is at a non-executable region. Maybe a wild jump?
    #0 0x23c69a8 in [heap]

UndefinedBehaviorSanitizer can not provide additional info.
SUMMARY: UndefinedBehaviorSanitizer: SEGV ([heap]+0x1b99a8)
==1==ABORTING
```
FWIW there was another weird crash with no backtrace (with UBSan this time): https://oss-fuzz.com/testcase-detail/5141092482940928
These unreproducible ones from a long fuzzing session are hard to debug. The only advice I have is to fix all open reproducible bugs from this fuzzer and see if this one stays (you can see crash statistics on the testcase detail page). Also, please file a new bug for further discussion.
The only advice I have is to fix all open reproducible bugs from this fuzzer and see if this one stays
All the bugs have been fixed. The bug that I suspect caused this was fixed about 13 hours ago, but it looks like OSS-Fuzz hasn't picked up that commit yet. It would be great if OSS-Fuzz could build projects more often than once a day.
Also, please file a new bug for further discussion.
I'd open a new issue if I had anything I could describe to report :-)
The only advice I have is to fix all open reproducible bugs from this fuzzer and see if this one stays
All the bugs have been fixed. The bug that I suspect caused this was fixed about 13 hours ago, but it looks like OSS-Fuzz hasn't picked up that commit yet. It would be great if OSS-Fuzz could build projects more often than once a day.
The OSS-Fuzz builder pipeline is planned for a complete rewrite in early Q3, and this feature is part of it. @oliverchang - FYI.
Also, please file a new bug for further discussion.
I'd open a new issue if I had anything I could describe to report :-)
Judging by https://oss-fuzz.com/fuzzer-stats?group_by=by-day&date_start=2020-05-09&date_end=2020-05-21&fuzzer=libFuzzer_systemd_fuzz-unit-file, we're back to fuzzing for three hours a day every other day.
I'd reopen https://github.com/google/oss-fuzz/issues/3014 if I could.