No, there is no name requirement. I see 16260 Aug 19 19:32 aspell_fuzzer_seed_corpus.zip in your gs://clusterfuzz-builds/aspell/aspell-address-201908200228.zip. Is this the right seed corpus, and is it getting archived properly?
Also, check coverage locally with https://google.github.io/oss-fuzz/advanced-topics/code-coverage
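For reference, the local workflow from that guide is roughly the following (a sketch; the corpus path is just wherever your seed inputs live locally):
  python infra/helper.py build_fuzzers --sanitizer coverage aspell
  python infra/helper.py coverage aspell --fuzz-target aspell_fuzzer --corpus-dir <path-to-seed-corpus>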
I checked the seed corpus locally and the coverage was around 56%
That is the correct file. Running unzip -l aspell_fuzzer_seed_corpus.zip gives:
Length Date Time Name
--------- ---------- ----- ----
132 2019-08-20 02:30 aspell_fuzzer_corpus/email000
108 2019-08-20 02:30 aspell_fuzzer_corpus/en_US-bad-spellers
114 2019-08-20 02:30 aspell_fuzzer_corpus/en_US-fast
116 2019-08-20 02:30 aspell_fuzzer_corpus/en_US-normal
114 2019-08-20 02:30 aspell_fuzzer_corpus/en_US-slow
115 2019-08-20 02:30 aspell_fuzzer_corpus/en_US-ultra
87 2019-08-20 02:30 aspell_fuzzer_corpus/en_us_input
86 2019-08-20 02:30 aspell_fuzzer_corpus/en_us_input_utf8
2213 2019-08-20 02:30 aspell_fuzzer_corpus/html000
65 2019-08-20 02:30 aspell_fuzzer_corpus/markdown001
...
--------- -------
7253 60 files
Should the files inside the zip be in their own directory?
When we unpack it, we give it to libFuzzer/AFL, which does not care about directory structure. Are you saying that the coverage on the fuzzer stats dashboard is lower than 60%?
Yes, it is currently at around 51%, and coverage for aspell/modules/filter/markdown.cpp is 0%. If it were using the seed corpus, that should be around 85%: https://storage.googleapis.com/oss-fuzz-coverage/aspell/reports/20190820/linux/src/aspell/modules/filter/report.html
This is a bit of a mystery to me. The coverage build isn't broken, and we seem to be unpacking the corpus based on the logs I see. Totally speculating, but there are some other things we should look into.
Actually, it looked like the seed corpus was unpacked on the 17th, 18th, and 20th (when it started taking longer to unpack). So I'm predicting that the next coverage report that gets generated will cover code that your seed corpus covers. I'm not sure why it didn't unpack on the 19th.
The seed corpus is still rather small, so I am not sure why it would take so long to unpack. I'll give it another day then.
Is there a place I can look to tell if the seed corpus was unpacked?
The seed corpus is still rather small, so I am not sure why it would take so long to unpack. I'll give it another day then.
It isn't taking long, it's taking longer. As in:
20th: 0.108174085617 seconds
18th: 0.000488996505737 seconds
17th: 0.000529050827026 seconds
There's nothing to worry about here; I was just pointing out for myself that it looks like a new seed corpus is being unpacked.
Is there a place I can look to tell if the seed corpus was unpacked?
You could download your project's corpus (use gsutil to download from gs://aspell-corpus.clusterfuzz-external.appspot.com) and do a coverage report on that (the names will be changed to sha hashes so you can't simply look for names).
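For example, something along these lines (a sketch; gsutil and the OSS-Fuzz helper script are assumed to be available, and a coverage build is needed first):
  # python infra/helper.py build_fuzzers --sanitizer coverage aspell
  gsutil -m cp -r gs://aspell-corpus.clusterfuzz-external.appspot.com/libFuzzer/aspell_fuzzer ./working_corpus
  python infra/helper.py coverage aspell --fuzz-target aspell_fuzzer --corpus-dir ./working_corpus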
I'm 99% sure that this has nothing to do with the names of the files. I think it's more likely that this problem was caused by something like pruning failing on the 19th, a problem that will go away.
You could download your project's corpus and do a coverage report on that (the names will be changed to sha hashes so you can't simply look for names).
Should the files from the seed corpus always be included when downloading the corpus via gs://aspell-corpus.clusterfuzz-external.appspot.com/libFuzzer/aspell_fuzzer or gs://clusterfuzz-builds/aspell/aspell-address-DATE.zip?
Should the files from the seed corpus always be included
These are very different things, so let me explain.
gs://aspell-corpus.clusterfuzz-external.appspot.com/libFuzzer/aspell_fuzzer
should contain the working corpus (i.e. all of the files added during fuzzing plus the pruned corpus from the night before). You can actually get a copy of the backup we make after pruning here. We can't guarantee that it will contain all of the seeds since we remove redundant/reduced ones during pruning.
gs://clusterfuzz-builds/aspell/aspell-address-DATE.zip
contains the build, which should include the seed corpus if you added it correctly. I think it was only brought up because there was a question about whether you added it correctly (you did).
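As a quick sanity check, one can pull a recent build archive and confirm the seed corpus zip is inside (a sketch; substitute a real build timestamp for DATE, e.g. the 201908200228 one mentioned above):
  gsutil cp gs://clusterfuzz-builds/aspell/aspell-address-DATE.zip .
  unzip -l aspell-address-DATE.zip | grep seed_corpus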
I think the latest coverage report was generated on Tuesday at 9 AM UTC-4, which is before the seed corpus was unpacked (Tuesday 12:15 PM PDT).
The newest report shows coverage at ~56%
Markdown is still 0%. Are you sure it should be covered? If so I can try to take another look.
gs://clusterfuzz-builds/aspell/aspell-address-DATE.zip contains the build
Oops, I meant gs://aspell-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/aspell_fuzzer/latest.zip. Sorry. But you answered the question about what that corpus contains anyway.
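To inspect that backup locally, something like this should work (a sketch):
  gsutil cp gs://aspell-backup.clusterfuzz-external.appspot.com/corpus/libFuzzer/aspell_fuzzer/latest.zip .
  unzip -l latest.zip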
Are you sure it should be covered? If so I can try to take another look.
Yes. The files named markdown001 to markdown050 in the seed corpus should test the Markdown filter.
Here is what the coverage looks like when using just the seed corpus (python infra/helper.py coverage aspell --fuzz-target aspell_fuzzer --corpus-dir build/out/aspell/src/aspell-fuzz/aspell_fuzzer_corpus):
Assigning to this week's sheriff.
Just to add another data point: in the coverage report for 2019-08-21, email.cpp was at 75% of lines covered, but in the coverage report for 2019-08-22 it was back down to 0%. There is some input in the seed corpus for the email filter, but I think the fuzzer stumbled upon the setting string to activate it on its own.
After closer examination of the corpus I determined that the fuzzer did use the seed corpus (as it used pt_BR-001, which uses the pt_BR dictionary, which in turn uses features that en_US does not); however, it apparently found the input that uses the Markdown filter code uninteresting. There was also one input file for the Email filter; the fuzzer used it for a day, but after that it also found it uninteresting.
it apparently found the input that uses the Markdown filter code uninteresting
I'm pretty sure there's a bug somewhere here (probably in ClusterFuzz), otherwise this input would be considered interesting and would be in the corpus. I did coverage reports locally and confirmed that:
1. The seed corpus covers markdown.cpp.
2. The working corpus does not cover markdown.cpp.
3. The working corpus plus the seed corpus covers markdown.cpp.
I bet that if I copy the files from the seed corpus into the working corpus's cloud bucket it will start being covered.
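The three local checks were roughly along these lines (a sketch; paths are illustrative, with seed_corpus/ holding the seed inputs and working_corpus/ the downloaded corpus):
  # 1. seed corpus only
  python infra/helper.py coverage aspell --fuzz-target aspell_fuzzer --corpus-dir seed_corpus
  # 2. working corpus only
  python infra/helper.py coverage aspell --fuzz-target aspell_fuzzer --corpus-dir working_corpus
  # 3. working corpus plus seed corpus
  mkdir -p combined && cp seed_corpus/* working_corpus/* combined/
  python infra/helper.py coverage aspell --fuzz-target aspell_fuzzer --corpus-dir combined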
There was also one input file for the Email filter; the fuzzer used it for a day, but after that it also found it uninteresting.
As in coverage went down for Email?
As in coverage went down for Email?
Yes. In the coverage report for 2019-08-21, email.cpp was at 79% of lines covered, and in the report for 2019-08-22 it was back down to 0%. I first thought the fuzzer had stumbled upon the right settings on its own (like with the TeX filter), but the coverage numbers for email.cpp match exactly what they are when just using the seed corpus.
Email coverage on 2019-08-21:
Interesting that the number of units in the corpus backup on the 22nd was higher than on the 21st:
Which is good and expected, but how could the files covering e.g. email.cpp disappear...
Checked coverage job logs -- nothing suspicious in there.
I took a look at the recent corpus pruning logs but also don't see anything obviously wrong. Could there be any nondeterminism coming from the target?
Could there be any nondeterminism coming from the target?
There shouldn't be. If there is, I would consider it a bug.
That's an interesting point. AFL reports only 25-40% stability: https://oss-fuzz.com/fuzzer-stats/by-day/date-start/2019-08-15/date-end/2019-08-28/fuzzer/afl_aspell_fuzzer/job/afl_asan_aspell
AFL reports only 25-40% stability
@Dor1s are you trying to tell me my target is behaving nondeterministically?
If so, is there a way to find testcases that create different output when run multiple times?
@kevina I can't guarantee that's the case, but based on AFL's logic for evaluating "stability" it does seem to recognize many parts of the target as non-deterministic :/
I'm trying a couple things locally, will get back to you if I realize anything useful.
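One quick local heuristic (not definitive; assumes a libFuzzer build of aspell_fuzzer and a copy of the corpus in working_corpus/): run the corpus through the target twice without fuzzing and compare the coverage counters libFuzzer prints on its INITED line.
  ./aspell_fuzzer -runs=0 working_corpus/ 2>&1 | grep INITED
  ./aspell_fuzzer -runs=0 working_corpus/ 2>&1 | grep INITED
If the cov:/ft: numbers differ between the two runs, the target is exercising code nondeterministically; bisecting the corpus (or running single-file subdirectories the same way) can narrow down which inputs are responsible.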
https://github.com/ocaml/ocaml/issues/7612 indicates that caching might cause AFL to report a target as unstable. Maybe the issue is the GlobalCacheBase?
The instability would have to be preventing markdown.cpp from being reached deterministically.
I'm gonna test my theory that seed unpacking is broken (and not merging) by copying each file from the seed corpus into the working corpus. If the coverage improvements happen in the next report, then unpacking is broken.
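Roughly, the copy looks like this (a sketch; it needs write access to the project's corpus bucket, and the local path is the one from the coverage command above):
  gsutil -m cp build/out/aspell/src/aspell-fuzz/aspell_fuzzer_corpus/* gs://aspell-corpus.clusterfuzz-external.appspot.com/libFuzzer/aspell_fuzzer/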
I'm trying a couple things locally, will get back to you if I realize anything useful.
I was trying to do corpus minimization differently, but didn't notice anything suspicious.
Another theory: @kevina, the dict/ directory in the build seems to control whether markdown.cpp is covered (I tried removing it and did a coverage report on the seed corpus; markdown.cpp was no longer covered). So if the coverage report tomorrow doesn't show markdown.cpp as covered (I explicitly added the seed corpus to the working corpus), then this is my best guess.
@jonathanmetzman the dict/ directory is required for any coverage. Without it, Aspell won't find the needed data files (including the speller dictionary) and will return an error.
@jonathanmetzman the dict/ directory is required for any coverage. Without it, Aspell won't find the needed data files (including the speller dictionary) and will return an error.
OK, so tomorrow we should see the coverage report containing coverage of markdown.cpp, and we can see why our unpacking is broken.
The new coverage report doesn't cover markdown.cpp, so something is up with pruning, or this target behaves weirdly.
I'm seeing a similar issue with libxml2. I expanded the seed corpus of the xml fuzzer two weeks ago, but the coverage report still shows quite a few code blocks as uncovered which really should be covered now.
@nwellnhof is that still the case? There was some regression in LLVM affecting code coverage tools (https://github.com/google/oss-fuzz/issues/4348).
Could you take another look at the stats and let us know if you still see that missing coverage?
If possible, please start a new issue for that (if the problem is still present).
Coverage looks good now.
Thanks for checking!
Should this thread have been closed? I don't think @kevina has responded saying the problem is fixed for aspell...
On Thu, 3 Sep 2020, 18:33 Max Moroz, notifications@github.com wrote:
Closed #2729 https://github.com/google/oss-fuzz/issues/2729.
@cmeister2 good catch! The last time aspell was discussed here was over a year ago, and in the current reports I see the markdown.cpp file being covered: https://storage.googleapis.com/oss-fuzz-coverage/aspell/reports/20200909/linux/src/aspell/modules/filter/markdown.cpp.html
@kevina please comment / re-open if the issue still persists for you.
My project aspell does not seem to be using the seed corpus. About two days ago I expanded the seed corpus to improve coverage, and yet coverage has not changed. The files are currently not named based on the SHA1 checksum. Is this a requirement? The manual strongly hints at this when it says: