steveicarus / iverilog

Icarus Verilog
https://steveicarus.github.io/iverilog/
GNU General Public License v2.0
2.84k stars 525 forks source link

OSSFuzz Integration #1075

Open capuanob opened 9 months ago

capuanob commented 9 months ago

Hello!

I am inquiring to see if the project maintainers would support me submitting this repository to Google's OSSFuzz project? I would develop the necessary build scripts and fuzz targets to integrate this project for nightly testing and would just need the eventual PR review to host the fuzz harnesses. Currently, I'm thinking that this project could benefit from fuzz-testing against its verilog preprocessor and parser to find potential edge-cases that result in undefined behavior. I am open to any other entry-point suggestions, as well!

Thank you!

larsclausen commented 9 months ago

I think this is a great idea. I've been thinking in the back of my head about this for a while, how to generate inputs that will stress corner cases of the parser and elaboration. I think there are lots of bugs to be found.

capuanob commented 9 months ago

@larsclausen Fantastic! Excited to get started, I just now need to submit this repository for consideration to the OSSFuzz team. To do that, I will need an email address to serve as the primary contact for the repo (preferably a Google mail account). This email will be given access to the ClusterFuzz dashboard to review bugs and receive crashing testsuites.

steveicarus commented 9 months ago

You can use my email address, steve@icarus.com. e-mail is not the best way to communicate with me, and the team here communicates well through github, but if you need a placeholder, I am the boss of this mess, so it might as well be me.

caryr commented 9 months ago

This seems like a good thing, but I have a couple thoughts:

In general buffer overflows, etc. are not too catastrophic since a simulator should never be run using elevated privileges and we have not implemented the $system() task so we are likely just limited to overwriting files.

I doubt it understand SV so will we get useful results?

Do we have time to deal and fix any issues before the 90 day issue report is released?

I think they want a gmail email address.

capuanob commented 8 months ago

@caryr A gmail address is definitely preferable, but not a necessity

In regards to understanding SV, the fuzzing engines will make "intelligent" mutations over time. Beginning with a valid verilog input, the fuzzing engine will utilize branch instrumentation to identify how to augment its input to achieve breadth across the many branches and code blocks in the parser. So, while it doesn't inherently understand the verilog grammar, it will slowly build "valid" and unique verilog inputs to exercise edge cases.

caryr commented 8 months ago

That sounds good.

I think one thing we could really use is someone to get the grammar part of the language fully implemented and then report what is not currently implemented instead of reporting syntax errors for things that are actually perfectly valid SystemVerilog. This tool seems like something that could help with this. Does the fuzzing engine also find statements that are accepted, but are actually invalid? I believe there are some things that while grammatically correct they may not be valid given the context. Adding full parsing would be a great benefit to our users and does not require learning the internals of Icarus which is a daunting endeavor.

Unfortunately, we are in a time when most, and maybe all, of the main developers are busy with our day jobs/school so we do not have significant time to contribute to Icarus. Is this something you would find interesting to help us with? We can provide resource references, comments, suggestions, etc.

larsclausen commented 8 months ago

I believe how the tool works is it will generate inputs that will try to discover undefined behavior in the application. It doesn't really bother whether its valid or invalid Verilog. So it wont really help with finding cases where the implementation does not follow the spec.

martinwhitaker commented 8 months ago

I did start trying to extend the parser to cover the full SV syntax a couple of years ago, but hit shift/reduce conflicts that I couldn't resolve.

caryr commented 8 months ago

I did start trying to extend the parser to cover the full SV syntax a couple of years ago, but hit shift/reduce conflicts that I couldn't resolve.

Do you believe this is a fundamental limitation of flex/bison and the grammar or is this something that could be resolved with a more radical rework of the existing code? If it's the former it seems like we need to start looking at alternate parsers/lexers to make progress. I've always been impressed with the work Mike is doing on slang, but I believe he regularly refactors his code to use the latest C++ constructs available when new versions of gcc/clang are released and that's incompatible with our goal of having Icarus compile on older systems using the compilers provided when they were released.

martinwhitaker commented 8 months ago

Verilog-Perl and Verilator (both by Wilson Snyder) use flex/bison, and the former claims to support most SystemVerilog syntax. The problem is that SystemVerilog does not have a context-free grammar, so is not well suited to bison. I did look at the Verilog-Perl parser to see how it did it, but it's written in a meta-language that is pre-processed to produce the bison input, so it's a bit obscure. Agree with your comments on slang, both the positives and the negatives.

svenka3 commented 8 months ago

Interesting discussion, so adding a slightly orthogonal choice to handling unsupported features especially given that slang was mentioned. How about a lightweight slang based linter to check for compatibility with Icarus? For instance, a "virtual interface" is not supported (just an example)? Given my recent work on a SV testbench linter on top of PySlang this is doable. Challenge will be to get a list of unsupported features. Of course it is a two step process, not ideal and has the dependency issues on latest gcc etc. as mentioned above.

Thanks

abradd commented 6 months ago

Verilog-Perl and Verilator (both by Wilson Snyder) use flex/bison, and the former claims to support most SystemVerilog syntax. The problem is that SystemVerilog does not have a context-free grammar, so is not well suited to bison.

Verible uses flex/bison. Looks like they modify tokens to allow them to work with a Bison-generated parser.

caryr commented 3 weeks ago

So it looks like iverilog was integrated into OSSFuzz. Does this report still need to be open?

@steveicarus I assume since you are the primary contact for this you are handling any issues that come up, but if that is not the case you should likely contact @martinwhitaker, @larsclausen or myself to look at issues outside the normal bug reporting infrastructure.

capuanob commented 3 weeks ago

Hi Cary,

While the proposal for IVerilog being a candidate for OSS-Fuzz integration was approved, I have yet had the chance to integrate the project.

I remember having difficulty compiling the project as a shared object instead of an executable, is there any built in support for this or should I keep exploring custom compilation paths?

Further, I’d love to hear any suggestions for entry points for initial fuzz testing if anyone has any.

Thank You, Bailey Capuano

On Thu, Sep 26, 2024 at 10:56 AM Cary R. @.***> wrote:

So it looks like iverilog was integrated into OSSFuzz. Does this report still need to be open?

@steveicarus https://github.com/steveicarus I assume since you are the primary contact for this you are handling any issues that come up, but if that is not the case you should likely contact @martinwhitaker https://github.com/martinwhitaker, @larsclausen https://github.com/larsclausen or myself to look at issues outside the normal bug reporting infrastructure.

— Reply to this email directly, view it on GitHub https://github.com/steveicarus/iverilog/issues/1075#issuecomment-2377595181, or unsubscribe https://github.com/notifications/unsubscribe-auth/AHXFHTJTSME5XX5FEE2NBDDZYRDDDAVCNFSM6AAAAABO5RNHY6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDGNZXGU4TKMJYGE . You are receiving this because you authored the thread.Message ID: @.***>

caryr commented 3 weeks ago

Recently changes were added to generate vvp as a shared library. Can you provide either a short description of the requirements to get OSS-Fuzz working for a specific project or a link to the summary so wecan determine what is possible given the current compilation flows.

capuanob commented 1 week ago

@caryr Apologies for the delay, the process is generally:

  1. Identify an entry-point into the library that would be a good candidate for fuzz-testing, typically somewhere that relies on some form of user input.
  2. Write a LibFuzzer harness that tests the entrypoint
  3. Introduce a compilation flag into the build system to support an instrumented build (ASAN, UBsan, etc.). This build relies on compilation flags that are provided by ClusterFuzz through a variety of environment variables.
  4. Commit a build.sh script that builds the harness executable and moves it to the expected output directory. This will be run from within a Docker container kicked off by Google's ClusterFuzz project.

If VVP can be built as a shared library and would benefit from fuzz-testing, that would be a great candidate

martinwhitaker commented 3 days ago

Building vvp as a shared library just allows it to be embedded in another application. The library API only provides functions to pass in options and file names that would normally be supplied on the command line. The main input to vvp remains a text file containing instructions that vvp interprets.

vvp assumes that it is processing a file that is generated by the compiler (and so can normally be expected to be valid). As it uses a flex/bison-generated scanner and parser, it will generally handle syntax errors tersely but reasonably gracefully, but semantic errors will mostly be caught by assertions (if at all). Fuzzing is almost certain to find many illegal or unsupported input sequences that will just cause an assertion failure. I would question how much value there is in finding and fixing these if the compiler can never generate them.