lowRISC / opentitan

OpenTitan: Open source silicon root of trust
https://www.opentitan.org
Apache License 2.0

Parse SystemVerilog with a proper parser #1720

Open imphil opened 4 years ago

imphil commented 4 years ago

We have some places in our code where we hackishly/partially parse (parts of) SystemVerilog. It would be nice to choose an existing parser and integrate it properly into the codebase.

Currently we partially parse SV in these places:

primgen.py also contains a list of parsers which have already been tested in the past and weren't up for the job at this point, but could still be a good starting point to extend/contribute to.

GregAC commented 4 years ago

The Surelog README contains a list of Verilog (and VHDL) parsers:

https://github.com/alainmarcel/Surelog/blob/master/README.md

And of course there's Surelog itself.

It depends on whether we want a parser written in Python or simply one that can be easily integrated with Python.

imphil commented 4 years ago

Ideally we have something which is written in Python and can be installed through pip, like our other Python dependencies. Everything else is trickier from a distribution perspective, even if it's just a Perl script (e.g. using Verilog-Perl). But it's a trade-off, of course.

msfschaffner commented 4 years ago

Looping in @fabianschuiki from ETHZ; he may have some pointers here (including his homegrown SV tools).

fabianschuiki commented 4 years ago

I'm not aware of any pure-Python implementations, although, as @imphil suggests, one would make distribution much easier. If you are willing to require compilation/installation, the following could be interesting.

Both of the above are Rust projects, which makes them fairly convenient to distribute (cargo install moore, etc.), but makes it harder to find contributors. In the C++ world there's also:

The excellent sv-tests website and repo might also be a good starting place to go "shopping" for a parsing solution.

If you end up trying moore, feel free to shout at me and throw any broken code my way 😃.

eunchan commented 4 years ago

I remember I had a discussion about this with Michael a long time ago. :) At the time, I couldn't find a good SV parser written in Python, only a few in Haskell and other languages, as mentioned above.

tgorochowik commented 3 years ago

My understanding (after a discussion with @imphil) is that the main purpose of this is finding all the modules and their ports.

Verible does not have any Python integration, but as it is already used in the project for other purposes, it could be a good candidate.

This goal should be fairly easy to achieve with Verible, as it has proper helpers so you don't have to operate on the AST directly. For example, the Matcher mechanism or the TreeSearch mechanism could be used to find all module definitions (similarly to how various components are matched in linter rules); see https://github.com/google/verible/blob/09e0b877c9cb7694a5c8b080b1c8f82addc2f901/verilog/CST/module.cc#L35-L38 for an example. Finding all port names for a given module is then just another simple call: https://github.com/google/verible/blob/09e0b877c9cb7694a5c8b080b1c8f82addc2f901/verilog/CST/port.cc#L35-L38

I think that if such a util would be useful for you, it could even be added to Verible as an extra util (or as a switch to one of the existing utils Verible provides).

@fangism please feel free to weigh in.

imphil commented 3 years ago

> main purpose of this is finding all the modules and their ports.

Actually, slightly narrower than that: Given a single SystemVerilog-2017 file with a single module in it, I want to extract:

The current regex-based code for that is at https://github.com/lowRISC/opentitan/blob/master/hw/ip/prim/util/primgen.py#L87
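To illustrate what a regex-based approach looks like in general, here is a minimal sketch that pulls the module name out of a file containing a single module. This is not the primgen.py code; the function name `find_module_name` and both patterns are made up for illustration, and the module name is used only as an example target. It also shows why this style is fragile: comments have to be stripped by hand, and a `//` inside a string literal would already confuse it.

```python
import re
from typing import Optional

# Hypothetical helper patterns, not taken from primgen.py.
# Matches line comments and (non-nested) block comments.
_COMMENT_RE = re.compile(r"//[^\n]*|/\*.*?\*/", re.DOTALL)
# Matches "module <name>", optionally with a lifetime keyword in between.
_MODULE_RE = re.compile(
    r"^\s*module\s+(?:(?:static|automatic)\s+)?([A-Za-z_][A-Za-z0-9_$]*)",
    re.MULTILINE,
)


def find_module_name(sv_source: str) -> Optional[str]:
    """Return the name of the first module declared in sv_source, or None."""
    # Replace comments with a space so commented-out code is never matched.
    stripped = _COMMENT_RE.sub(" ", sv_source)
    match = _MODULE_RE.search(stripped)
    return match.group(1) if match else None


example = """// module fake_in_comment (
/* module another_fake; */
module prim_example #(
  parameter int Width = 8
) (
  input logic clk_i
);
endmodule
"""

print(find_module_name(example))
```

Anything beyond this (parameters with default expressions, ports with packed dimensions, preprocessor directives) quickly outgrows regular expressions, which is the motivation for using a real parser.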

Since this parser is part of our mandatory build requirements and every user/developer of OT needs to have access to it, we are very careful about new binary requirements. (It's not trivial to distribute binaries, or to convince every user of OT to compile another tool for their potentially ancient Red Hat distribution.) Ideally, we can either rely on existing requirements (e.g. Verible or Verilator), or have a Python parser.

fangism commented 3 years ago

See this thread: https://github.com/google/verible/issues/230, where a user prototyped a Python parser to ingest the tree dump of verible-verilog-syntax --printtree (a format intended for human reading). (If you want tokens including comments, use --printtokens or --printrawtokens.)
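As a rough sketch of the first step of that approach, the snippet below calls verible-verilog-syntax with the --printtree flag from Python and returns the raw text dump. The helper names `build_printtree_cmd` and `dump_syntax_tree` are hypothetical. The sketch makes no assumption about the layout of the dump; parsing it is left to the caller, and since the format is meant for human reading it may change between Verible versions.

```python
import shutil
import subprocess
from typing import List, Optional

# Hypothetical helper names; only the tool name and the --printtree flag
# come from the discussion above.
TOOL = "verible-verilog-syntax"


def build_printtree_cmd(sv_path: str, tool: str = TOOL) -> List[str]:
    """Build the command line that dumps the syntax tree of one file."""
    return [tool, "--printtree", sv_path]


def dump_syntax_tree(sv_path: str, tool: str = TOOL) -> Optional[str]:
    """Return the tool's tree dump as text, or None if the tool is not installed."""
    if shutil.which(tool) is None:
        return None
    # check=False: a non-zero exit status is not turned into an exception,
    # so the caller can still inspect whatever the tool printed.
    result = subprocess.run(
        build_printtree_cmd(sv_path, tool),
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout


print(build_printtree_cmd("prim_example.sv"))
```

Returning None when the binary is missing keeps the sketch usable on machines without Verible; a real integration would probably report a clear error instead.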

And in a more ideal world, the CST representation should be portable to many other languages, along with the functions that hide the details of substructure (see also https://github.com/google/verible/issues/159). Those functions should come from a language-agnostic description of CST structures, and be generated to target other languages, but we're not there yet.

There is interest in this area, from different classes of users, including those who intend to extract and transform code. More than half of verbal requests I hear are related to operations on modules. I would be supportive of efforts to export structured views of source code as seen by Verible to other languages.

I encourage further discussion on this subject on our developers' mailing list, verible-dev@googlegroups.com.