oracle / opengrok

OpenGrok is a fast and usable source code search and cross reference engine, written in Java
http://oracle.github.io/opengrok/

Improve verilog analyzer #2892

Open wvandamm opened 4 years ago

wvandamm commented 4 years ago

Hi,

I'm looking to improve the Verilog / SystemVerilog analyzer. What I don't like about the current behavior is that the definition search generally comes out empty. E.g. when I click a class object name, rather than jumping to where the object was defined, the search comes out empty. Clicking on a definition itself also results in an empty search (rather than searching for symbol occurrences). This breaks the flow of browsing by just clicking around. Symbol search, however, does find all the occurrences, including the place where the symbol is defined.

I tried figuring out what exactly indicates that a specific occurrence is the definition or what is linking this with the opengrok definition search, but that bit is not fully clear to me. I would expect that the matches from ctags regular expressions indicate what are the definitions, but I can't seem to find how this links into how opengrok searches for definitions and symbols.

Could anybody please provide a pointer on where I should look to improve this?

Thanks!

Wim

idodeclare commented 4 years ago

Hi, @wvandamm,

The language objects as parsed by universal-ctags are what constitutes Definitions in OpenGrok and what get Lucene-indexed as defs. Try running universal-ctags on a sample file of yours, and review the tags file to see what objects are found.
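One way to review the tags file is to group the tagged names by kind. The following is a minimal sketch, assuming the tags were written with long kind names (for example with `ctags --fields=+K -f tags sample.v`). The embedded tag lines are made up for illustration and are not the contents of OpenGrok's sampletags.

```python
from collections import defaultdict

# Hypothetical tag lines in the ctags "extended" format:
# name <TAB> file <TAB> ex-command;" <TAB> kind
SAMPLE_TAGS = "\n".join([
    "!_TAG_FILE_SORTED\t1\t/0=unsorted, 1=sorted, 2=foldcase/",
    'alu\tsample.v\t/^module alu ($/;"\tmodule',
    'clk\tsample.v\t/^  input clk,$/;"\tport',
    'result\tsample.v\t/^  output [7:0] result$/;"\tport',
    'state_t\tsample.sv\t/^typedef enum {IDLE, BUSY} state_t;$/;"\ttypedef',
])


def parse_tags(text):
    """Return a list of (name, path, kind) tuples from tags-file text."""
    entries = []
    for line in text.splitlines():
        # Skip blank lines and pseudo-tags such as !_TAG_FILE_SORTED.
        if not line or line.startswith("!_TAG_"):
            continue
        head, sep, ext = line.partition(';"\t')
        if not sep:
            continue  # no extension fields, so no kind to report
        name, path = head.split("\t")[:2]
        kind = ext.split("\t")[0]
        if kind.startswith("kind:"):
            kind = kind[len("kind:"):]
        entries.append((name, path, kind))
    return entries


def group_by_kind(entries):
    """Map each kind to the sorted list of names tagged with it."""
    grouped = defaultdict(list)
    for name, _path, kind in entries:
        grouped[kind].append(name)
    return {kind: sorted(names) for kind, names in grouped.items()}


if __name__ == "__main__":
    for kind, names in sorted(group_by_kind(parse_tags(SAMPLE_TAGS)).items()):
        print(kind, names)
```

If a name you expect to jump to never shows up under any kind, there is no definition for OpenGrok to index for it.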

Looking in opengrok/opengrok-indexer/src/test/resources/analysis/verilog as an example, you'll find sampletags from sample.v. In sample_xref.html, you'll see the result of the symbols parsed from sample.v (by VerilogLexer) that are cross-referenced with Definitions parsed from sampletags.

Some symbols are determined to be the definitions of those symbols (data-definition-place="def") and are hotlinked to refs searches (e.g. href="/source/s?refs=SCARV_COP_INSN_ABORT").

Some symbols are determined to be references to definitions in the same file (data-definition-place="defined-in-file") and are hotlinked to the definition anchors (e.g. href="#crd_addr").

Some symbols do not cross-reference to anything in sampletags (data-definition-place="undefined-in-file") and are hotlinked to definitions searches (e.g. href="/source/s?defs=cprs_snoop").
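The three cases above can be sketched as a small decision function. This is only an illustration of the rules as described, not OpenGrok's actual code: the function name, the shape of the definitions map, and the line numbers are hypothetical.

```python
# Hypothetical map of symbol name -> line numbers where ctags tagged it
# as a definition in the current file.
DEFS_IN_FILE = {
    "SCARV_COP_INSN_ABORT": {12},
    "crd_addr": {40},
}


def classify(symbol, line, defs_in_file):
    """Return (data-definition-place value, href) for one symbol occurrence."""
    def_lines = defs_in_file.get(symbol)
    if def_lines is None:
        # Nothing in this file's tags: fall back to a definitions search.
        return ("undefined-in-file", f"/source/s?defs={symbol}")
    if line in def_lines:
        # This occurrence is the definition itself: link to a refs search.
        return ("def", f"/source/s?refs={symbol}")
    # Defined elsewhere in the same file: link to the definition anchor.
    return ("defined-in-file", f"#{symbol}")


if __name__ == "__main__":
    print(classify("SCARV_COP_INSN_ABORT", 12, DEFS_IN_FILE))
    print(classify("crd_addr", 57, DEFS_IN_FILE))
    print(classify("cprs_snoop", 60, DEFS_IN_FILE))
```

The third case is the one that yields an empty result when ctags has not tagged the symbol in any indexed file.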

wvandamm commented 4 years ago

Hi @idodeclare ,

Thanks for the feedback! Meanwhile I analyzed the resulting tags for a sample program and found that the definitions I was looking for are indeed absent from the resulting tags file. After searching some more on the ctags GitHub I found a related pull request:

https://github.com/universal-ctags/ctags/pull/1495

This PR works with multiple passes: in a first pass typedefs and classes are identified, and in a second pass definitions of variables or objects using those types or classes can be identified. That would solve what I was looking for. I raised the question in the ctags GitHub as to what the status of the PR is, since it dates back to 2017.
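To make the two-pass idea concrete, here is a toy sketch. It is not how the pull request or the u-ctags parser is implemented; it uses simplified regular expressions on a made-up SystemVerilog snippet, and every name in it is hypothetical.

```python
import re

# Made-up SystemVerilog snippet for illustration.
SOURCE = """\
typedef enum {IDLE, BUSY} state_t;
class packet;
  int length;
endclass
module top;
  packet pkt;
  state_t cur = IDLE;
  int unused;
endmodule
"""

CLASS_RE = re.compile(r"^\s*class\s+(\w+)")
TYPEDEF_RE = re.compile(r"^\s*typedef\b.*\b(\w+)\s*;")
# "<type> <name>;" or "<type> <name> = <init>;"
DECL_RE = re.compile(r"^\s*(\w+)\s+(\w+)\s*(?:=[^;]*)?;")


def find_types(source):
    """Pass 1: collect the names introduced by class and typedef."""
    types = set()
    for line in source.splitlines():
        match = CLASS_RE.match(line) or TYPEDEF_RE.match(line)
        if match:
            types.add(match.group(1))
    return types


def find_typed_vars(source, types):
    """Pass 2: report declarations whose type was found in pass 1."""
    found = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = DECL_RE.match(line)
        if match and match.group(1) in types:
            found.append((match.group(2), match.group(1), lineno))
    return found


if __name__ == "__main__":
    user_types = find_types(SOURCE)
    for name, type_name, lineno in find_typed_vars(SOURCE, user_types):
        print(f"{name}: {type_name} (line {lineno})")
```

Without the first pass, `packet pkt;` is indistinguishable from any other pair of identifiers, which is why a single pass misses these definitions.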

Coming from a C++ environment it's strange that this kind of functionality does work there but not for SystemVerilog. I guess that means some local solution was implemented for C/C++. I raised this question as well on the ctags github.

Regards,

Wim

idodeclare commented 4 years ago

@wvandamm ,

Thank you for pointing to that interesting discussion in u-ctags. OpenGrok currently only has use for a "single-file" run, not the "multi-source-file" run that the pull request above is discussing.

In perusing some of the other linked discussion, I see masatake mentioned that "multi-pass, single-file" is already supported ("If the target is a single input file, ctags has facilities to run multi-pass parsing ... [as] introduced for objc parser."), but I read as well that enhancements for the "MS" scenario (Multi-pass/Single-file) are underway too.

unhipzippo commented 4 years ago

I find it interesting that you're getting no definition hits for your Verilog / SystemVerilog tokens -- I'm having just the opposite problem. :)

One typedef token, for example, returns 7 pages of "port"-type definition hits (e.g. "input (port)") before finally getting to the "typedef " that our users are really looking for.

I was chalking this up to the fact that Ctags lumps together multiple "kinds" into a definition search (https://github.com/universal-ctags/ctags/blob/master/parsers/verilog.c#L108), and without the multi-pass parsing, many "port"-type references are getting incorrectly picked up as definitions. Hearing that your definition searches aren't returning any results, though, makes me wonder.

(I expect that something like https://github.com/oracle/opengrok/issues/685 might at least allow filtering of the overabundance of results I'm seeing -- specifying I only want to see typedefs would be ideal)
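As a rough illustration of what such filtering could look like on the result side, here is a minimal sketch. It is not an OpenGrok feature or API; the hit tuples, file names, and function names are hypothetical.

```python
# Hypothetical definition hits as (name, kind, path) tuples.
HITS = [
    ("data_t", "port", "rtl/a.sv"),
    ("data_t", "port", "rtl/b.sv"),
    ("data_t", "typedef", "pkg/types.sv"),
    ("data_t", "port", "rtl/c.sv"),
]


def only_kinds(hits, kinds):
    """Keep only the hits whose kind is one of the requested kinds."""
    return [hit for hit in hits if hit[1] in kinds]


def preferred_first(hits, preferred=("typedef", "class")):
    """Stable sort that moves preferred kinds to the front of the results."""
    def rank(hit):
        kind = hit[1]
        return preferred.index(kind) if kind in preferred else len(preferred)
    return sorted(hits, key=rank)


if __name__ == "__main__":
    print(only_kinds(HITS, {"typedef"}))
    print(preferred_first(HITS)[0])
```

Reordering instead of filtering keeps the "port" hits available while putting the typedef on the first page.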

masatake commented 3 years ago

The Universal Ctags project has now found a great maintainer for the Verilog/SystemVerilog parser. That person has been improving the parser. I recommend trying the latest version of u-ctags.