verilator / verilator

Verilator open-source SystemVerilog simulator and lint system
https://verilator.org
GNU Lesser General Public License v3.0
2.34k stars 566 forks source link

Question: Adding foreign module support to Verilator #1943

Closed veripoolbot closed 6 years ago

veripoolbot commented 6 years ago

Author Name: Xavier Delacour Original Redmine Message: 2348 from https://www.veripool.org


The patch [1] is a first pass at implementing "foreign modules", based on the notes at [2]. E.g., run it like [3]-- module C2 [4] causes generation of obj_dir/foreign_C2.v [5], which is automatically included in C1, etc. --trace output works.

What is there is mostly working, though some issues pop up around scheduling of clocked signals. In the current implementation, an AstForeignRead within a clocked block will behave like a delayed assign (generating ASSIGNPRE FOREIGNREAD ASSIGNPOST etc), but in some cases that leads to an extra clock inserted when traversing up the foreign module hierarchy and back down another branch. It feels as though knowledge of ASSIGNPRE/ASSIGNPOST wants to be exported more fully to the foreign interface.

Please take a look and try it out-- I can elaborate more on what isn't clear if anything. Any thoughts on how pre/post scheduling should work in particular would be appreciated. Thanks!

Xavier

[1] https://github.com/xavier98/verilator-foreign/commit/9c509f6d381af7e0ded09a6890e733e8c66a8fc4

[2] https://www.veripool.org/boards/2/topics/412-Verilator-verilating-compiling-modules-into-separate-object-files

[3]

cd test_v/foreign
../../verilator_bin --cc C2.v
../../verilator_bin --cc C1.v
</code>

[4]

module C2
  (input clk,
    input din,
    output dout
    );
    // verilator foreign_module
    reg   doutr;
    assign dout = doutr;
    always @(posedge clk)
      doutr <= din;
endmodule
</code>

[5]

// verilator tracing_off
// verilator lint_off UNOPTFLAT
module foreign_C2 (
         input logic clk,
         input logic din,
         output logic dout);
     // verilator inline_module
     // verilator foreign_interface C2
     // verilator foreign_write clk
     always @(posedge clk) begin
         // verilator foreign_eval _sequent__TOP__1
         // verilator foreign_write din
         // verilator foreign_read_post dout
     end
     always @(foreign_settle) begin
         // verilator foreign_eval _settle__TOP__2
         // verilator foreign_read_post dout
     end
endmodule

</code>
veripoolbot commented 6 years ago

Original Redmine Comment Author Name: Wilson Snyder (@wsnyder) Original Date: 2017-09-20T02:43:15Z


http://www.veripool.org/boards/3/topics/2348-Verilator-Adding-foreign-module-support-to-Verilator

It's quite impressive what you've added here. I'm running out of 3.9 version numbers, this might be a great excuse to finally go to 4.000.

Your change to .gitignore was fine and separable, so I committed it to trunk.

First off:

Then:

To reduce the size of the patch and get what we can into trunk more easily can some things be separated out first into a separate patch with minimal disruption, i.e. making V3EmitV into a inheritable class?

I made a quick pass at the patch, here's some major comments:


bin/verilator:

I think a new =head1 is probably appropriate describing how to use this.

+    --gen-foreign-interface     Generate foreign interface file(s) for top module(s)

Seems long, maybe just "--gen-foreign"?

+=item --gen-foreign-interface
+
+Declare all top modules as foreign_module, and generate foreign interface
+files for them. Instead of using this option, /*verilator foreign_module*/
+can be placed in modules a foreign interface should be generated for.

"Declare all top modules as foreign_module, and generate foreign interface
 files for them. Alternatively, /*verilator foreign_module*/ can be placed
 in modules a foreign interface should be generated for.

Add /*verilator foreign_module*/ to the LANGUAGE EXTENSIONS section.
Likewise note all of the other new metacomments.

+++ a/include/verilated.cpp b/include/verilated.cpp

+const char* Verilated::s_foreignScope = NULL;
+static vector<const char*> g_foreignScope;
..
+# define VL_DEBUG_POP_FOREIGN_SCOPE() {if (VL_UNLIKELY(Verilated::debug())) {Verilated::popForeignScope();}}

Is this just to track what foreign module is active, or for more?  But
anyhow statics don't go here but instead inside VerilatedImp::s_s
singleton.

Also I'd expect them to be always in the sources but wrapped in a VL_DEBUG_IF statement, similar to the print statements.
+++ a/include/verilated_vcd_c.cpp b/include/verilated_vcd_c.cpp
+    string m_scope_concat;              ///< Concat form of inter-module m_scope

Why are these VCD changes needed?  The trace routines were intended to work
OK with multiple modules active.  When a model is created (new'ed) a scope
can be prefixed, so I'd expect a foreign module when created to have it's
new() have the proper scope prefix.

+++ a/src/V3AstNodes.h
 class AstPragma : public AstNode {
 private:
      AstPragmaType      m_pragType;     // Type of pragma
+    string m_arg;

It's probably better to instead push an AstText as a child. See
e.g. attr_event_control (which pushes a special node type, but you get the
idea).

+class AstForeignEval : public AstNode {
+    string m_name;
+    AstForeignInstance* m_fi;
+public:
+    AstForeignEval(FileLine* fl, const string& name)
+       : AstNode(fl), m_fi(0), m_name(name) {

0 is not the same as NULL. You want NULL. Check all your 0's.

Any pointers in the Ast tree need broken() check methods.
Also clone() needs to fix up the pointers.

+++ a/src/V3EmitFI.cpp

This is obviously the heart and I need to review it closer.

Please split EmitFI into two strong separate source files - one that does
all of the appropriate manipulations to the objects creating appropriate
nodes, then another that changes nothing in the AST structure and just
dumps appropriate files.  (This makes it a lot easier to debug and
maintain. V3EmitC cheats in some cases, which are basically legacy and I
now avoid.)

+++ a/src/V3EmitV.cpp

If you can do what you want and leave the visitors inside the member class, it would
be appreciated. e.g. see V3EmitCBase.h, probably you want to make a V3EmitVBase.h.

+++ a/src/V3Hashed.cpp
      void nodeHashIterate(AstNode* nodep) {
         if (!nodep->user4()) {
-           if (nodep->backp()->castCFunc()
-               && !(nodep->castNodeStmt() || nodep->castCFunc())) {
-               nodep->v3fatalSrc("Node "<<nodep->prettyTypeName()<<" in statement position but not marked stmt (node under function)");
-           }
+           // * fix this; to allow AstText under cfunc or equiv
+           // if (nodep->backp()->castCFunc()
+           //  && !(nodep->castNodeStmt() || nodep->castCFunc())) {
+           //  nodep->v3fatalSrc("Node "<<nodep->prettyTypeName()<<" in statement position but not marked stmt (node under function)");
+           // }

This needs cleanup - just change "fix this" to "FIXME" so the regression tests will see that comment and won't let you forget.

+++ a/src/V3LinkCells.cpp
+       // if the module is declared with foreign_module, resolve foreign_NAME in its place.
+       // i.e., if you include module NAME, transparently use the previously-generated
+       // foreign_interface file
+       bool is_foreign_module = false;
+       for (AstNode* stmtp=modp->stmtsp();stmtp;stmtp=stmtp->nextp()) {
+           AstPragma* pragma = stmtp->castPragma();
+           if (!pragma || pragma->pragType() != AstPragmaType::FOREIGN_MODULE)
+               continue;
+           is_foreign_module = true;
+       }
+       if (is_foreign_module) {
+           underlying_modp = modp;
+           AstNodeModule* underlying_modp;
+           modp = resolveModule(nodep, (string)"foreign_"+modName, underlying_modp);

I think we need a global --foreign switch to turn on this, and anything
else that alters normal behavior, otherwise if someone has a module named
foreign_something today, it could break.

+++ a/src/V3Order.cpp

Likewise comment as in V3Gate - shouldn't be matching names etc.

+  "foreign_settle"     { FL; return yFOREIGN_SETTLE; }

We shouldn't declare new Verilog keywords as it could theoretically
conflict with others.  `verilator_foreign_settle?  (Other extensions are
all macro-like looking, as another compiler could define them away.)

+  "/*verilator foreign_module*/"       { FL; return yVL_FOREIGN_MODULE; }
+  "/*verilator foreign_interface"[^*]*"*/"   { FL; yylval.strp = PARSEP->newString(PARSEP->verilatorCmtSymbol(yytext)); return yVL_FOREIGN_INTERFACE$
+  "/*verilator foreign_eval"[^*]*"*/"   { FL; yylval.strp = PARSEP->newString(PARSEP->verilatorCmtSymbol(yytext)); return yVL_FOREIGN_EVAL; }
+  "/*verilator foreign_depend"[^*]*"*/"  { FL; yylval.strp = PARSEP->newString(PARSEP->verilatorCmtSymbol(yytext)); return yVL_FOREIGN_DEPEND; }
+  "/*verilator foreign_write"[^*]*"*/"  { FL; yylval.strp = PARSEP->newString(PARSEP->verilatorCmtSymbol(yytext)); return yVL_FOREIGN_WRITE; }
+  "/*verilator foreign_read_post"[^*]*"*/"   { FL; yylval.strp = PARSEP->newString(PARSEP->verilatorCmtSymbol(yytext)); return yVL_FOREIGN_READ_POST$
+  "/*verilator foreign_read"[^*]*"*/"   { FL; yylval.strp = PARSEP->newString(PARSEP->verilatorCmtSymbol(yytext)); return yVL_FOREIGN_READ; }

If it's useful there's also the VLT control language (which probably
didn't exist when the original thread was written), which might offer more flexibility as the grammar can be arbitrary.

  `verilator_config
     whatever needed syntax you invent
  `verilog

+++ a/src/verilog.y

Please fix indentation to match other rules.
veripoolbot commented 6 years ago

Original Redmine Comment Author Name: Wilson Snyder (@wsnyder) Original Date: 2017-09-20T02:44:22Z


To reduce the size of the patch and get what we can into trunk more easily can some things be separated out first into a separate patch with minimal disruption, i.e. making V3EmitV into a inheritable class?

P.S. if so please make a separate patch and file a bug on each major separable patch.

veripoolbot commented 6 years ago

Original Redmine Comment Author Name: Xavier Delacour Original Date: 2017-09-20T15:26:34Z


Thanks for the feedback, Wilson. Will take a look at fixing the issues you mentioned and submit separable patches.

I think the AstForeignRead approach that is there now is correct (foreign_read in a clocked block is handled like delayed assign in outer module), and what I'm looking at is actually a tracing issue. The state of the input pins on the inner module is pre-clock at the inner eval (when fed by an AstAssignPost in another foreign module), unless there is combinational feedback to an output port. So the logic is correct but at end of outermost eval, the inner module input pin will be in its pre-clock state, whereas trace would like the post-clock value. Marking the input pin as an unconditional AstForeignWrite fixes that, by setting the value at bottom of outermost eval. Anyway will add some test cases for this.

Also, when is NULL not 0? In C++ gcc/msvc/clang, it is given via "#define NULL 0", which I believe is per the standard. C++11 introduces nullptr_t which removes the int vs pointer ambiguity, but I don't believe NULL will ever be defined to it for legacy reasons. Common C++ style prefers 0 over NULL, including e.g., STL implementations among many others. Anyway, no problem to prefer NULL-- I tried to match your style but missed a few spots.

veripoolbot commented 6 years ago

Original Redmine Comment Author Name: Wilson Snyder (@wsnyder) Original Date: 2017-09-21T11:23:25Z


Also, when is NULL not 0? In C++ gcc/msvc/clang, it is given via "#define NULL 0"

I didn't mean from a language standpoint but from a style standpoint, see e.g

https://stackoverflow.com/questions/176989/do-you-use-null-or-0-zero-for-pointers-in-c

I agree with the second post, and especially when C++11 is common can then simply replace NULL with nullptr. Of course most of the time rather than "if (foo==NULL)" it should be just "if (foo)". (It's tempting to write nullptr and have a "define nullptr NULL" when < C++11, but I suspect something will break.)

BTW I also didn't mean to push you into making a patch before you want to. It just looked like some pieces like the EmitV abstraction were close, and obviously getting some things upstream reduces later merge maintenance. Whenever.

I was hoping you could further elaborate on your ideas for how foreign read/write interact with clocking. For example, without any special care, if two inner modules clock a signal (e.g., input din; output dout; reg doutr; doutr<=din; assign dout=doutr;) and an outer module daisy chains these together (see test_v/foreign/H*.v for this example), the generated inner _sequent__TOP's will have collapsed the clock already -- ASSIGNPRE / ASSIGNPOST will be optimized away, leaving essentially "dout = din;" as the implementation in the two inner modules.

If the outer module simply runs the _sequentTOP of the inner modules, the second inner module's assignment won't be a delayed assignment anymore. That is fixed by having the AstForeignRead on the outer module behave like a delayed assignment, such that the dout of the first block is written to Vdly_dout1 in the outer module, and the inner module sees the old dout1 value. The end of the outer implementation performs the ASSIGNPOST dout1=__Vdlt_dout1 which "commits" the clock.

That however fails if you have a wire assign that goes up the module hierarchy (outer module foreign_reads a value from an inner module), and then back down another branch (outer module foreign_writes to a second inner module). In that case, the path from inner module 1 to outer module to inner module 2 will behave like a clocked signal, and the wire between the three modules will behave like a clocked signal (and be 1 cycle late etc).

It seems the proper solution is similar to how non-inlined modules work-- the ASSIGNPRE/ASSIGNPOST remain in their respective modules and get optimized away as needed, but V3Order of the outer module runs with complete knowledge of them for the whole model. In the first example above (H*.v), that would cause the second inner module to run its "dout=din" first, after which the first inner module would run its "dout=din", and give the correct clocking behavior. In the second example, the outer foreign_read wouldn't be delayed so the wire assign wouldn't be delayed.

The question becomes how to define them in the foreign interfaces, given that in most cases, the ASSIGNPRE/ASSIGNPOST are related to internal signals and not inputs/outputs of the module. Also when you are looking at the foreign interface, the optimization and emit of the inner modules has already occurred-- i.e., leaving in all the ASSIGNPRE/ASSIGNPOST logic when verilating the inner modules would leave in a lot of logic that would otherwise be removed if you had knowledge of outer modules.

Right, seems a correct problem assessment. BTW see e.g. SystemC which has a similar problem and uses the heavyweight solution of having a "pre" and "post" value for every signal including those passed between SC_MODULEs.

Yes, I think you need to split pre and post, however I believe that if V3Order sets things up right it should be able to keep the internal logic isolated and optimize most away - it sort of does that now by looking for where inputs come in in the later stages.

Also I think V3Order needs to be sure to keep combo functions separate that deal with different primary inputs so that "loop" through the child can be better optimized.

Then when generating the foreign, it needs to list all of the v3 order cross dependencies

For:

module H3
  (input clk,
    input din,
    input inloop,
    output outloop,
    output reg intnl_s3,
    output reg doutr
    );
    // verilator foreign_module
    always @(posedge clk) doutr <= din;
    assign outloop = doutr | inloop;
    reg    intnl_s1;
    reg    intnl_s2;
    always @(posedge clk) begin
       intnl_s1 <= din ^ outloop;
       intnl_s2 <= intnl_s1;
       intnl_s3 <= intnl_s2;
    end
endmodule

Perhaps something like (handcrafted as example):

module foreign_H3 (
         input logic clk,
         input logic din,
         input logic inloop,
         output logic outloop,
         output logic intnl_s3,
         output logic doutr);
     // verilator inline_module
     // verilator foreign_interface H3
     // verilator foreign_write clk
     always @(posedge_pre clk) begin  // note pre
         // verilator foreign_eval _sequent__TOP__1a
         // verilator foreign_write din
     end
     always @(posedge_post clk) begin  // note post
         // verilator foreign_eval _sequent__TOP__1b
         // verilator foreign_depend _sequent__TOP__1a
         // verilator foreign_depend _combo__TOP__2  // if e.g. this has inputs from combo__TOP_2
         // verilator foreign_read_post intnl_s3
         // verilator foreign_read_post doutr
     end
     always @(*) begin
         // verilator foreign_eval _combo__TOP__2
         // verilator foreign_depend _sequent__TOP__1a
         // verilator foreign_write inloop
         // verilator foreign_read_post outloop
     end
     always @(foreign_settle) begin
         // verilator foreign_eval _settle__TOP__3
         // verilator foreign_write inloop
         // verilator foreign_read_post outloop
     end
     always @(foreign_edge_only) begin
         // verilator foreign_eval _sequent__TOP__4  // Does the internal only pre+post in one step
     end
endmodule

As mentioned before we should debate if it's better to use config. Metacomments are good when another tool needs to ignore them, but that doesn't seem the case here as other than the header it won't help another tool. For example this has the ability to let us comment out parts when debugging as needed.

module foreign_H3 (
         input logic clk,
         input logic din,
         input logic inloop,
         output logic outloop,
         output logic intnl_s3,
         output logic doutr);

`ifdef VERILATOR
     // verilator inline_module
     // verilator foreign_interface H3     // An attribute on the module, so meta comment seems rignt
 `verilator_config
     foreign_write clk;   // Not sure why this was there
     foreign_cfunc sequent;
         foreign_eval _sequent__TOP__1a;
         foreign_write din;
     end
     foreign_cfunc post;
         foreign_active posedge clk;
         foreign_eval _sequent__TOP__1b;
         foreign_depend _sequent__TOP__1a;
         foreign_depend _combo__TOP__2;  // if e.g. this has inputs from combo__TOP_2
         foreign_read_post intnl_s3;
         foreign_read_post doutr;
     end
     //...
 `verilog
`endif
endmodule
veripoolbot commented 6 years ago

Original Redmine Comment Author Name: Xavier Delacour Original Date: 2017-09-22T16:56:38Z


That makes sense. I like the idea to pre-split the combo path so more loops can be flattened across boundaries, and explicit pre/post. It seems there are a couple trade-offs there so probably user option/control when building the inner module will be useful. The main problem with the current approach with foreign_depend is that it is somewhat easy for unoptflat's to be generated that will require multiple outer evals when there are inter-module loops.

The verilator_config change is fine with me. When I have a bit more time and get the fixed up patches together, will see about pushing that into it.

An interesting evolution of Verilator might be some form of JIT + event based model. Instead of static code generation, manually evaluate the simulation by walking the pre-order tree, and for specific hot paths generate LLVM IR as you go. All these inter-module issues go away, since you'd work from pre-built AST blocks per module (linked earlier or at runtime). It's not hard to imagine performance being at least what it is today, but with other benefits like control over what is generated/icache heavy vs iterated, and bypassing evaluation of combo paths that aren't changing, which seems to be a large fraction of where the time goes now. Julia and recent Javascript engines use a similar model to good effect.

veripoolbot commented 6 years ago

Original Redmine Comment Author Name: Wilson Snyder (@wsnyder) Original Date: 2017-09-22T22:40:54Z


I've thought of a jit before. Besides effort, the big problem is to get good performance you really need to schedule the whole model holistically. JIT would help the initial startup time while the model builds in the background, which would be nice.

veripoolbot commented 4 years ago

Original Redmine Comment Author Name: Conor McCullough Original Date: 2019-10-22T21:00:36Z


Has any further progress been made on this? I think it would be a very useful feature!

veripoolbot commented 4 years ago

Original Redmine Comment Author Name: Xavier Delacour Original Date: 2019-10-23T17:16:36Z


I haven’t had the time to wrap this up, though I still have a need for it and would like to see it happen. Basically my use case is to do incremental builds— when working with large projects verilator is pretty painful to use right now; being able to separately compile a handful of input files that are actively being worked on can reduce dev cycle time from many minutes to a few seconds. There are some other interesting use cases as well, like closed source distribution, compiling many separated verilog unit tests against the same large model, etc.

veripoolbot commented 4 years ago

Original Redmine Comment Author Name: Todd Strader (@toddstrader) Original Date: 2019-10-24T11:53:31Z


We've recently added --protect-lib for embedding protected IP (instead of using encrypted RTL): https://www.veripool.org/issues/1490-Verilator-Add-an-option-to-create-a-DPI-protected-library

On some level, this feature allows you to have incremental compilation today. With that said, it still needs a fair bit of polishing. Also, it doesn't support tracing the embedded modules (since the original goal was to hide things). However, that can be added. Please see the related issues to get a feel for where things are at, but if you're going to try this out you should be aware that:

I've opened an issue to track extending --protect-lib for this use case: https://www.veripool.org/issues/1572-Verilator-Extend-protect-lib-for-foreign-embedded-module-use

So I'm aware what's being considered, would either of you be able to share some rough metrics on your designs? Namely:

veripoolbot commented 4 years ago

Original Redmine Comment Author Name: Todd Strader (@toddstrader) Original Date: 2019-10-24T12:18:18Z


Also, please note that --protect-lib (just like normal Verilator compiles) requires that your embedded module's parameters be fixed. This is being tracked here: https://www.veripool.org/issues/1522-Verilator-Support-mutable-top-level-parameters-for-protect-lib

This is more of an issue for the third-party IP use case since the entire reason they would ship you a module with parameters is because they don't know what permutations you will pick. In the incremental compilation case, you can at least know/discover what parameter permutations you have. But you'd need to build a library for each one.

veripoolbot commented 4 years ago

Original Redmine Comment Author Name: Xavier Delacour Original Date: 2019-10-24T18:41:36Z


Thanks! Will check that out. I definitely need trace (often that is the entire point), and at least in some cases combinational loops through foreign blocks.

On small variants of the design I have I am looking at 50s for verilator to run, and another 2.25m for C++ compile across 64 cores (though spread a bit unevenly; closer to 20s to complete most of the files). A mid-range variant takes 8.25m for verilator to run (while consuming up to 60GB of memory!) and another 10+ minutes for C++ compile. In practice however we work on 1-4 modules at once, which themselves take only 1-10s for verilator to build as top, and a couple seconds of C++ compilation. Incremental compilation allows (sanely) working directly at larger variants, even if at some runtime cost.