[top] top_earlgrey_* organization

tjaychen commented 5 years ago

Currently, there are 3 overall topearlgrey* organizations top_earlgrey_asic top_earlgrey_verilator top_earlgrey_nexysvideo

Each basically follows the structure of ... topearlgrey* -> top specific pin/pad logic -> top specific reset / clock logic -> top_earlgrey

Instead of maintaining 3 separate top wrappers, we'd like to suggest having only one functional top in the following organization top_earlgrey -> top_earlgrey_core (today's top_earlgrey) -> pinmux / padring -> other specific top level logic

This will allow us to stop maintaining 3 different versions, and also concentrate testing.

To accomplish this, there are a few things that need to be done

dpi modules for top verilator need to be moved to a 'tb' level. So in essence, there will be top specific test benches where required, and effectively another wrapper
there needs to be another prim module that wraps the clocking logic to differentiate between fpga / dv and verilator
there needs to be a full fledged pinmux / pad module that can support the needs of all 3 - this is already under discussion and development

Please let us know what you think. One alternative approach is to no separate top_earlgrey and top_earlgrey_core, but instead make the top level template flexible enough to know when to generate what. That's something we should think about as well

@asb @imphil @vogelpi @eunchan @msfschaffner @sriyerg @aytong @martin-lueker @sjgitty

tjaychen commented 5 years ago

@gorusso @rpenugo

martin-lueker commented 5 years ago

This seems very reasonable. We have certainly made functional core designs, and then wrapped them with process/platform dependent padring files which hold all the pads, and process dependent guts. In fact this is often extended such that any process dependent cells get pulled to the top level (SRAMs, or mixed signal IPs). For example, in ASIC's SRAMs are not just process dependent, but often depend on the integrator, whereas in some FPGA's there may be different IP wrappers for a particular SRAM configuration that are dependent on the revision of the design tool. So for these reasons I agree, we should probably move to this type of file hierarchy, and furthermore, we should think about whether we want to propagate SRAMs and mixed-signal blocks to the pad-ring level as well. -Martin

aytong commented 5 years ago

The approach SGTM. My only comment is to ensure top-level DV TB DUT is top_earlgrey in the new structure, so connections between top_earlgrey_core to padring, etc. are covered.

eunchan commented 5 years ago

I am also OK with having top_earlgrey.sv and top_earlgrey_core.sv. For now, I will create PR that moving top_earlgrey.sv to _core and revise three modules to instantiating top_earlgrey_core. Then we can porting the necessary portions from _asic and _nexysvideo module to top_earlgrey.

imphil commented 5 years ago

The current structure allows you to have different toplevel ports for different targets, and have one place to instantiate target/device specific modules. If you remove that, you end up with

Ifdefs around toplevel ports, e.g. to support single-ended vs differential clocks, etc.
even more wrapping code prim-like modules.

Please see https://github.com/lowRISC/stwg-base/issues/89 for the full discussion that happened when we choose the current structure.

tjaychen commented 5 years ago

So regarding ports, part of why we are doing this is because we are putting in infrastructure to define the ports / pinmux / pads as part of top_earlgrey.hjson.

And yes, regarding other modules such as top level analog, clock generation, my opinion is that everything would need to be wrapped in a prim_* like architecture. The goal of this really is to focus the number of DUTs we verify across different targets to unify as much as we can. If in the end we still need a different top for a specific purpose, I think that can still be supported.

On Thu, Oct 3, 2019 at 10:23 AM Philipp Wagner notifications@github.com wrote:

The current structure allows you to have different toplevel ports for different targets, and have one place to instantiate target/device specific modules. If you remove that, you end up with

Ifdefs around toplevel ports, e.g. to support single-ended vs differential clocks, etc.

even more wrapping code prim-like modules.

Please see lowRISC/stwg-base#89 https://github.com/lowRISC/stwg-base/issues/89 for the full discussion that happened when we choose the current structure.

— You are receiving this because you were assigned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/295?email_source=notifications&email_token=AAH2RSQNJ424COCV73U7L3DQMYTCDA5CNFSM4I4B5C4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAI6DAQ#issuecomment-538042754, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RST7TUXLR6LNDFD26MDQMYTCDANCNFSM4I4B5C4A .

eunchan commented 5 years ago

On Thu, Oct 03, 2019 at 10:23:47AM -0700, Philipp Wagner wrote:

The current structure allows you to have different toplevel ports for different targets, and have one place to instantiate target/device specific modules. If you remove that, you end up with

Ifdefs around toplevel ports, e.g. to support single-ended vs differential clocks, etc.

even more wrapping code prim-like modules.

Please see for the full discussion that happened when we choose the current structure.

A part of this issue is somewhat related to the PINMUX/ PADS stuff. Eventually OpenTitan needs analog pads/ or behavioral model pads inside top_earlgrey.
Combining ASIC and Nexys Video into the top module will help a lot to make them integrated. Verilator is somewhat special I think as it has DPI interface. But we could move the DPI to the top wrapper like tb in ASIC top test environment.

If not, we could end up having separate pad instances for asic and fpga. In my opinion, at least we can minimize the changes by including target specific verilog files outside of the top_earlgrey. ifdef could be minimal in that case.

msfschaffner commented 5 years ago

Concerning the technology dependent modules I second @tjaychen in that I also think they should be abstracted using a prim_* wrapper approach if possible.

However, @imphil is right that we have to be careful with IPs that have a different interface across different technologies. While the version of padmux / padctrl that we are currently working on can handle a parametric number of generic IOs, we may run into issues with clocking infrastructure IP (like the FPGA clock managers, differential clock signals), or blocks that rely on other pad functionality than just CMOS-style IO (like DDR memory controllers).

Let's revisit https://github.com/lowRISC/stwg-base/issues/89 and discuss this in more detail in our next meeting (or if it takes to long, schedule a separate meeting just for that).

imphil commented 5 years ago

A couple comments from my side:

I'm rather convinced we need a target-specific toplevel (currently named top_earlgrey_TARGET) if we don't want to have ifdefs in there, as it is impossible to find a common set of top-level ports for all targets. This toplevel can be generated, or hand-written, as it is today. The goal should be to have this target-specific toplevel as lightweight as possible, and be mostly a wrapper about the target-independent one (currently named top_earlgrey).
I don't really care if the target-independent toplevel is called top_earlgrey or top_earlgrey_core. We then however need to find a name for the target-specific ones (or name them all top_earlgrey and put them into different directories).
prim modules are designed to have a common interface across targets. So everything that could be wrapped in a prim module shouldn't go into the target-specific toplevel, but the target-independent one. In the past, we didn't have sufficiently flexible pinmux IP to have this functionality in the target-independent toplevel. That's changed now, so move pinmux to the target-independent toplevel.
Our Verilator toplevel is just the most extreme example of "different toplevel ports are needed for different targets".
For a large part of the toplevel DV work, the target-independent toplevel can be used. The target-dependent toplevel is likely to have analog IP or vendor primitives in there which are harder to simulate.

@msfschaffner We do have a silicon meeting on Tuesday. Can you talk with Scott to see if we can discuss this topic there. Otherwise, please find a date/time for a dedicated short meeting on this.

msfschaffner commented 5 years ago

Hey Philipp, thanks for your comments. I tend to agree with most of them, but as you say more discussion is needed such that everybody is on the same page here. I'll check with @sjgitty and @eunchan whether we have enough time in our meeting next week.

tjaychen commented 5 years ago

Thanks for the thoughts Philiipp / Michael.

Let me clarify some of my thoughts

I prefer strongly to re-use as much code as possible across the various platforms. So I tend to like to start very restrictive and see where those assumptions breakdown, I also tend to like to include chip connectivity as much as possible in that equation
- I agree that it will be difficult to find a common set of ports across all targets. However the differences feel very small to me right now (FPGA may have extra clock ports and different port names (USB), verilator has dpi modules that belong in a more DV like testbench construct), and I'd rather start with marking out those specific features than maintaining an entirely separate module.
- The separate module in my opinion just have a way of getting out of hand over time. Same could be said of ifdefs or other feature separation mechanisms, but if they are all in one place, it will at least be easier to notice that a 'slow death' is happening
- One wishlist item for me is re-using the DV environment on FPGA builds, and I just think there's a better chance of that happening if most of the code, inclusive of the top, is the same. (Sri can correct me if I'm wrong, but I think DV is always best run at the absolute top and not just the target independent portion)
The separation of top / core in my mind is not so much about what is target dependent or independent
- To me the line is more what makes a chip and what does not. By that I mean I would like for the 'top_core' to be a module that can be instantiated in a larger chip as a subsystem. While the top non-core should be the final wrapper that inserts all the pieces that make it a full chip.
- So to me this means the top non-core should include things like pads, MAY include top level analog (oscillators, internal regulators, charge pumps etc) but NOT* things like memories and analog blocks that can be functionally isolated (such as TRNG)
  - There is a separate discussion here on what we think is the right approach to absorb technology / vendor specific blocks such as TRNG, I have some of my own thoughts I've shared with Martin, but I would like to hear others as well.
To some extent, I think the results of this discussion are not SUPER critical, because I expect when partners come on board they will give recommendations that actually fit their tapeout needs (for example, they may want the primary DFT controller at a certain level, they may want to actually lump all analog components in one place, they may want to create their own top_* every single time), so it is in my opinion good to ensure we do not pick something that makes it difficult to pivot later. Re-naming top_earlgrey to top_earlgrey_core, and merging the tops where possible now (with perhaps a few parameters and defines to mark out the different top needs) feels like a decent middle-ground to me. If we discover that there are actually way more features to separate, it will be easy to move back also (although in that scenario, I think as you suggested, I'd vote for a templated generate flow for the different tops)

On Fri, Oct 4, 2019 at 8:35 AM msfschaffner notifications@github.com wrote:

Hey Philipp, thanks for your comments. I tend to agree with most of them, but as you say more discussion is needed such that everybody is on the same page here. I'll check with @sjgitty https://github.com/sjgitty and @eunchan https://github.com/eunchan whether we have enough time in our meeting next week.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/295?email_source=notifications&email_token=AAH2RSW6W7MUXWN4VE4YWY3QM5PFXA5CNFSM4I4B5C4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEAMBLYI#issuecomment-538449377, or mute the thread https://github.com/notifications/unsubscribe-auth/AAH2RSRPF24ZZT65AQZCZJ3QM5PFXANCNFSM4I4B5C4A .

vogelpi commented 5 years ago

Thanks @imphil for linking lowRISC/stwg-base#89. That is helpful to understand the current state.

I agree with @msfschaffner . We should avoid ifdefs and abstract technology-dependent stuff using a prim_* wrapper whenever possible. Further, I also think that this approach is not going to work some things that we currently have in the top_earlgrey_*.sv files. It's not just that switching to a different technology requires another clock infrastructure, even the same FPGA device but on a different board may require another clocking IP with a different interface. I don't see how we could reasonably abstract such differences with a prim_* wrapper and that's probably also why these blocks are on the very top-level.

However, I don't understand how having fewer top-levels eases testing (especially since we are not talking about reducing the number of targets but rather having a single top-level supporting all targets). I would rather say what we want is to reduce the differences between the existing top-levels. Looking for example at the Nexys and ASIC top-levels, the main differences are the clocking and the pad control. As outlined above, I don't think we can unify the clocking. But why don't we just move the pad control into top_earlgrey.sv? Especially since we are now about to add the padmux IP. Wouldn't that be totally sufficient?

tjaychen commented 5 years ago

@vogelpi it was actually on my todo list to wrap the clocking with a prim_clock* , and within that we would target different implementations. Is there a reason you think this would not work? Ultimately clocking generation comes down to input clock (if there is one), configuration signals, and output clocks doesn't it? To me the main difference is that in real silicon, we likely won't have an input clock, but everything else I imagine to be quite similar (the configuration of course can be different, but memory wrappers for example have traditionally had good way of dealing with this via generic vectors)

and yes, the padmux / pinmux IP will go a long way to solving this issue. @msfschaffner / @eunchan will be discussing this tomorrow, so it will be good to get more information then. As you say, the main differences are pad and clocks, and I felt that clocks could be unified with a prim_wrap + a single ifdef for whether there is an external clock (we may even ignore that for now since I'm making guesses about the final silicon).

Lastly, I still prefer a single top because I strongly prefer DV to be able to cover a large majority of code across all platforms. Fairly recently (maybe a month ago), I noticed a connection error in the FGPA pin top that caused GDB to not fully work. If we have a top level test in DV that exercised that path for example, it would have been easily caught. I've also personally run into a lot of cases where things worked in DV and just did not in FPGA, and they were a pain to track down - so i've more or less developed a blind desire to keep things consistent between the two when possible.

There will still be deviations to be sure, but like I mentioned before, I prefer to start very restrictive, and open up later if necessary.

It may be good for others to chime in also.

imphil commented 5 years ago

On prim_clock: I don't see a common interface we could standardize on. We have

0..2 inputs (internally generated, single-ended, differential)
something between no or a highly varied set of configuration parameters (just look at the different parameters passed to the Xilinx clocking primitives)
And the good news: always one output port

I also don't see much value in spending effort of having a primitive with a shared interface. The clock is always configured target/board specific, and needed in the top-level file. So instantiating the right module there, instead of somehow passing config information to a primitive and instantiating that, doesn't strike me as a significant win.

Regarding the FPGA pin case you mentioned: this was an example where we had different pinmux configurations between FPGA and other targets. If pinmux is able to handle this, there's no need to have this code in the FPGA-specific toplevel any more (and that's what we're aiming for). You'd still need to have DV to test this configuration, of course.

tjaychen commented 5 years ago

regarding output, it's possible to have more than 1 correct? At least OpenTitan as spec'd by scott is meant to have both a primary clock and a fixed clock. But you are right, this pat will always be the same.

Maybe I'm missing something, but when I look at the FPGA clocking, doesn't it just take an external clock and reset pin? https://github.com/lowRISC/opentitan/blob/master/hw/top_earlgrey/rtl/top_earlgrey_nexysvideo.sv#L78

I'm sure there are more configurations wrapped up underneath, but I don't really think it would be necessary to expose them at a wrapper level.

Yes I do agree there could be different clock inputs, but to me that delta is small enough that I don't think it warrants a separate top. Now if the story becomes our design is big enough it needs to be split into multiple FPGAs, or that the FPGA needs a very specific set of pins that are different from the normal top, then I agree, a separate top makes sense.

In our current scenario though, I really don't think there is much of a difference, and that's one of the reasons I would like to merge them. If they do become sufficiently different it is easy to split off again, and nothing we do now is meant to prevent that choice in the future.

sjgitty commented 5 years ago

Boy, this one is getting long, but I didn't want to open a new issue. Cut/Pasting my comments from #851 regarding confusing documentation around this area:

We need some more collateral on these distinctions, just not sure where. I think we need a) better terminology of our top-level netlists (issue #295); b) better definition of the difference between the autogen'd top and the platform top (or whatever we call it/them); c) documentation for the different platform tops. Perhaps we restrict the top_earlgrey/doc/index.md to just the autogen'd top and introduce a separate platform document, but things get messy.

eunchan commented 4 years ago

Before this issue moves forward, Can we get an agreement of top_earlgrey_verilator.sv to be moved to dv/ directory other than rtl/? I am pretty sure the top_earlgrey_verilator module is for verification/ validation not the design.

tjaychen commented 4 years ago

yes that seems like the right thing to do to me.

On Tue, Nov 12, 2019 at 5:20 PM Eunchan Kim notifications@github.com wrote:

Before this issue moves forward, Can we get an agreement of top_earlgrey_verilator.sv to be moved to dv/ directory other than rtl/? I am pretty sure the top_earlgrey_verilator module is for verification/ validation not the design.

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/295?email_source=notifications&email_token=AAH2RSUV32XX5GLLUVYAORDQTNI5RA5CNFSM4I4B5C4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOED4Q5VY#issuecomment-553193175, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RSWNDHOVE5WLDY2LRQDQTNI5RANCNFSM4I4B5C4A .

msfschaffner commented 4 years ago

yes, I am also in favor of moving it into the dv directory.

msfschaffner commented 4 years ago

Just adding a note that we should remember to implement the top level reorganization as discussed in December at some point.

This issue is related: #892

tjaychen commented 4 years ago

Hey Michael, I think either you or I drew up a diagram of what this should look like, we should probably attack to the bug as a reference.

On Fri, Jan 17, 2020, 18:11 Michael Schaffner notifications@github.com wrote:

Just adding a note that we should remember to implement the top level reorganization as discussed in December at some point.

This issue is related: #892 https://github.com/lowRISC/opentitan/issues/892

— You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub https://github.com/lowRISC/opentitan/issues/295?email_source=notifications&email_token=AAH2RSUJTB6UBDMQWTZEPIDQ6JQLPA5CNFSM4I4B5C4KYY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOEJJN6RA#issuecomment-575856452, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAH2RSXYOFWIEJGSN6CIOGDQ6JQLPANCNFSM4I4B5C4A .

msfschaffner commented 4 years ago

I think that was this one: Top-level Reorg (3)

sjgitty commented 4 years ago

@msfschaffner when you have a moment can you refresh this issue? Perhaps it is closable now, but we should at least refresh with the latest thinking.

silvestrst commented 4 years ago

Question regarding top_earlgrey_verilator has popped up on the SW meeting. As a part of DIF we have at least one "sanity" test per DIF, which is ran on verilator via CI. At the moment Pinmux and Pad Control on verilator doesn't not have the same functionality as on FPGA (please see the related issue: #2707 ), namely padring is not instantiated, which prevents us from writing a comprehensive test at this time.

How important it is for the MS4 I don't know, maybe @gkelly could clarify this for.

msfschaffner commented 4 years ago

Sorry @sjgitty, I completely missed this one.

The top-level organization has in the meantime evolved quite a bit, but in a different way as we've sketched out above. Below is an update on the current plan and status.

Current Plan

Below is a diagram from @tjaychen which reflects the current thinking.

OpenTitan Power _ Clock Chip Partition

First of all, the chip-level top will eventually only contain:

the padring, which only consists of wires and pad instances.
the analog sensor top (AST) block, which comprises all analog and clock generation resources. on ASIC, this will be a macro that is going to be added by the vendor. for simulation and FPGA synthesis there will be two open-source models available.

Other IPs that need bus access (like the pinmux, or the AST sensor control) will be located inside what we currently call top_earlgrey (this is named base_earlgrey in the diagram above - more on that later).

Since the pinmux and pinout requirements have become significantly more complex, the pinmux and padctrl will be merged and eventually become one templated module, where the pinout, muxing capability and pin mapping can be defined in an hjson configuration. Also, the jtag_mux (currently located in the chip-level top) will be merged into the pinmux as well, since we need to be able to select different TAPs based on strap pin and lifecycle selection signals.

As part of the pinmux templating, we may also choose to template parts of the target-specific top-levels in order to make sure that the pinmux/padring/pad connections are performed correctly. This also presents an opportunity to align and share common parts of the target-specific top-levels, and thereby solve one of the initial concerns that has been brought up further above in this thread. I.e., top_earlgrey_asic, top_earglrey_nexysvideo, etc could be templated as well and generated from one single source by passing them through topgen as well (however, we've not made a final decision on whether to do that just yet).

For the Verilator testbench, the idea is to align it with the FPGA and ASIC tops as well, although there are a couple of hurdles that need to be overcome in order to enable simulation of the AST model, which currently contains constructs that are not supported by Verilator. Also, AST generates multiple clocks with different frequencies and we need to make sure this can be handled in the Verilator case (for instance, by setting them all to the same base clock fed in from the testbench using `ifdef VERILATOR preprocessor tricks).

As indicated in the diagram above, we may rename top_earlgrey into base_earlgrey in order to make the distinction between the chip-level tops (top_earlgrey_asic etc.) and the functional top-level (base_earlgrey) clearer.

@tjaychen, @eunchan let me know if I missed something above.

Current Status

Most of the changes listed below are tasked for MS4.

top_earlgrey (#5221):

[x] move JTAG mux into pinmux
[x] merge padctrl and pinmux
[x] define new pinmux hjson configuration
[x] revise and template the pinmux design
[x] ~~rename top_earlgrey into base_earlgrey~~ we chose to go for a different naming scheme where top_earlgrey still remains top_earlgrey and instead the target specific chip levels are prefixed with chip. E.g. chip_earlgrey_asic.
[x] add strap sampling and JTAG qualification functionality (via life cycle) to the pinmux

top_earlgrey_asic (#5221):

[x] padring instantiated and connected
[x] AST wrapper instantiated and connected
[x] template top-level according to new pinmux (https://github.com/lowRISC/opentitan/issues/5221)

top_earlgrey_<fpga target> (#5221):

[x] padring instantiated and connected
[x] template top-level according to new pinmux (https://github.com/lowRISC/opentitan/issues/5221)
[x] implement synthesizable AST wrapper model
[x] AST wrapper instantiated and connected

top_earlgrey_verilator (#6103):

[ ] padring instantiated and connected
[x] implement verilatable AST wrapper model
[x] AST wrapper instantiated and connected
[ ] template top-level according to new pinmux

other:

[x] update the docs (#6102)

Related Issues

Another issue that is related to the top-level organization is the actual configuration of the OpenTitan SoC. Ideally, we should strive to have as few FPGA emulation platforms as possible, and all of them should have the same system configuration as the ASIC configuration in order to ease SW development and emulation efforts, and keep the required maintenance work within bounds.

That may however not be entirely possible, especially if we intend to keep support for the lower cost development boards with Vivado WebPack support alive (the SoC configuration planned for the ASIC implementation does not fit on the nexysvideo, nor other such low-cost boards).

Hence, we should start thinking about ways of supporting different system configurations - but that discussion merits its own thread (just wanted to bring this up here).

silvestrst commented 4 years ago

@silvestrst - for notifications on new comments

msfschaffner commented 3 years ago

FYI, this status tracker has been updated.

Several important updates RE toplevel organization, pinmux and templating have landed. There are a couple of opens, mostly revolving around synthesizable AST for FPGA and Verilator, and the Verilator TB reorganization.

tjaychen commented 3 years ago

latest status:

ast has been merged into fpga and verilator. This means fpga / asic are mostly similar from a functional standpoint, although the we still need to sort out the pinout situation on cw310.

Verilator still has not merged into chiplevel.sv.tpl, but at least functionally it is now closer to asic, and we should be able to run ast related functions (entropy, power down/up etc) without putting in verilator top level hacks.

tjaychen commented 2 years ago

most of the work here is done. The remaining work is tracked in #6103

lowRISC / opentitan