stnolting / neorv32

:desktop_computer: A small, customizable and extensible MCU-class 32-bit RISC-V soft-core CPU and microcontroller-like SoC written in platform-independent VHDL.
https://neorv32.org
BSD 3-Clause "New" or "Revised" License
1.61k stars 228 forks source link

Do we really need wrappers for the top entity? #128

Closed stnolting closed 3 years ago

stnolting commented 3 years ago

I have some doubts regarding the current concept of the processor wrapper, i.e. rtl/templates/processor and rtl/templates/system. Currently, the rtl/templates/processor folder mainly provides wrappers, which have a simplified interface (like only exposing the UART and some GPIOs) but always do some very specific CPU/processor configuration. It feels like these are "prebuilt processor configurations" - like the ATTINY 13A-PU is a prebuilt processor configuration of the AVR family.

I think it would be much cleaner to always instantiate the processor top entity neorv32_top in all example setups.

My proposal:

  1. Modify the processor's top entity so that all configuration options are disabled by default.
  2. Remove the rtl/templates/processor and rtl/templates/system folders and all wrappers inside.
  3. Provide a single folder (like rtl/wrappers) that only contains the following three wrappers:
    • The test setup (currently rtl/templates/processor/neorv32_ProcessorTop_Test.vhd) because this is used for the tutorial in the user guide.
    • The wrapper providing resolved interface signals (currently rtl/templates/processor/neorv32_ProcessorTop_stdlogic.vhd).
    • The AXI4-Lite wrapper (currently rtl/templates/system/neorv32_SystemTop_axi4lite.vhd). I know this is not a wrapper in the classical definitions, but this unit does not instantiate any further modules I think this is ok.

The processor top entity already provides default values for all incoming & outgoing signals. If we add 1.) then all instances of the processor only have to use the generics and signals that are actually required for the setup. For example, the instantiation of the processor in the "test setup" (rtl/templates/processor/neorv32_ProcessorTop_Test.vhd) could be simplified to look like this:

  neorv32_top_inst: neorv32_top
  generic map (
    -- General --
    CLOCK_FREQUENCY              => 100000000,   -- clock frequency of clk_i in Hz
    INT_BOOTLOADER_EN            => true,        -- boot configuration: true = boot explicit bootloader; false = boot from int/ext (I)MEM
    -- RISC-V CPU Extensions --
    CPU_EXTENSION_RISCV_C        => true,        -- implement compressed extension?
    CPU_EXTENSION_RISCV_M        => true,        -- implement muld/div extension?
    CPU_EXTENSION_RISCV_Zicsr    => true,        -- implement CSR system?
    -- Hardware Performance Monitors (HPM) --
    HPM_NUM_CNTS                 => 4,           -- number of implemented HPM counters (0..29)
    HPM_CNT_WIDTH                => 40,          -- total size of HPM counters (0..64)
    -- Internal Instruction memory --
    MEM_INT_IMEM_EN              => true,        -- implement processor-internal instruction memory
    MEM_INT_IMEM_SIZE            => 16*1024,     -- size of processor-internal instruction memory in bytes
    -- Internal Data memory --
    MEM_INT_DMEM_EN              => true,        -- implement processor-internal data memory
    MEM_INT_DMEM_SIZE            => 8*1024,      -- size of processor-internal data memory in bytes
    -- Processor peripherals --
    IO_GPIO_EN                   => true,        -- implement general purpose input/output port unit (GPIO)?
    IO_MTIME_EN                  => true,        -- implement machine system timer (MTIME)?
    IO_UART0_EN                  => true         -- implement primary universal asynchronous receiver/transmitter (UART0)?
  )
  port map (
    -- Global control --
    clk_i       => clk_i,                        -- global clock, rising edge
    rstn_i      => rstn_i,                       -- global reset, low-active, async
    -- GPIO (available if IO_GPIO_EN = true) --
    gpio_o      => gpio_out,                     -- parallel output
    -- primary UART0 (available if IO_UART0_EN = true) --
    uart0_txd_o => uart0_txd_o,                  -- UART0 send data
    uart0_rxd_i => uart0_rxd_i                   -- UART0 receive data
  );

I think this is much cleaner and easier to understand. I also see no real advantage for using the current templates/processor wrappers in the os-flow setups.

Of course we can add further wrappers. But only if these wrappers really provide a feature, that is not available when using "plain instantiation" of the processor (like different interface type or different interface protocols).

What do you think? :thinking:

umarcor commented 3 years ago

The processor top entity already provides default values for all incoming & outgoing signals. If we add 1.) then all instances of the processor only have to use the generics and signals that are actually required for the setup. For example, the instantiation of the processor in the "test setup" (rtl/templates/processor/neorv32_ProcessorTop_Test.vhd) could be simplified to look like this

Honestly, I did never see that capability being used. I always saw all ports being explicitly assigned in the port map, either using open (for the ones of mode out) or assigning a constant value (for the ones of mode in). Hence, I'm not sure about that being LRM compliant. Did you try that with multiple synthesis/simulation tools?

I think this is much cleaner and easier to understand. I also see no real advantage for using the current templates/processor wrappers in the os-flow setups.

Of course we can add further wrappers. But only if these wrappers really provide a feature, that is not available when using "plain instantiation" of the processor (like different interface type or different interface protocols).

What do you think? πŸ€”

If the capability you explained is part of the LRM and supported by all the toolchains we are currently using (GHDL+Yosys+nextpnr, Vivado, Quartus and Radiant), then I'm good. Yet, as said, I'm unsure about it.

Currently, the rtl/templates/processor folder mainly provides wrappers, which have a simplified interface (like only exposing the UART and some GPIOs) but always do some very specific CPU/processor configuration. It feels like these are "prebuilt processor configurations" - like the ATTINY 13A-PU is a prebuilt processor configuration of the AVR family.

The main purpose of Processor templates is precisely to hide interfaces. All users can use the top entity, as long as they are willing to understand all the ports and generics (which implies understand what each peripheral is about). In fact, almost any user who is going to use NEORV32 in a "real" design is expected to write their custom ProcessorTop. Conversely, the Processor templates provided in the repo allow newcomers to understand/learn the pieces of the CPU/SoC they want to focus on. They also provide them a reference about how to structure their SoC design. Therefore, although the ProcessorTop templates might be redundant for using them in the current BoardTop examples, they are useful for users to copy, paste and modify.

On the other hand, Processor templates reduce code duplication when implementing the same design on different boards. Without that intermediate abstraction, all BoardTop sources for the same design would need to repeat the configuration. They are meant for people to test new boards by copying other BoardTop sources and without going into the details of NEORV32 until they want to.

I agree that some templates feel as "prebuilt processor configurations". I think that is ok. It's the purpose of Minimal and MinimalBoot. Others don't have restrictions in that regard, because their purpose is not to showcase/understand boot configuration options. That distinction might be made explicit in a README or in the documentation, so that users understand what is the scope of each template. Something such as:

  1. Use the Minimal or MinimalBoot to test if the board is working.
  2. Use the Test or Demo for your board, if it exists. Help create it otherwise.
  3. Create your own ProcessorTop template, where you instantiate the NEORV32 top and apply your custom constraints. Then wrap that in one or multiple BoardTop entities for your target boards.
  4. Create your custom system hierarchy with one or multiple direct instantiations of NEORV32.

I believe we want to cover all four use cases, from less experienced to most expert.

The AXI4-Lite wrapper (currently rtl/templates/system/neorv32_SystemTop_axi4lite.vhd). I know this is not a wrapper in the classical definitions, but this unit does not instantiate any further modules I think this is ok.

It's not exactly about "further modules" but "further logic". ProcessorTop templates don't have any hardware requirements; all the signals defined in ProcessorTop templates have no meaning in hardware, they are all wires. Conversely, in the SystemTop_axi4lite, there are additional gates and muxes. That is, it is possible to replace a ProcessorTop with a direct instance of the NEORV32 top, but it is not possible to replace a SystemTop by the NEORV32 top, unless you copy additional logic. Therefore, Wishbone to AXI4-Lite Bridge is a module/component de facto, even though it's written in the same source without using an explicit block or entity.

Overall, I understand why you want to provide a single wrappers folder, but I believe it is an oversimplification. By having ProcessorTop and SystemTop templates we are acknowleding an inherent complexity in the design process. That is an statement: you, user, need to understand the difference between a board top (specific for some device and I/O), a design/system top (specific for the peripherals added to the NEORV32, either "internal" or "external") and a processor top (or a direct top instantiation). Then, we are all free to blur everything together in a single source. It's ok if you don't want to repeat port declarations during quick iterations for prototyping. All VHDL designers will understand that.

I believe we want to avoid using the term "wrapper" for naming sources/directories and maybe in the documentation too. That's because "wrapper" is not descriptive. In a hardware design with a mostly structural description, everything is a wrapper. As soon as something is instantiated somewhere, there is a wrapper by definition. I acknowledge that "templates" might neither be the best term. I think that "something used as a pattern/reference" falls into that term. Yet, in the context of software, people might expect some code-generation (since that is the usage of the term in golang, python, etc.). Therefore, I'm good with renaming and better explaining the purpose. In some sense, those are "board agnostic instantiation examples". So, they might fit in setups/examples/common, instead of rtl/templates/processor.

stnolting commented 3 years ago

Honestly, I did never see that capability being used. I always saw all ports being explicitly assigned in the port map, either using open (for the ones of mode out) or assigning a constant value (for the ones of mode in). Hence, I'm not sure about that being LRM compliant. Did you try that with multiple synthesis/simulation tools?

It works on Quartus and Radiant. I still have to test Xilinx, but I think that (and the general concept) should be ok: https://forums.xilinx.com/t5/Design-Methodologies-and/Default-values-of-input-and-output-in-VHDL-2008/td-p/855985

The main purpose of Processor templates is precisely to hide interfaces. All users can use the top entity, as long as they are willing to understand all the ports and generics (which implies understand what each peripheral is about). In fact, almost any user who is going to use NEORV32 in a "real" design is expected to write their custom ProcessorTop. Conversely, the Processor templates provided in the repo allow newcomers to understand/learn the pieces of the CPU/SoC they want to focus on. They also provide them a reference about how to structure their SoC design. Therefore, although the ProcessorTop templates might be redundant for using them in the current BoardTop examples, they are useful for users to copy, paste and modify.

I agree that for beginners the whole configuration thing might be overwhelming at first. We had that problem when setting up the FOMU tests. But I think templates like "minimal" and "minimal_boot" make things even more complicated to understand. From m point of view, the best solution would be to provide two test setups:

  1. UART + bootloader + LEDs; this is the one we already have; in the user guide this is used to illustrate the bootloader
  2. LEDs + fixed program in IMEM; we "kind of" have this as the default IMEM image is a blinky LED example; we just need a packaged top entity (so a new test setup) for that

Both setups need explicit names and a good explanation in the user guide. If newcomers work through these two setups they should feel comfortable with the boot-concepts and also the general configuration/customization concept.

On the other hand, Processor templates reduce code duplication when implementing the same design on different boards. Without that intermediate abstraction, all BoardTop sources for the same design would need to repeat the configuration. They are meant for people to test new boards by copying other BoardTop sources and without going into the details of NEORV32 until they want to.

Right... I did not think about that πŸ˜… But maybe this would be something to put in setups/BOARD_XYZ/....

  1. Use the Minimal or MinimalBoot to test if the board is working.
  2. Use the Test or Demo for your board, if it exists. Help create it otherwise.
  3. Create your own ProcessorTop template, where you instantiate the NEORV32 top and apply your custom constraints. Then wrap that in one or multiple BoardTop entities for your target boards.
  4. Create your custom system hierarchy with one or multiple direct instantiations of NEORV32.

I totally agree! 1.) would be covered by the two new test setups I mentioned above. 2.) would be the files from setups/BOARD_XYZ/.... 3.) and 4.) are for advanced users, so they would directly use the processor top entity.

It's not exactly about "further modules" but "further logic". ProcessorTop templates don't have any hardware requirements; all the signals defined in ProcessorTop templates [...]

I know what you mean... For me, a wrapper is "same insides, different plug". Anyway, maybe it would be better to have rtl/adapters and rtl/test_setups folders. rtl/adapters provides the axi4 bridge and the std_logic top and such. These are impossible to re-create using an instance of the processor top entity, so this is consistent with your comment. rtl/test_setups would contain the two basic test setups that are used in the user guide to get newcomers started. Everything else, that is more board-specifc (for example, different configuration but all have the same signals to connect a board's peripherals) should be somewhere in setups.

stnolting commented 3 years ago

I have created an experimental setups-cleanup branch.

βœ”οΈ I have modified the processor's top entity so that all generics are disabled by default. This works with Vivado, Quartus, Radiant and the osflow.

❓ I want to move the test setup to setups/test_setups, add another test setup that uses the pre-initialized IMEM, give both setups a distinctive name, add a nice README and finally I would like to use one of these setups for the current Quartus/Vivado/Radiant example setups. I think we should also use them at least for the FOMU setups.

This is what the entities might look like (they are intended for an absolute minimum):

Test Setup + Bootloader

entity neorv32_testsetup_bootloader is
  generic (
    -- adapt these for your setup --
    CLOCK_SPEED       : natural := 100000000; -- clock frequency of clk_i in Hz
    MEM_INT_IMEM_SIZE : natural := 16*1024;   -- size of processor-internal instruction memory in bytes
    MEM_INT_DMEM_SIZE : natural := 8*1024     -- size of processor-internal data memory in bytes
  );
  port (
    -- Global control --
    clk_i       : in  std_ulogic := '0'; -- global clock, rising edge
    rstn_i      : in  std_ulogic := '0'; -- global reset, low-active, async
    -- GPIO --
    gpio_o      : out std_ulogic_vector(7 downto 0); -- parallel output
    -- UART0 --
    uart0_txd_o : out std_ulogic; -- UART0 send data
    uart0_rxd_i : in  std_ulogic := '0' -- UART0 receive data
  );
end entity;

Test Setup + pre-initialized IMEM

entity neorv32_testsetup_approm is
  generic (
    -- adapt these for your setup --
    CLOCK_SPEED       : natural := 100000000; -- clock frequency of clk_i in Hz
    MEM_INT_IMEM_SIZE : natural := 16*1024;   -- size of processor-internal instruction memory in bytes
    MEM_INT_DMEM_SIZE : natural := 8*1024     -- size of processor-internal data memory in bytes
  );
  port (
    -- Global control --
    clk_i       : in  std_ulogic := '0'; -- global clock, rising edge
    rstn_i      : in  std_ulogic := '0'; -- global reset, low-active, async
    -- GPIO --
    gpio_o      : out std_ulogic_vector(7 downto 0) -- parallel output
  );
end entity;

Maybe the second entity could by directly used as top entity for the FOMU. I am not sure what osflow would do with top entity signals that are not mapped to physical pins (I guess, it might just use any free pins). However, we could connect the lowest 3 bits of gpio_o to the RGB LED (without the driver primitive), tie another 4 pins to the board's gold panels" and maybe tie the last one to an unused package pin. πŸ€”

umarcor commented 3 years ago

It works on Quartus and Radiant. I still have to test Xilinx, but I think that (and the general concept) should be ok: https://forums.xilinx.com/t5/Design-Methodologies-and/Default-values-of-input-and-output-in-VHDL-2008/td-p/855985

@stnolting, I was talking with @Paebbels about this, and he confirmed that the feature you propose is in the LRM since two decades ago! The reason it's not used much in the wild is due to good practices, not a limitation. Let me explain: default values can be provided in entities, in components and in configurations. Therefore, if those three features are mixed and the defaults do not match, it is difficult to know which one is being picked. For this reason, it is typically avoided to define defaults for input ports. In VHDL, explicit behaviour is generally preferred. Nonetheless, in the specific use case of this repository it should be safe to use the feature because we want it at the top level only, and we are not dealing with components or configurations.

We also commented the following:

I believe we should apply those to all the templates/wrappers.

From m point of view, the best solution would be to provide two test setups:

  1. UART + bootloader + LEDs; this is the one we already have; in the user guide this is used to illustrate the bootloader
  2. LEDs + fixed program in IMEM; we "kind of" have this as the default IMEM image is a blinky LED example; we just need a packaged top entity (so a new test setup) for that

Both setups need explicit names and a good explanation in the user guide. If newcomers work through these two setups they should feel comfortable with the boot-concepts and also the general configuration/customization concept.

That is exactly what we are providing now... 1 is MinimalBoot and 2 is Minimal. Naturally, we can rename them, we can change the ports that are shown by default. However, the essence is the same.

Right... I did not think about that πŸ˜… But maybe this would be something to put in setups/BOARD_XYZ/....

1.) would be covered by the two new test setups I mentioned above. 2.) would be the files from setups/BOARD_XYZ/.... 3.) and 4.) are for advanced users, so they would directly use the processor top entity.

Anyway, maybe it would be better to have rtl/adapters and rtl/test_setups folders.

rtl/test_setups would contain the two basic test setups that are used in the user guide to get newcomers started. Everything else, that is more board-specifc (for example, different configuration but all have the same signals to connect a board's peripherals) should be somewhere in setups.

Again, this is essentially what we have at the moment. We have two folders rtl/core/templates/processor and rtl/core/templates/system. You are proposing to have two folders too, and some of the wrappers are equivalent to the existing ones. Overall, I feel that this issue is about "renaming, moving and cleaning (reducing defaults)" of the existing templates, but there is no significant modification of what the wrappers are for or which there are. So:

mkdir setups/examples/common
mv rtl/templates/processor/neorv32_ProcessorTop_stdlogic.vhd setups/examples/common/
mv rtl/templates/processor/neorv32_ProcessorTop_Test.vhd setups/examples/common/
mv rtl/templates/processor/neorv32_ProcessorTop_UP5KDemo.vhd setups/examples/common/
mv rtl/templates/system/neorv32_SystemTop_axi4lite.vhd  setups/examples/common/

mv rtl/templates/processor/* rtl/

axi4 bridge and the std_logic top and such. These are impossible to re-create using an instance of the processor top entity, so this is consistent with your comment

Note that the stdlogic ProcessorTop template does not have any additional logic. It's just casting between std_logic and std_ulogic. With VHDL >= 2008, those are equivalent and therefore the template is unnecessary because it is "exactly" the same as instantiating the NEORV32 top.

umarcor commented 3 years ago

This is what the entities might look like (they are intended for an absolute minimum)

Note that those minimal templates prevent users from enabling CPU extensions if they want to. It's ok not to expose the generics corresponding to the peripherals whose ports are hidden. That is, it is pointless to show the IO_TWIEN generic if the `twi*` ports don't exist in the template. However, I don't see why we might want to prevent users from optionally enabling the features which do not require modification to the ports.

Instead of writing those minimal templates from scratch, I suggest to try the other way: pick ProcessorTop_Minimal or ProcessorTop_MinimalBoot and do a cleanup there: remove the open ports, remove the in ports which correspond to a disabled peripheral which have a default, etc. After you are done with that, evaluate whether the remaining generics really need to be hidden.

As a matter of fact, my first iterations of Minimal and MinimalBoot were exactly as the ones you just proposed. Most of the optional generics were added afterwards. I can argue why the CPUEXTENSION* generics should always be visible (optional, but not "blocked"), but I don't know about the PMP, HPM, etc.

umarcor commented 3 years ago

Maybe the second entity could by directly used as top entity for the FOMU. I am not sure what osflow would do with top entity signals that are not mapped to physical pins (I guess, it might just use any free pins). However, we could connect the lowest 3 bits of gpio_o to the RGB LED (without the driver primitive), tie another 4 pins to the board's gold panels" and maybe tie the last one to an unused package pin. πŸ€”

The problem with that approach is that you are expecting the constraints file to match the top-level ports used in the entity. That is, you are expecting all the setups (regardless of the tool) to support assigning exactly the same identifiers to the same ports, including support for arrays (or not).

The structure we are using is the opposite: we want to use a single constraint file per board (for multiple designs), and we want to use a single design top-level for multiple boards. That's why we have BoardTop files. It's where we make the binding between the identifiers in the constraints file for one specific board and the ports in a design top-level (either a ProcessorTop or a SystemTop). We do typically need the BoardTop for other reasons too (such as instantiating a PLL or normalising buttons/LEDs). Therefore, it is unrealistic to use a ProcessorTop or a SystemTop as the top-level of a design. Even though it might work in very specific examples, our focus should be to stress the importance of having BoardTop files, even if they might feel unnecessary in the most simple cases. That is the knowledge that users will need as soon as they try to combine NEORV32 with anything else.

stnolting commented 3 years ago

I was talking with @Paebbels about this, and he confirmed that the feature you propose is in the LRM since two decades ago!

Oh, good to know! πŸ˜…

Nonetheless, in the specific use case of this repository it should be safe to use the feature because we want it at the top level only, and we are not dealing with components or configurations.

I agree.

The clock must always be explicitly connected; it should not have a default value. Without a proper clock, the system is useless. It is important enough for forcing the users to think about it. Other signals, such as resets, might have a default value, but not the clock.

Very good point! I will change that here.

The same applies to generics. For instance, the clock frequency should not have a default value, so that users need to explicitly set it.

Right now the top uses this:

  assert not (CLOCK_FREQUENCY = 0) report "NEORV32 PROCESSOR CONFIG ERROR! Core clock frequency (CLOCK_FREQUENCY) not specified." severity error;

Since some synthesis tools ignore those kind of errors (looking at you Radiant!) your proposal seems more reasonable. :+1:

It is not desirable to use 0 as a default, because it makes difficult to debug waveforms. Having U and X propagated is one of the benefits of the 9-value std_logic types, in comparion to verilog's or booleans. By using 0, we are not using the capability of the language. Note that, when synthesising and implementing on FPGA, those are (should be) translated to zeros.

Also a very good point! I think U seems to be an intuitive choice here. On a second thought, L might be a better choice: one will see in simulation that this signal is "different" but we would ensure that this will turn out as 0 for real implementation. What do you think?

That is exactly what we are providing now... 1 is MinimalBoot and 2 is Minimal. Naturally, we can rename them, we can change the ports that are shown by default. However, the essence is the same.

Yeah, you are right πŸ˜„ Maybe its obsessive, but I just want to draw a hard line between processor-sore rtl and ready-to-put-on-your-board rtl files plus I like to have you-get-what-you-see file names here (so, for the the test setups).

Again, this is essentially what we have at the moment. We have two folders rtl/core/templates/processor and rtl/core/templates/system. You are proposing to have two folders too, and some of the wrappers are equivalent to the existing ones. Overall, I feel that this issue is about "renaming, moving and cleaning (reducing defaults)" of the existing templates, but there is no significant modification of what the wrappers are for or which there are. So:

Basically, this is true.

mv rtl/templates/processor/neorv32_ProcessorTop_UP5KDemo.vhd setups/... βœ”οΈ mv rtl/templates/processor/neorv32_ProcessorTop_Test.vhd setups/test_setups? βœ”οΈ mv rtl/templates/system/neorv32_SystemTop_axi4lite.vhd setups/examples/common/ :x: I would keep that in rtl/SOMETHING because this is not related to any FPGA-board specific setup.

Note that the stdlogic ProcessorTop template does not have any additional logic. It's just casting between std_logic and std_ulogic. With VHDL >= 2008, those are equivalent and therefore the template is unnecessary because it is "exactly" the same as instantiating the NEORV32 top.

So maybe we could just delete rtl/templates/processor/neorv32_ProcessorTop_stdlogic.vhd? :wink:


Note that those minimal templates prevent users from enabling CPU extensions if they want to. It's ok not to expose the generics corresponding to the peripherals whose ports are hidden. That is, it is pointless to show the IO_TWIEN generic if the twi* ports don't exist in the template. However, I don't see why we might want to prevent users from optionally enabling the features which do not require modification to the ports.

Right, but this is intended here. :smile These test setups are meant for beginners. They should not have to worry about something like RISC-V ISA extensions and the potential side effects. The default CPU config should just work with the default makefiles, linker script and such. If a user wants to experiment with additional ISA subsets using the test-setups there is always the option to add that specific line to the code. Of course this is not as comfortable as turning false to true but I think this is something more advanced user should be able to do. Nevertheless, there should be some text in the test-setups' README for that.


The problem with that approach is that you are expecting the constraints file to match the top-level ports used in the entity. That is, you are expecting all the setups (regardless of the tool) to support assigning exactly the same identifiers to the same ports, including support for arrays (or not).

Right... Sometimes I tend to over-simplify things... Board top files are the way to go. however, I think that these board tops should follow the concept of the test setups: instantiate the actual processor top and only code the generics / signals the setup actually requires. But maybe wrapping the test setups is more straight forward for beginners... I am not sure about this. πŸ€”

Paebbels commented 3 years ago

Also a very good point! I think U seems to be an intuitive choice here. On a second thought, L might be a better choice: one will see in simulation that this signal is "different" but we would ensure that this will turn out as 0 for real implementation. What do you think?

U isn't the intuitive choice, it's the default value so all std_ulogic based datatypes (std_logic, std_(u)logic_vector, signed, unsigned, ...). Objects get initialized with the left-most value in the value range. That's why std_ulogic is define as type std_ulogic is (U, ...);. This allows you to not specify default values in ports or default values in signal declaration and all signals are U by default.

Moreover, IEEE Std. 1164 (which defined the std_logic type family and the 9-valued logic) has special rules for U and X propagation. This means:

L does not provide such propagation. Moreover it is clearly defined that all values not being 1, L, H or Z should translate to 0 in synthesis. If pullup/down is not supported, week values should translate to the strong equivalents.

So please don't invent your own logic to mark "unused/uninitialized". It's all well defined :).


by default Vivado isn't evaluating assert statements.

umarcor commented 3 years ago

Also a very good point! I think U seems to be an intuitive choice here. On a second thought, L might be a better choice: one will see in simulation that this signal is "different" but we would ensure that this will turn out as 0 for real implementation. What do you think?

I think that U is the idiomatic solution. All relevant signals should have proper reset management in the design and should not depend on the "bitstream initialisation" values. Therefore, if a U is seen in simulation where it should not, it's a bug to be fixed, not something we want to "fake" through L. The most important is to see the bug fast and understand it. After doing so, it is legit to consider it not critical and to work around it. However, if we don't let bugs show up, we cannot fix them.

This is a relevant difference between VHDL and Verilog. In VHDL, the language and the designers expect things to fail as fast as possible and to be as explicit as possible about it. That is inherited from Ada and it's the main motivation for both of them to be preferred in critical applications, in comparison to Verilog/C. Naturally, VHDL/Ada are sometimes annyoing, particularly for the less critical use cases, where developers might like slightly more fluid iteration during development. However, in the mid-long term it pays off. It takes longer to get an initial feature to work, but then it's much less likely to be problematic.

Yeah, you are right πŸ˜„ Maybe its obsessive, but I just want to draw a hard line between processor-sore rtl and ready-to-put-on-your-board rtl files plus I like to have you-get-what-you-see file names here (so, for the the test setups).

I understand, but I feel you are trying to compensate a lack of documentation with overthinking. Don't take me wrong, the blame is partially on my side :wink:.

The distinction between ProcessorTop and BoardTop is explained easy: each of the ProcessorTop can be used for different boards, but each BoardTop is specific for one board and one design.

The "you-get-what-you-see" file names are impossible to get in practice, because different users with different backgrounds will see completely different things, no matter how hard you try with the naming. For instance, there is no reason for someone not to pick the UP5K ProcessorTop template and use it on an Arty-Z7. Maybe they want to use exactly that set of peripherals, so, why not?

As I commented in some other issue, you can (should) use names which don't have a meaning (planets, mountains, cities, colours...). There are two solutions to ambiguity: try to be very clear or not be clear at all. In this case, we want names not to be descriptive so that users are forced to reading, at least, the README. The README might be:

We provide several templates with various port requirements and default memory sizes. Those are expected to be used for porting NEORV32 to new boards and as references for users to write their own instantiations of NEORV32's top. Find further details in the table below

Template FeatureA FeatureB FeatureC FeatureD ...
Mars true false - - true
Saturn true true true true -
Neptune - true - - -

Where - means "hidden" (not usable in this template), true means usable and enabled, false means usable but disabled (i.e. optional).


mv rtl/templates/processor/neorv32_ProcessorTop_UP5KDemo.vhd setups/... βœ”οΈ mv rtl/templates/processor/neorv32_ProcessorTop_Test.vhd setups/test_setups? βœ”οΈ mv rtl/templates/system/neorv32_SystemTop_axi4lite.vhd setups/examples/common/ ❌ I would keep that in rtl/SOMETHING because this is not related to any FPGA-board specific setup.

Neither ProcessorTop_UP5KDemo nor ProcessorTop_Test are related to any FPGA-board specific setup. The only relation is the name. If we s/UP5KDemo/Uranus/ and s/Test/Earth/, everything would be ok. All the examples would still work, everything would as consistent as it is now, and there would be no ambiguity about what do we expect the ProcessorTop_UP5KDemo to be used for. It's just a template with features X, Y and Z enabled/disabled/hidden (as users would see in the table), which "incidentally" is a setup that requires almost 100% of an UP5K device in case an additional PLL is used in the BoardTop. However, this last sentence is nothing other than a fact to be written in the README, because the important concept is that ProcessorTop files are not specific for a board.

So maybe we could just delete rtl/templates/processor/neorv32_ProcessorTop_stdlogic.vhd? πŸ˜‰

Maybe... Yet, I don't feel confident to tell "we need to remove it now". I don't see who might want to use it, but that does not mean it's not useful for someone. Why was it created in first place? Which was the use case that required it?

Right, but this is intended here. :smile These test setups are meant for beginners. They should not have to worry about something like RISC-V ISA extensions and the potential side effects. The default CPU config should just work with the default makefiles, linker script and such. If a user wants to experiment with additional ISA subsets using the test-setups there is always the option to add that specific line to the code. Of course this is not as comfortable as turning false to true but I think this is something more advanced user should be able to do. Nevertheless, there should be some text in the test-setups' README for that.

I feel we are missing one step between being an absolute beginner and experienced enough for adding a generic to two files, copying them from a third one.

  1. I would expect users to first try using an existing BoardTop for a board they own. When doing so, they are currently exposed to a complexity equivalent to what you propose:
  1. In each of those cases, they can go to the ProcessorTop in order to find which features of the core they can enable or modify without modifying the ports. This is particularly useful for software people and those less experienced with VHDL. All they need to do is copy the generic line from the ProcessorTop to the BoardTop.

  2. They can go further to the NEORV32 top, in order to find which features are hidden in the template (neither generics nor ports are shown).. If they want to use one of those, they need to copy them from the top to the ProcessorTop and then use them in the BoardTop instantiation.

In your proposal, 2 is lost. The only options are "you care about nothing" or "you care about everything". I don't see why we need to remove that step, if should just work use case (1) is well covered in any case. Note that I'm assuming we will do the cleanup modifications to the ProcessorTop templates.

Right... Sometimes I tend to over-simplify things... Board top files are the way to go. however, I think that these board tops should follow the concept of the test setups: instantiate the actual processor top and only code the generics / signals the setup actually requires. But maybe wrapping the test setups is more straight forward for beginners... I am not sure about this. πŸ€”

The following are two designs, to be implemented on three boards. Let's say that Uranus is a NEORV32 with Stream enabled, and the Milky Way is an hypothetical SystemTop_Streams which includes an interconnect.

hdlhierarchy

We need one single ProcessorTop template and one single SystemTop. It is obvious/explicit that we are dealing with one design including the CPU only and with another design adding some extra logic around it. All three boards reuse those without modifications by just optionally specifying some generics in the port map.

Now, let's remove the ProcessorTop layer:

hdlnoproctop

Now users cannot tell that the three designs on the left are in fact using the same subset of NEORV32 features, and they cannot tell that the three designs on the right are wrapping that one with additional external logic (not internal features). They need to read and understand the code in order to find that out. At the same time, maintainers need to be careful because boards might easily go out of sync.

stnolting commented 3 years ago

@Paebbels

Thanks for clearing!

So please don't invent your own logic to mark "unused/uninitialized". It's all well defined :).

I will try πŸ˜… :wink:


@umarcor

I think that U is the idiomatic solution. All relevant signals should have proper reset management in the design and should not depend on the "bitstream initialisation" values. Therefore, if a U is seen in simulation where it should not, it's a bug to be fixed, not something we want to "fake" through L. The most important is to see the bug fast and understand it. After doing so, it is legit to consider it not critical and to work around it. However, if we don't let bugs show up, we cannot fix them.

I am concerned only about the input signals. I want to avoid that a setup by a lazy designer (like me!) does not work because not all of the available input ports are used in the instantiation. I agree that using 0 as default prevents the user from seeing what might cause problems. U (according to @Paebbels: this is the default if not giving an explicit default value) seems reasonable, but not for all signals I think.

For example interrupt signals: not all of them might be used for a specific setup. I think it is ok to allow the designer to simply ignore the unused one as they have a default value, that will cause no harm. L seems good here because that won't activate that IRQ - neither in synthesis nor in simulation but you are still able to see that this signal "is not actually assigned by the user" in simulation.

Therefore, U should be used only for "data" signals, while all "control" signals (that might cause crashes) should be initialized with L. πŸ€”


@umarcor

Thank you for commenting in such a detailed (and patient :smile:) way. :+1:

I know that a hard board-top vs processor-wrapper separation might be the most straightforward concept: the board-top is the PCB and the processor-wrapper is the black box chip you solder onto that PCB.

However, these wrappers feel redundant as they are just wrappers. Hiding complexity is good here, but we could also do that by always instantiating neorv32_top and just use the generics and signals we actually need for out setup. As a nice add-on the user will directly see "what's included" (in terms of ISA extensions etc.).

Even though I like your idea (the table), I think that this is hard to maintain.

One might ask "why is feature X included in a specific wrapper while feature Y is not"? We cannot provide wrappers for every possible configuration. And we should not provide new wrappers for every new board or every new board revision. I am afraid that there might be many several top wrappers someday: one provides PWM, one provides simple GPIO LED control, another one uses some streams... I think it is better to keep that in specific board setups: setups/FOMU/pwm_demo etc.

In summary I think we should not use processor-top wrappers if they are just wrappers. However, we should provide additional tops when there is actually something going on inside (you already proposed that) - so for example we have logic for converting to AXI4 or there is some kind of interconnect inside. I am ok to put them into rtl/ (maybe in something like rtl/system_integration/ or rtl/system_connectivity/) or even into setups if they are just for some specific board.


Maybe... Yet, I don't feel confident to tell "we need to remove it now". I don't see who might want to use it, but that does not mean it's not useful for someone. Why was it created in first place? Which was the use case that required it?

I think that many people use std_logic by default. In setups with pre-VHDL2008 support this wrapper is just a convenient feature. I agree that we should keep that (at least for now :wink:).

stnolting commented 3 years ago

By the way, what do you think about (finally) creating a new repository neorv32-setups? We could put all the setups and example implementations there and include the processor as submodule. Maybe this would be a cleaner approach. However, it would keep board-less example setups here. πŸ€”

umarcor commented 3 years ago

I want to avoid that a setup by a lazy designer (like me!) does not work because not all of the available input ports are used in the instantiation.

Therefore, U should be used only for "data" signals, while all "control" signals (that might cause crashes) should be initialized with L. πŸ€”

The main point is that lazyness leads to mistakes/bugs that show up later in the most unexpected and unpleasant way. That's the main value of VHDL (and Ada) compared to other more permissive languages/ecosystems. In VHDL it might be more painful to setup a design, but once it works it will unlikely break. Bug hunting/fuzzing projects such as verismith do not apply to VHDL precisely because of the design principles in the language.

Therefore, if some situation might cause crashes, the sensible solution is not to avoid it but to handle it properly. In other words, if the system depends heavily on the initialization values of some signals not being U, the system is fundamentally broken as a generic hardware design; it is the firmware/gateware of an FPGA device only. Don't take me wrong, as this is perfectly reasonable given that the main use case of NEORV32 is being a soft-core, for now. However, it might be good to note down some of the assumptions, such as the target FPGA device supporting initialization values for registers.

I know that a hard board-top vs processor-wrapper separation might be the most straightforward concept: the board-top is the PCB and the processor-wrapper is the black box chip you solder onto that PCB.

In fact, there is a PCBTop, which wraps BoardTop. So:

However, PCBTop and BenchTop are not for synthesis (VHDL is a hardware modelling language, not a synthesis language), so I did not introduce those in this repo, in order to avoid excessive complexity.

Hiding complexity is good here, but we could also do that by always instantiating neorv32_top and just use the generics and signals we actually need for out setup. As a nice add-on the user will directly see "what's included" (in terms of ISA extensions etc.).

As commented, by removing that abstraction, the information about "this same design/configuration works on all these boards" is lost. Currently, we know that "MinimalBoot works on Fomu, OrangeCrab, UPduino and iCESugar", "UP5KDemo works on Fomu and UPduino", "Minimal works on Fomu and iCESugar", etc. By changing some of the hidden settings in the MinimalBoot template, we can automatically have it tested on all the boards. Conversely, if we instantiate the neorv32_top directly, the change needs to be manually applied on all the example files. Soon, we can no longer say that one setup works on multiple boards because each of them will be specific.

Even though I like your idea (the table), I think that this is hard to maintain.

I believe this is arguable. If you don't want to have equivalent setups on multiple boards, I agree, removing the abstraction makes it easier to maintain. Conversely, if that is required, the intermediate abstraction precisely reduces the maintenace burden.

One might ask "why is feature X included in a specific wrapper while feature Y is not"?

You did reply to that question/demand yourself:

"We cannot provide wrappers for every possible configuration. And we should not provide new wrappers for every new board or every new board revision".

Therefore, we do provide some wrappers for some boards. Precisely, we provide wrappes for the boards that most active contributors can test, and we select the peripherals in the wrappers according to that subset of boards/designs that we frequently test/maintain. Any user is invited to use that as a reference and create their own wrappers in case they want to target multiple boards; or to use neorv32_top directly if they are targeting one single board.

I am afraid that there might be many several top wrappers someday: one provides PWM, one provides simple GPIO LED control, another one uses some streams... I think it is better to keep that in specific board setups: setups/FOMU/pwm_demo etc.

I see we might have exactly one wrapper where one peripheral is enabled/exposed and others are hidden. I think that is correct, because we can use them for testing that one peripheral. On the one hand, we can instantiate it in a testbench; on the other hand, we can test it on multiple boards.

Apart from that, I see we might have one wrapper for "as much as we can fit in a demo for each tested device". I think that is correct too, because we are not supporting more than half a dozen different devices. The largest case (everything enabled) is what we are currently using in testbenches.

Last, we have the Minimal and MinimalBoot which are meant to test the load/boot mechanisms. In some sense, these can be considered in the first group.

Therefore, I think that moving them around is ok. However, if the templates are moved to the setups subdir and then that subdir is split to a separated repo, we won't be able to use them for running the "internal" simulation tests in this repo. Well, we can do it, but it will feel strange that CI depends on them.

By the way, what do you think about (finally) creating a new repository neorv32-setups? We could put all the setups and example implementations there and include the processor as submodule. Maybe this would be a cleaner approach. However, it would keep board-less example setups here. πŸ€”

As commented before, I feel we are mixing things here.

One thing is having different levels of abstraction which match the hierarchy in the design (from a more application specific top to a more generic bottom). In those abstractions there is overlap (reuse) between simulation and synthesis; that is the case of the ProcessorTop and SystemTop templates (not the BoardTop).

A different thing is moving the examples which add peripherals and maybe companion IP cores (such as the USB-to-UART or LiteDRAM) to another repository.

I believe it makes sense to move the specific examples and peripherals somewhere else. It might also make sense to move the whole setups subdir. However, it does not make sense to move the ProcessorTop or SystemTop templates in any board-specific subdir/repo. Those should remain with the core because they are reusable for simulation and for synthesis, unless removed (with the aforementioned drawbacks). Since the templates layout was added 1-2 months ago, I would ask to please let it settle down and prove their usability before having them removed.

stnolting commented 3 years ago

Therefore, if some situation might cause crashes, the sensible solution is not to avoid it but to handle it properly. In other words, if the system depends heavily on the initialization values of some signals not being U, the system is fundamentally broken as a generic hardware design; it is the firmware/gateware of an FPGA device only. Don't take me wrong, as this is perfectly reasonable given that the main use case of NEORV32 is being a soft-core, for now.

The idea of having non-U defaults for some signals is to avoid frustration when creating a new setup. Let's take the UART as an example:

    uart0_txd_o    : out std_ulogic; -- UART0 send data
    uart0_rxd_i    : in  std_ulogic := 'U'; -- UART0 receive data
    uart0_rts_o    : out std_ulogic; -- hw flow control: UART0.RX ready to receive ("RTR"), low-active, optional
    uart0_cts_i    : in  std_ulogic := 'L'; -- hw flow control: UART0.TX allowed to transmit, low-active, optional

If the user (lazy me) tries to get data from the UART and rxd is not connected, there is no problem. The user will just get zero. But if the user accidentally enables the UART's hardware flow-control and tries to execute printf() this might cause a permanent system stall when cts is not assigned correctly.

If all inputs are U by default, the processor won't halt and catch fire (hopefully!) :smile: I just think that that using L only for some crucial control signals might reduce problems with unintended side effects (for example if I have not connected any Wishbone bus signal at all and I try doing an access via Wishbone).

However, it might be good to note down some of the assumptions, such as the target FPGA device supporting initialization values for registers.

I agree that there should be some documentation. I think I added a sentence (maybe we need more) regarding the default values in the documentation. Btw, the registers (so the real FFs) do not require a defined initialization (CPU and whole SoC). Register that require a defined reset have a defined reset in the VHDL code. The remaining registers are initialized/reset to - (don't care).


"We cannot provide wrappers for every possible configuration. And we should not provide new wrappers for every new board or every new board revision".

Therefore, we do provide some wrappers for some boards. Precisely, we provide wrappes for the boards that most active contributors can test, and we select the peripherals in the wrappers according to that subset of boards/designs that we frequently test/maintain. Any user is invited to use that as a reference and create their own wrappers in case they want to target multiple boards; or to use neorv32_top directly if they are targeting one single board.

You (I?) convinced me :smile: I see that we need an intermediate abstraction layer between the SoC itself an the actual board.

I see we might have exactly one wrapper where one peripheral is enabled/exposed and others are hidden. I think that is correct, because we can use them for testing that one peripheral. On the one hand, we can instantiate it in a testbench; on the other hand, we can test it on multiple boards.

I like the idea having different "templates" (?) that only expose one module's peripherals and I would be ok with having this in the neorv32 repo. :+1:

Apart from that, I see we might have one wrapper for "as much as we can fit in a demo for each tested device". I think that is correct too, because we are not supporting more than half a dozen different devices. The largest case (everything enabled) is what we are currently using in testbenches. [...] I believe it makes sense to move the specific examples and peripherals somewhere else. It might also make sense to move the whole setups subdir. However, it does not make sense to move the ProcessorTop or SystemTop templates in any board-specific subdir/repo. Those should remain with the core because they are reusable for simulation and for synthesis, unless removed (with the aforementioned drawbacks).

In the testbench we directly use the processor top (no intermediate abstraction layer). I think it is ok to always use neorv32_top for the MAXIMUM setup. If we have setups that use "a lot but not all" optional features, that feels more FPGA-specifc and maybe should be in neorv32-setups repo πŸ€”

Last, we have the Minimal and MinimalBoot which are meant to test the load/boot mechanisms. In some sense, these can be considered in the first group.

This is something that I would like to replace. These two minimal setups should only focus on the boot concept (IMEM_RAM + bootloader / IMEM_ROM + pre-init-image) and thus should only use basic IO (UART and GPIO, no fancy PWM). This is what I have tried to implement in rtl/test_setups/ in #131. GPIO is available in both setups - that makes the user guide sections much more straight forward as the same demo software can be used for both setups.

A different thing is moving the examples which add peripherals and maybe companion IP cores (such as the USB-to-UART or LiteDRAM) to another repository.

:+1:

Since the templates layout was added 1-2 months ago, I would ask to please let it settle down and prove their usability before having them removed.

I have moved them to neorv32-setups. This is not "official" yet - I was just experimenting with the infrastructure there.


By the way, sorry for the copy/paste/quote chaos :smile: Maybe we should continue this in #131 :wink: