Memory Architecture

perlindgren commented 7 months ago

Problem:

In the context of (hard) real-time systems, predictability is key. In particular, bounded worst-case behavior is a prerequisite for most analyses of interest. In general, this poses a challenge to micro-architecture design.

To that end, the RISC-V RT architecture and its Hippomenes implementation take the radical approach of single cycle instructions. This not only gives the best starting point for predictability; the simple design also avoids the stateful hardware complexity of multi-stage pipelines and multi-cycle implementations.

However, the approach still comes with some challenges, notably regarding memory access contention. To address this, RISC-V RT adopts a CSR-only approach to peripheral access (including the N-CLIC and vector table implementation). This makes it possible to retire a write operation to memory while at the same time (through CSR access) retrieving the address of the selected interrupt handler, and thus allows for zero latency interrupt dispatch.

Regarding memory architecture, Hippomenes adopts a pure Harvard architecture with a 32-bit, word addressable instruction memory (no compressed instructions), along with a separate byte addressable data memory (supporting single cycle byte/halfword/word operations). The current implementation assumes synchronous writes, while reads are treated as asynchronous.

Targeting the AMD/Xilinx x7 family of FPGAs, the instruction memory can be implemented by means of BRAM (with loadable content), while distributed (cell based) memory is currently used for the data memory.

The drawback of distributed memory on the x7 is that it does not support content initialization by readmemh. This makes initialization of constants cumbersome, as the pure Harvard architecture does not allow the instruction memory to be read programmatically.

Solution:

Using SPRAM for the data memory as well would solve the content initialization problem. However, SPRAM reads are synchronous and violate the current assumption of asynchronous reads. This assumption can be lifted by making the register file write-through and synchronizing the other register file inputs.
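A write-through register file can be sketched as a small behavioral model. This is one possible reading of the idea, not the Hippomenes RTL: the names RegFile, Write, read and clock are hypothetical, and the model only shows that a read of a register that is being written in the same cycle returns the pending data instead of the stored value.

```rust
/// Toy model of a 32-entry register file with write-through (forwarding).
/// All names are hypothetical; this is an illustration, not the actual RTL.
pub struct RegFile {
    regs: [u32; 32],
}

/// A register write that is pending in the current cycle.
#[derive(Clone, Copy)]
pub struct Write {
    pub rd: usize,
    pub data: u32,
}

impl RegFile {
    pub fn new() -> Self {
        RegFile { regs: [0; 32] }
    }

    /// Combinational read: if `rs` is being written this cycle,
    /// forward the pending data instead of the stored value.
    pub fn read(&self, rs: usize, pending: Option<Write>) -> u32 {
        if rs == 0 {
            return 0; // x0 is hard-wired to zero in RISC-V
        }
        match pending {
            Some(w) if w.rd == rs => w.data,
            _ => self.regs[rs],
        }
    }

    /// Clock edge: commit the pending write (writes to x0 are ignored).
    pub fn clock(&mut self, pending: Option<Write>) {
        if let Some(w) = pending {
            if w.rd != 0 {
                self.regs[w.rd] = w.data;
            }
        }
    }
}
```

With a pending write to x5, read(5, pending) already returns the new data before clock is called, which is the property needed when load data from a synchronous memory arrives one cycle later than with asynchronous reads.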

The RISC-V architecture allows for byte/halfword/word accesses. As long as the implementation uses distributed memory, single cycle read/modify/write can be implemented in a straightforward manner. SPRAM, on the other hand, would require a multi-cycle implementation (which breaks with the single cycle design idiom of Hippomenes).

A potential solution to that problem is to adopt 4 parallel byte-wide SPRAMs, allowing both individual and parallel access. In addition, this approach would also make single cycle unaligned accesses possible. In summary:

4 channel SPRAM data memory featuring:

byte/halfword/word, single cycle read
byte/halfword/word, single cycle write
single cycle unaligned access
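The 4 channel organization summarized above can be illustrated with a behavioral model. This is a sketch under stated assumptions, not the proposed hardware: it assumes little-endian byte order and that lane i holds every byte whose address satisfies addr % 4 == i, and the names LaneMem, locate, read and write are hypothetical. Since the bytes of any 1, 2 or 4 byte access fall in distinct lanes, each lane is touched at most once per access, which is what would allow the hardware to serve all lanes in parallel.

```rust
/// Toy model of a data memory built from four byte-wide lanes.
/// Assumption: lane i holds every byte with addr % 4 == i.
pub struct LaneMem {
    lanes: [Vec<u8>; 4],
}

impl LaneMem {
    /// `rows` is the depth of each lane; total capacity is 4 * rows bytes.
    pub fn new(rows: usize) -> Self {
        LaneMem {
            lanes: [vec![0; rows], vec![0; rows], vec![0; rows], vec![0; rows]],
        }
    }

    /// Lane and row for a byte address. Bytes past a word boundary land in
    /// row + 1, which is why each lane needs its own row address.
    fn locate(addr: usize) -> (usize, usize) {
        (addr % 4, addr / 4)
    }

    /// Write the low `size` bytes (1, 2 or 4) of `data`, little-endian.
    /// Consecutive addresses hit distinct lanes, so no lane is used twice.
    pub fn write(&mut self, addr: usize, size: usize, data: u32) {
        assert!(matches!(size, 1 | 2 | 4));
        for (i, byte) in data.to_le_bytes().iter().take(size).enumerate() {
            let (lane, row) = Self::locate(addr + i);
            self.lanes[lane][row] = *byte;
        }
    }

    /// Read `size` bytes (1, 2 or 4) starting at `addr`, zero-extended.
    pub fn read(&self, addr: usize, size: usize) -> u32 {
        assert!(matches!(size, 1 | 2 | 4));
        let mut bytes = [0u8; 4];
        for i in 0..size {
            let (lane, row) = Self::locate(addr + i);
            bytes[i] = self.lanes[lane][row];
        }
        u32::from_le_bytes(bytes)
    }
}
```

For example, a word write at address 3 places one byte in lane 3 of row 0 and three bytes in lanes 0 to 2 of row 1. The loops here are sequential only because this is software; in hardware the four lane accesses would happen in the same cycle.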

Benefits:

Preserves the predictable and simple architecture
Opens up for flexible alignment without performance penalties

Regarding the latter: traditionally, CPUs exhibit worse (in some cases much worse) performance when handling unaligned addresses. Based on this assumption, compilers insert padding into structures to ensure run-time efficient access at the cost of memory efficiency. See e.g., Rust Padding.

Lifting the assumption that unaligned accesses come with a performance penalty might thus allow for more efficient memory usage. In the case of Rust, the repr(packed) attribute forces the compiler to strip any padding and align the type only to a byte. As the 4 channel SPRAM proposed here has not yet been implemented, the impact on size savings is yet to be evaluated. (As the Rust compiler assumes padding, algorithms and data structures might have been optimized under this assumption; thus, we cannot be sure that the full potential will be harvested.)
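The effect of repr(packed) on size can be checked with std::mem::size_of. The sketch below uses two hypothetical structs, Padded and Packed, with the same fields. repr(C) is used for the padded variant to get a deterministic declaration-order layout; the default Rust representation is free to reorder fields and may already need less padding, so the saving shown here is an upper bound for this particular struct rather than a measured result for real programs.

```rust
use std::mem::{align_of, size_of};

/// Declaration-order layout with padding (C-compatible).
#[allow(dead_code)]
#[repr(C)]
pub struct Padded {
    tag: u8,    // offset 0, followed by 3 padding bytes
    value: u32, // offset 4
    count: u16, // offset 8, followed by 2 trailing padding bytes
}

/// Same fields, no padding, alignment of 1 byte.
#[allow(dead_code)]
#[repr(C, packed)]
pub struct Packed {
    tag: u8,    // offset 0
    value: u32, // offset 1 (unaligned)
    count: u16, // offset 5 (unaligned)
}

/// Returns (size, alignment) for the padded and the packed struct.
pub fn layout_report() -> [(usize, usize); 2] {
    [
        (size_of::<Padded>(), align_of::<Padded>()),
        (size_of::<Packed>(), align_of::<Packed>()),
    ]
}
```

On targets where u32 is 4-byte aligned, Padded takes 12 bytes and Packed takes 7. Note that Rust does not allow taking ordinary references to unaligned fields of a packed struct, so fields are typically read by value.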

perlindgren commented 7 months ago

POC implementation in branch: https://github.com/perlindgren/hippomenes/tree/spram_mem

Single cycle memory access of SPRAM, by forwarding of register file writes.
As discussed in the solution above, the SPRAM organization allows for unaligned access without any overhead; thus single cycle reads and writes of arbitrary size are supported.

Todo:

Horrible memory update time, since we invoke Vivado updatemem from bash separately for each of the 4 byte lanes and the text segment. (Could be less painful if run in a single Tcl session.)
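The reason there are four data images to update is that each byte lane is a separate memory. A minimal sketch of that split, assuming lane i receives every byte with addr % 4 == i, is shown below. The function names split_lanes and to_hex_lines are hypothetical, and the text format (one two-digit hex byte per line) is only an illustration; the exact file format expected by updatemem is not covered here.

```rust
/// Split a data image into four per-lane byte images.
/// Assumption: lane i receives every byte with addr % 4 == i.
pub fn split_lanes(image: &[u8]) -> [Vec<u8>; 4] {
    let mut lanes: [Vec<u8>; 4] = [Vec::new(), Vec::new(), Vec::new(), Vec::new()];
    for (addr, byte) in image.iter().enumerate() {
        lanes[addr % 4].push(*byte);
    }
    lanes
}

/// Render one lane as text, one two-digit hex byte per line.
pub fn to_hex_lines(lane: &[u8]) -> String {
    lane.iter().map(|b| format!("{:02x}\n", b)).collect()
}
```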

Solution:

We still hope there might be some open source alternative to Vivado updatemem, based on a reverse engineered bitstream format.

perlindgren commented 7 months ago

Help wanted: let us know if you have found any way to avoid Vivado for achieving updatemem functionality.

perlindgren / hippomenes

Memory Architecture #16