jlpteaching / dinocpu

A teaching-focused RISC-V CPU design used at UC Davis
BSD 3-Clause "New" or "Revised" License

Add cache #28

Open powerjg opened 5 years ago

powerjg commented 5 years ago

There are many steps to this that should be their own issues, but I'm going to put everything here for now.

This would be a great future assignment as well.

jardhu commented 5 years ago

I've started implementing the asynchronous memory module, but I came across something I think should be addressed early: while it's obvious that data memory needs to be delayed by a configurable latency, should instruction memory be delayed as well?

Doing so would more accurately model a modern architecture, since it would necessitate something like an L1i cache to fetch chunks of instructions. But as far as I can tell, it wouldn't add much from a learning perspective; in that case, the whole point of the cache is to show how keeping data in SRAM to avoid DRAM accesses can greatly improve a CPU's performance.

powerjg commented 5 years ago

We should definitely implement the async memory for both instruction and data access.

It's also important to show that data (or instructions) that are reused and kept close in the cache can improve performance. So both loads (and fetches) and stores will see a performance improvement with the cache.
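To make the reuse point concrete, here is a tiny cycle-counting model in plain Scala. It is purely illustrative, not DINO CPU code: the 10-cycle DRAM latency and 1-cycle hit latency are made-up numbers chosen only to show how a trace that reuses addresses benefits once a cache captures them.

```scala
// Illustrative model only, not the repo's memory system. Assumed numbers:
// a hypothetical 10-cycle backing-memory access and a 1-cycle cache hit.
object CacheBenefit {
  val dramLatency = 10 // hypothetical cycles per backing-memory access
  val hitLatency  = 1  // hypothetical cycles per cache hit

  // Count total cycles for a sequence of accessed addresses.
  def cycles(trace: Seq[Int], withCache: Boolean): Int = {
    var cached = Set.empty[Int]
    trace.map { addr =>
      if (withCache && cached(addr)) hitLatency
      else { cached += addr; dramLatency } // miss: pay full latency, fill cache
    }.sum
  }

  def main(args: Array[String]): Unit = {
    val trace = Seq(0, 4, 0, 4, 0, 4) // two addresses, heavily reused
    println(cycles(trace, withCache = false)) // every access pays dramLatency
    println(cycles(trace, withCache = true))  // only the two cold misses do
  }
}
```

With the trace above, the uncached run costs 60 cycles while the cached run costs 24, which is the whole pedagogical point of the assignment.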

powerjg commented 5 years ago

Oops. Pushed the wrong button. Ignore the "closed" :)

powerjg commented 5 years ago

I updated the top. I think the next step is to make the pipeline which uses the combinational memory use the same memory interface as this. Then, we can update the pipeline to use the non-combinational memory (which @nganjehloo might be working on soon, too).

jardhu commented 5 years ago

There are a few issues we have to handle with the system for the combinational/synchronous memory. We can simply duplicate memory-async.scala and remove the Pipes entirely to create a combinational memory, but then we won't be able to reuse the existing DMemPort for writes, for two reasons:

The first issue comes from these two lines of code: https://github.com/jlpteaching/dinocpu/blob/2958b5e58c17d947ae71b67f1092d5dc75fee938/src/main/scala/components/memory-ports.scala#L106-L107. The memory responds instantly with the data at an address, so io.response.valid is driven high. But since the outstandingReq register takes one cycle to update its contents, outstandingReq.valid will still be low and the assert will fail. I think the simplest and clearest fix is to create a new DMemCombinPort implementation, so one port is combinational and the other is our existing asynchronous implementation.
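The timing mismatch can be sketched in plain Scala (this is a behavioral model, not the actual memory-ports.scala code): outstandingReq is a register, so its valid flag only becomes visible at the next clock edge, while a combinational memory answers in the very cycle the request is sent.

```scala
// Behavioral sketch of the failing assert, not the real DMemPort.
// outstandingReq is modeled as a register: writes become visible one
// clock edge later, which is exactly why the combinational case breaks.
class DMemPortModel {
  private var outstandingValid = false // register output, visible this cycle
  private var nextValid        = false // register input, visible next cycle

  def sendRequest(): Unit = nextValid = true

  // A combinational memory's response is valid in the same cycle as the
  // request; an asynchronous memory's response arrives a cycle later.
  def responseValid: Boolean = true

  // The assert effectively requires: response.valid implies outstandingReq.valid
  def assertHolds: Boolean = !responseValid || outstandingValid

  def clockEdge(): Unit = outstandingValid = nextValid // register update
}
```

In cycle 0 the port sends a request; responseValid is already true but outstandingValid is still false, so the assert condition fails. After one clockEdge(), which is when the asynchronous memory would respond, it holds.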

The second issue is that we can't reuse the "read, then write back" design we had for the DMemPort. A combinational memory must be able to issue a write and complete it in one cycle, but to read and write back we would need to issue a read, receive the response, and then attempt the write back in the same cycle. This is a problem because there is only one channel through which memory ports can send Requests, so the port would have to drive both the read and the write onto that channel at the same time.

My proposal for a rough, hacky solution that avoids making the backing memory do the masking/sign-extension logic is to modify the combinational backing memory to support requests that both read and write at the same time. With this solution, a write operation proceeds as follows:

  • The pipeline issues a write to the DMemCombinPort. The port sets the outgoing Request to a ReadWrite, and supplies the appropriate address.
  • The backing memory receives the ReadWrite request. It retrieves the block of data at the write address, and responds with that data.
  • Back in the DMemCombinPort, the port performs the masking and sign extension on the memory's response. It then sets the outgoing Request's writedata to the new block of data, "bridging" the read and write together.
  • The backing memory writes the block of data normally. This ends the combinational part of the write operation, and since the reading logic is mostly separate from the writing logic, it should all work as intended.

We could reuse the Write enum, since every combinational write has to read and write memory simultaneously, but I believe it's clearer to define the operation on its own: Reads solely read from memory, Writes solely write to memory, and ReadWrites perform both operations in one cycle but are reserved exclusively for combinational memory.
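The "bridging" step in the ReadWrite flow above amounts to splicing the store's bytes into the block just read, plus sign extension on the load path. A plain-Scala sketch of that masking logic (64-bit words and these helper names are illustrative assumptions, not the repo's actual interface):

```scala
// Sketch of the masking/sign-extension a port would do between the read
// half and the write half of a hypothetical ReadWrite operation.
object SubwordWrite {
  // Splice 'writedata' (sizeBytes wide, sub-word: 1, 2, or 4 bytes) into
  // 'readBlock' at byte 'offset': the read-modify-write "bridging" step.
  def merge(readBlock: Long, writedata: Long, offset: Int, sizeBytes: Int): Long = {
    val mask = ((1L << (8 * sizeBytes)) - 1) << (8 * offset)
    (readBlock & ~mask) | ((writedata << (8 * offset)) & mask)
  }

  // Sign-extend the low 'sizeBytes' of 'value' (used on the read path,
  // e.g. for lb/lh-style loads).
  def signExtend(value: Long, sizeBytes: Int): Long = {
    val shift = 64 - 8 * sizeBytes
    (value << shift) >> shift // arithmetic shift right replicates the sign bit
  }
}
```

For example, merging a one-byte store of 0xAB at offset 0 into 0x1122334455667788 yields 0x11223344556677AB, and sign-extending the byte 0x80 yields -128.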

powerjg commented 5 years ago

Yes, I think creating a new DMemPort with the same I/O makes the most sense.

We can just change which one we instantiate based on the configuration.
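Selecting the port by configuration could look something like the following plain-Scala sketch. The trait and class names here are hypothetical stand-ins for DMemPort/DMemCombinPort sharing the same I/O, not the repo's actual types:

```scala
// Sketch of picking the port implementation from the configuration.
// 'DataMemPort', the two subclasses, and the factory are illustrative
// names, assuming both ports expose the same I/O bundle.
trait DataMemPort { def name: String }
class AsyncDMemPort  extends DataMemPort { val name = "async" }
class CombinDMemPort extends DataMemPort { val name = "combinational" }

object MemPortFactory {
  // The configuration string decides which implementation is instantiated;
  // anything other than "combinational" falls back to the async port.
  def apply(memType: String): DataMemPort = memType match {
    case "combinational" => new CombinDMemPort
    case _               => new AsyncDMemPort
  }
}
```

Since both implementations share one interface, the pipeline never needs to know which port it was given.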
