ucb-bar / chipyard

An Agile RISC-V SoC Design Framework with in-order cores, out-of-order cores, accelerators, and more
https://chipyard.readthedocs.io/en/stable/
BSD 3-Clause "New" or "Revised" License

Firrtl errors while adding BlockDevice into the design #1018

Open crystalbreezy1 opened 2 years ago

crystalbreezy1 commented 2 years ago

Currently I'm trying to understand how to add a DMA device to a Chipyard design. Based on the suggestion in https://github.com/ucb-bar/chipyard/issues/9, I started with the BlockDevice example in testchipip. However, when I tried to include a block device in my design, I received the following errors:

[error] (run-main-0) firrtl.passes.PassExceptions:
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.data.ready <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.req.ready <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.resp.bits.data <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.resp.bits.tag <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.info.max_req_len <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.resp.valid <= VOID
[error] firrtl.passes.CheckInitialization$RefNotInitializedException: @[TestHarness.scala 89:19] : [module TestHarness] Reference chiptop is not fully initialized.
[error] : chiptop.blockdev.bits.info.nsectors <= VOID
[error] firrtl.passes.PassException: 7 errors detected!
[error] firrtl.passes.PassExceptions:

I was adding the block device in this way:

class InitZeroRocketNoL2Config extends Config(
  new testchipip.WithBlockDevice(enable=true) ++
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++
  new freechips.rocketchip.subsystem.WithNExtTopInterrupts(2) ++
  new chipyard.config.AbstractConfig)

Besides, I'm new to Chisel and Chipyard, so could I get a more detailed description of this block device design? I'm still quite confused about how it is connected and triggered.

I read the MMIO description of the GCD design, and it seems that the design is triggered after the y register is written. However, I'm still not sure which register in the block device design provides the same triggering function. Also, I cannot fully understand the description in the CanHavePeripheryBlockDevice part, but it seems that the MMIO is connected to the slave bus (default PBUS) and MEM is connected to the master bus (default FBUS). I can understand the first assignment, but why is the second one made? Shouldn't we connect to the MBUS when trying to access memory?

Lastly, could you please also tell me the purpose of BlockDeviceModel? It seems that this module reads and writes between its internal memory and its IO interface (and also accepts requests and sends responses). However, I don't see it being used in CanHavePeripheryBlockDevice or SimBlockDevice, so how is this module used in the design?

abejgonzalez commented 2 years ago

I think the error you are seeing occurs because you aren't connecting the block device to anything in the test harness. I would look at the following configs to add the proper harness binder. (I would also recommend familiarizing yourself with IOBinders and HarnessBinders in the documentation.)

https://github.com/ucb-bar/chipyard/blob/9d055fdac638ab90735cbde42fd2d86355eb260b/generators/chipyard/src/main/scala/config/RocketConfigs.scala#L82-L92
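For readers who cannot open the link, the pattern those configs follow is to pair testchipip.WithBlockDevice with a harness binder that drives the block-device port from the test harness, so that chiptop.blockdev is no longer left unconnected. The following is only a sketch of that pattern: the harness binder name chipyard.harness.WithSimBlockDevice and the class name are recalled from Chipyard releases of that period and are assumptions that should be checked against the linked file.

```scala
// Sketch only. WithBlockDevice, WithNBigCores and AbstractConfig appear in the
// question above; the harness binder name and the class name are assumptions.
class SimBlockDeviceRocketConfig extends Config(
  new chipyard.harness.WithSimBlockDevice ++             // assumed name: connects the block-device port in the test harness
  new testchipip.WithBlockDevice ++                      // adds the block device to the design
  new freechips.rocketchip.subsystem.WithNBigCores(1) ++ // single big Rocket core
  new chipyard.config.AbstractConfig)
```

Without a harness binder of this kind, the top-level blockdev port exists but nothing in TestHarness drives it, which matches the "not fully initialized" errors reported above.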

As for the FBUS connection, this is the convention that we use to have external devices master memory. Eventually, the transaction will get to memory but you could directly connect to the MBUS for a faster route to memory.

Here is an example C program that uses the block device: https://github.com/ucb-bar/chipyard/blob/master/tests/blkdev.c (the headers and the Makefile are also found in that folder). However, the main use of the block device is within the FireSim FPGA-accelerated simulation platform.

Hope that helps.

crystalbreezy1 commented 2 years ago

Hello,

Thank you very much! After reading IOBinder.scala and HarnessBinder.scala and applying the correct test harness config, my design worked correctly. I think this also answers my previous question about the usage of BlockDeviceModel. However, I still have a remaining question.

From BlockDevice.scala, I understand that the state machine of the tracker module(s) is triggered by the output valid signal of the router module, and that the router's input request valid is related to the frontend module's outputs. However, I'm confused about the valid signal of the frontend module:

val allocRead = Wire(new RegisterReadIO(UInt(tagBits.W)))
io.back.req.valid := allocRead.request.valid
....
0x00 -> Seq(RegField(pAddrBits, addr)),
0x08 -> Seq(RegField(sectorBits, offset)),
0x0C -> Seq(RegField(sectorBits, len)),
0x10 -> Seq(RegField(1, write)),
0x11 -> Seq(RegField.r(tagBits, allocRead)),

Given this description, does this mean that the output valid of the frontend module is set high when we read address 0x11, as with the "return reg_read8(BLKDEV_REQUEST);" statement in the blkdev_send_request function (in blkdev.h)? If so, where is the allocRead response generated (since the router is only triggered after its input valid goes high)?

Also, I saw the following code in the BlockDeviceControllerModule class definition in BlockDevice.scala:

frontend.io.info := io.bdev.info

This seems to provide the nsectors and max_req_len information, but since the registers storing this information are read-only, where is the real input source of this information? Does it come from SimBlockDevice or BlockDeviceModel?
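The read-triggered behaviour asked about here can be pictured with a small plain-Scala model. This is only a sketch of the hypothesis in the question, not the testchipip implementation: the class and method names are hypothetical, the register offsets are copied from the map quoted above, and the idea that the back end chooses the returned tag is an assumption.

```scala
object ReadTriggerSketch {
  // One block-device request as latched by the four writable registers.
  final case class Request(addr: Long, offset: Int, len: Int, write: Boolean)

  // Toy front end. `submit` stands in for the back end and returns a tag
  // (assumption: the tag is allocated on the back-end side).
  final class Frontend(submit: Request => Int) {
    private var addr = 0L
    private var offset = 0
    private var len = 0
    private var write = false

    // Plain register writes only latch values; nothing is sent yet.
    def regWrite(off: Int, value: Long): Unit = off match {
      case 0x00 => addr = value
      case 0x08 => offset = value.toInt
      case 0x0C => len = value.toInt
      case 0x10 => write = (value & 1L) == 1L
      case _    => () // other offsets are ignored in this sketch
    }

    // Reading 0x11 is the only access with a side effect: it hands the
    // latched fields to the back end and returns the tag it replied with.
    def regRead(off: Int): Int = off match {
      case 0x11 => submit(Request(addr, offset, len, write))
      case _    => 0
    }
  }

  // Mirrors the write-write-write-write-read sequence of blkdev_send_request.
  def demo(): (Int, List[Request]) = {
    var seen = List.empty[Request]
    var nextTag = 0
    val fe = new Frontend({ r => seen ::= r; val t = nextTag; nextTag += 1; t })
    fe.regWrite(0x00, 0x80001000L)
    fe.regWrite(0x08, 4)
    fe.regWrite(0x0C, 2)
    fe.regWrite(0x10, 1)
    val tag = fe.regRead(0x11)
    (tag, seen)
  }

  def main(args: Array[String]): Unit = println(demo())
}
```

In this model the request only reaches the back end when offset 0x11 is read, which is the software-visible effect the question describes; how the real hardware sequences the valid and ready signals is not captured here.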

Finally, since you mentioned that the block device is mainly used within the FireSim FPGA-accelerated simulation platform rather than as a normal DMA device, is there any other DMA controller example that I can learn from? Besides, since our project requires a DMA controller for copying data between the L1/L2 cache and main memory, could you give me some advice on the related interface setup? For example, should I set both the master and slave bus to MBUS? If so, which nodes should I use to connect correctly to the cache and memory interfaces?

crystalbreezy1 commented 2 years ago

Sorry to keep updating this post, but is there any suggestion on my previous question?

abejgonzalez commented 2 years ago

Sorry, I didn't develop the block device, so I can't add anything beyond the integration steps (the person who made it has since graduated).

The main DMA starter example that we refer people to is in the docs : https://chipyard.readthedocs.io/en/dev/Customization/DMA-Devices.html

The most important part is the fact that you create a TL Client Node where you can then make R/W requests to memory. As for the DMA specifics, I would refer to the examples given in #9.

crystalbreezy1 commented 2 years ago

Thanks for the information! I have read post #9, and that's why I referred to the BlockDevice example.

In general, if I understand correctly, when I create a TL Client Node in a module and connect the module to the FBUS/MBUS, my R/W requests will be sent to main memory (via the bus), correct? But if I want to access the cache directly, is there any way to do that?

Also, if I change the cache to scratchpad and remove the external memory (like the config ScratchpadOnlyRocketConfig), what will happen if I still connect the DMA client node to FBUS and send R/W requests? Will these requests be sent to the scratchpad?

abejgonzalez commented 2 years ago

Sorry, I missed that you had read #9.

(I should preface this by saying it's been a while since I've worked with the TL Node stuff... so my memory might be a bit fuzzy)

I don't know if you've seen this picture in the docs: https://chipyard.readthedocs.io/en/dev/Generators/Rocket-Chip.html. If you attach a client to the FBUS, then your memory request will go through the L2 (through the FBUS, SBUS, L2 (then if miss... MBUS, outer main memory)). However, if you attach it to the MBUS it will directly go to main memory. You would have to modify the cores (i.e. Rocket/BOOM) to do something similar (this most likely would require extensive changes).

I'm frankly not too familiar with this config but if you attach to the FBUS and send the request it should go to the L1 Scratchpad IIRC (also the MBUS should do the same since there is no L2 or outer memory).
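The routing described above for a config with an L2 and outer memory can be written down as a small plain-Scala lookup. It is only a restatement of the hops listed in this comment, not generated from the actual bus topology; the object and function names are hypothetical, and the scratchpad-only case is deliberately left out because it is only recalled tentatively here.

```scala
object BusPathSketch {
  // Where the DMA client node is attached.
  sealed trait Attach
  case object FBUS extends Attach
  case object MBUS extends Attach

  // Hops a request is described as taking in a config with an L2 and outer memory.
  def path(attach: Attach, l2Hit: Boolean): List[String] = attach match {
    case FBUS =>
      val toL2 = List("FBUS", "SBUS", "L2")
      // Only an L2 miss continues on to the memory bus.
      if (l2Hit) toL2 else toL2 ++ List("MBUS", "main memory")
    case MBUS =>
      // Attaching to the MBUS bypasses the L2 entirely.
      List("MBUS", "main memory")
  }

  def main(args: Array[String]): Unit = {
    println(path(FBUS, l2Hit = false).mkString(" -> "))
    println(path(MBUS, l2Hit = false).mkString(" -> "))
  }
}
```

The FBUS path is longer but keeps the request visible to the L2, whereas the MBUS path goes straight to main memory.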

crystalbreezy1 commented 2 years ago

Thank you very much for the explanation! I think this is the information that I need.

One final question: if I want to use the DMA to access an external MMIO device (with an AXI4 interface) to get data (for storing into memory), what is the correct way to do that? Should I create another TL Client Node and attach it to the PBUS to send memory access requests (to the corresponding MMIO address)? Or is there another way?

Also, to add this external MMIO device, we will need the external MMIO ports. Based on my understanding, we can add the following config fragments:

new freechips.rocketchip.subsystem.WithDefaultMMIOPort ++ // add default external master port
new freechips.rocketchip.subsystem.WithDefaultSlavePort ++ // add default external slave port

and it will include the TLtoAXI interface as well, like the picture in the docs: https://chipyard.readthedocs.io/en/dev/Generators/Rocket-Chip.html, correct?

crystalbreezy1 commented 2 years ago

Also, could you please explain the definition of a "concrete" module in more detail? I observed that the GCD example in the docs implements a trait with LazyModuleImp, while the InitZero example does not. Since I believe both of them will be instantiated when the corresponding config option is chosen, what is the difference between them that causes one to be "concrete" and the other not?

abejgonzalez commented 2 years ago

What you said seems reasonable. Yes, the *Port config fragments add the TLToAXI adapter to convert the TL that the SoC speaks to AXI4.

I don't know what you mean by concrete... but I assume you are referring to LazyModule (LM) and LazyModuleImp (LMI). I think our mailing list has many descriptions of what those are... so I would look at that 1st (also Googling brings up a lot of useful stuff on it, and there are tutorial links in the README to give some descriptions on it). In general though... it can be quite confusing... so try to look there 1st.

Very briefly, RocketChip, the underlying system used with Chipyard, has a two-pass system called diplomacy (see the paper to learn more on this) to negotiate parameters between different blocks (i.e. what should the bus width be depending on what is attached). This is implemented with LM and LMI. Every LM is associated with an LMI. All LMs and their connections are evaluated in the 1st pass of diplomacy. This basically sets in stone what the parameters should be (i.e. bus width should be 64b). Once those params are finalized, every LMI is run. The LMI takes the parameters finalized in the 1st pass and, using them, creates Verilog (this is normally where the Chisel code goes: :=, Module(...), etc). I assume by "concrete" you are referring to the LMI phase where hardware is instantiated.
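The two passes can be illustrated with a toy model in plain Scala. This is not rocket-chip's diplomacy API: every name below is hypothetical, and the "widest request wins" rule is an arbitrary stand-in for real parameter negotiation. The only point it shows is that the module body, held in a lazy val, runs after all requirements have been collected and sees the finalized value.

```scala
object TwoPhaseSketch {
  // Pass 1 state: every toy lazy module registers the bus width it would like.
  final class Negotiation {
    private var requested = List.empty[Int]
    private var frozen = false

    def request(widthBits: Int): Unit = {
      require(!frozen, "parameters are already finalized")
      requested ::= widthBits
    }

    // Finalized on first use; the widest request wins (arbitrary rule for the sketch).
    lazy val busWidthBits: Int = { frozen = true; requested.max }
  }

  // Stand-in for a LazyModule: constructing it only declares requirements.
  final class ToyLazyModule(val name: String, want: Int, n: Negotiation) {
    n.request(want)
    // Stand-in for the LazyModuleImp: evaluated later, sees the final value.
    lazy val module: String = s"$name uses a ${n.busWidthBits}-bit bus"
  }

  def elaborate(): List[String] = {
    val n = new Negotiation
    // Pass 1: construct every module so all requests are known.
    val mods = List(new ToyLazyModule("dma", 32, n), new ToyLazyModule("core", 64, n))
    // Pass 2: only now force each module body.
    mods.map(_.module)
  }

  def main(args: Array[String]): Unit = elaborate().foreach(println)
}
```

Here the "dma" module asked for 32 bits but its body observes the negotiated 64-bit width, because the body is not evaluated until every module has been constructed.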

Hope that helps a bit.

crystalbreezy1 commented 2 years ago

Thanks for the explanation! Based on this idea, I'm now trying to design a DMA module (designed as an MMIO device) that transfers data between the scratchpad and an MMIO device. I have created three TileLink nodes: one TLRegisterNode, using the TLRegisterRouter extension, connected to the PBUS; one ClientNode connected to the FBUS to access the scratchpad; and another ClientNode connected to the PBUS to access the target MMIO device. Does this design idea make sense?

Besides, for the "concrete" concept that I mentioned above, I'm referring to the following explanation of the trait definition in the docs at https://chipyard.readthedocs.io/en/latest/Customization/MMIO-Peripherals.html:

For peripherals which instantiate a concrete module, or which need to be connected to concrete IOs or wires, a matching concrete trait is necessary. We will make our GCD example output a gcd_busy signal as a top-level port to demonstrate. In the concrete module implementation trait, we instantiate the top level IO (a concrete object) and wire it to the IO of our lazy module.

In this GCD example, both CanHavePeripheryGCD and CanHavePeripheryGCDModuleImp are defined, while in the InitZero example on the page https://chipyard.readthedocs.io/en/latest/Customization/DMA-Devices.html, only the trait CanHavePeripheryInitZero is defined. Comparing these two examples, the main difference that I can find is that the GCD module creates an external "busy" port while the InitZero module does not create anything like this. Does that match the condition "which need to be connected to concrete IOs or wires"? If yes, could you please give me an example of the situation "for peripherals which instantiate a concrete module"?