lowRISC / lowrisc-chip

The root repo for lowRISC project and FPGA demos.
http://www.lowrisc.org/
Other
596 stars 148 forks source link

Change in bus width for NastiIO #81

Open clare7 opened 6 years ago

clare7 commented 6 years ago

Would it be possible to change the bus width size of NastiIO fron 64 bits to 128bits? That would apply to nasti.r.bits.data and nasti.w.bits.data specifically. The purpose is being to perform read and write to DDR in 128bits instead of 64. Any advice?

jrrk commented 6 years ago

Dear Clare,

The interface to the DDR on LowRISC is 16-bits (if you are talking about the Nexys4-DDR implementation). The DDR interface IP from Xilinx

certainly supports a wider word but I should imagine this is more appropriate to high-end boards that have a PC style DDR SIM module, rather

than an individual mobile DDR. If you want higher throughput you could experiment with degrees of freedom in the clock converter ratio and the

DLLs that control the DDR, always bearing in mind that the FPU is a limiting factor in the clock rate to the Rocket subsystem.

I'm ashamed to say I don't know how to alter the Chisel/Scala part of the design. Maybe someone else can jump in here.

A comprehensive improvement would separate out the FPU pipeline from the L2-cache line-fill/write-back function which would appear to be

the bottleneck in this case. Undoubtedly the design could benefit from a performance improvement, but widening the DDR interface is only part

of the solution.

On 11/12/17 07:54, clare7 wrote:

Would it be possible to change the bus width size of NastiIO fron 64 bits to 128bits? That would apply to nasti.r.bits.data and nasti.w.bits.data specifically. The purpose is being to perform read and write to DDR in 128bits instead of 64. Any advice?

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/lowRISC/lowrisc-chip/issues/81, or mute the thread https://github.com/notifications/unsubscribe-auth/AAgF13aC3H_cSy-m21p8n--w-dTtw1INks5s_N-wgaJpZM4Q8_VX.

wsong83 commented 6 years ago

In theory, yes, you can change the data width. In reality, it is complicated.

Would you please clarify your intention?

clare7 commented 6 years ago

Yes I want to have a 128-bit bus between LL2 and to DDR controller and the DDR controller has the support for 128-bit bus.

Right now I am using v0.4

wsong83 commented 6 years ago

OK, so here is what you might need to do if you really want a 128-bit LLC/DDR interconnect.

      case TLKey("L2toMem") =>
        TileLinkParameters(
          coherencePolicy = new MICoherence(new NullRepresentation(site(NBanks))),
          nManagers = 1,
          nCachingClients = site(NBanks),
          nCachelessClients = 0,
          maxClientXacts = site(NAcquireTransactors) + 2,
          maxClientsPerPort = site(NAcquireTransactors) + 2,
          maxManagerXacts = 1,
          dataBits = site(CacheBlockBytes)*8,
          dataBeats = 8
        )

Good luck.

clare7 commented 6 years ago

I have made necessary changes and the compilation seems alright. But I faced the followed error during testing:

/lowrisc-chip/src/test/cxx/common/dpi_ram_behav.cpp:180: bool Memory32::write(uint32_t, const uint32_t&, const uint32_t&): Assertion(addr & 0x3) == 0' failed. ` what are the changes I need to make on the test file in order to accommodate the data width changes?

wsong83 commented 6 years ago

Did this error jump out at the very beginning of the simulation? It might be a reset error (the behavioural bram model received a random request before the circuit is actually reset).

Can you specify which simulation were you running with this error? Preferably with the full command line history and log file (more information will help me to specify the error).

clare7 commented 6 years ago

Yes the error jumpout at the beginning of the simulation. This happen only after I changed the bandwidth

This is what I run on the command line: TestMem-sim-debug +vcd +vcd_name=test.vcd +load=./mem_ddr.riscv

and then I get the printout from the printf:

memory32 write [4000a720] = 00000000
memory32 write [4000a724] = 00000000
memory32 write [4000a728] = 00000000
memory32 write [4000a72c] = 00000000
memory32 write [4000a730] = 00000000
memory32 write [4000a734] = 00000000
memory32 write [4000a738] = 00000000
memory32 write [4000a73c] = 00000000
memory32 write [4000a740] = 00000000
Glip TCP DPI listening on port 23000 and 23001
Client connected
Write Mem: [Enc]_: 00000093305290736d82b28300007297, 
TestMem-sim-debug : /home/mmwong/Documents/lowrisc-chip/src/test/cxx/common/dpi_ram_behav.cpp:180: bool Memory32::write(uint32_t, const uint32_t&, const uint32_t&): Assertion `(addr & 0x3) == 0' failed.
Aborted (core dumped)

I put in some printf in the dpi_ram_behv.cpp of which I get to see the series of memory32write as shown above.

The Write Mem: [Enc]_: 00000093305290736d82b28300007297 was the printfof the nasti.io.r.bits.datafrom the /uncore/src/main/converter.scala file.

wsong83 commented 6 years ago

So this does not look like a reset issue any more. Clearly the error occurs after the reset is withdrawn.

If this error occurs only when the bus width is doubled to 128-bit, I am afraid there is something wrong with the design that must be found and fixed. I do not think I can tell you what is wrong as I have not done this before. You probably need to try debug yourself.

Something for your consideration:

wsong83 commented 6 years ago

Oh, to simulate your design, probably you need to make the behavioural ram model working with the 128-bit setting anyway.

clare7 commented 6 years ago

After adding moreprintfI get the following, I observed that the address involved was addr[4000000f] and that is why the offset is nonzero hence violated the assertion checking. And from the dev.h file, I get the address for memory as: #define DEV_MAP__mem__BASE 0x40000000llu

Now I suspect the error is resulted from the elfLoader in from dpi_ram_beha.cppwhere the full function is written inloadelf.cpp

// load initial memory
void MemoryController::load_mem(const string& filename) {
  using namespace std::placeholders;
  std::function<void(uint32_t, uint32_t, const uint8_t*)> f =
    std::bind(&Memory32::write_block, &mem, _1, _2, _3);
  elfLoader loader = elfLoader(f);
  loader(filename);
}

and how do i make the behavioural ram model working with 128-bit setting?? any guideline to begin with?

jrrk commented 6 years ago

It just so happens that the block ram in the FPGA build is 128 bits.

So it should be very easy to interface this to your new design. It will no longer be

behavioural, but will probably not be the performance limiting factor.

You can launch this simulation in the board specific directory (fpga/board/nexys4ddr).

Just remove the width conversion and alter the block ram generator in Vivado for your new width.

You can also disable peripherals such as DDR/Minions/VGA/PS2 which are not relevant to your purpose.

You can use iSim (part of Vivado) or VCSMX if you have it from your institute or company.

Needless to say the DDR controller, if included in your simulation, needs to have its width settings modified, with the eventual aim of measuring the benefit. This is for the reason that the PHY runs at 200MHz, but there is a 4:1 speed conversion between that and the controller, and a 2:1 conversion between the controller and the L2-cache. Each of those conversions will be a bottleneck for performance on the slow side of the converter.

If you really want to boost performance, you should figure out how to make the L2-cache run at 50MHz, so that it matches the speed of the DDR controller. This would give an equivalent or better gain, but without the implications for real FPGA occupancy that a 128-bit bus implies. This perhaps should wait until experience has been gained of more straightforward tasks.

On 12/13/2017 03:38 PM, clare7 wrote:

After adding more|printf|I get the following, I observed that the address involved was |addr[4000000f]| and that is why the offset is nonzero hence violated the assertion checking. And from the |dev.h file|, I get the address for memory as: |#define DEV_MAPmemBASE 0x40000000llu|

Now I suspect the error is resulted from the |elfLoader| in from |dpi_ram_beha.cpp|where the full function is written in|loadelf.cpp|

|// load initial memory void MemoryController::load_mem(const string& filename) { using namespace std::placeholders; std::function<void(uint32_t, uint32_t, const uint8_t*)> f = std::bind(&Memory32::write_block, &mem, _1, _2, _3); elfLoader loader = elfLoader(f); loader(filename); } |

and how do i make the behavioural ram model working with 128-bit setting?? any guideline to begin with?

— You are receiving this because you commented. Reply to this email directly, view it on GitHub https://github.com/lowRISC/lowrisc-chip/issues/81#issuecomment-351428355, or mute the thread https://github.com/notifications/unsubscribe-auth/AAgF1zzPh_YWirys6lR-1KJxkB4OCJkaks5s_-99gaJpZM4Q8_VX.

wsong83 commented 6 years ago

The address 0x4000000f actually means the address is indeed unaligned, which means the assertion is indeed catching an error. So the question then is why the behavioural ram receives such a memory access?

You probably need to trace it back to see who exactly initiate such unaligned request. Generating a VCD waveform may help you finding it.

clare7 commented 6 years ago

Nothing helpful from the VCD so far.At the moment I could only be sure error occurred right at the point when the first line of the executable file been written to the memory. Seems like the offset somehow is set to F as such the first read is jumped to 0x4000000f instead to be aligned accordingly.

By the way, could it be anything to do with the following line from Config.scalafile? val memAlign = BigInt(1L << 30)

what does that line do to the memory?