nimish15shah / DAG_Processor

A DAG processor and compiler for a tree-based spatial datapath.
MIT License

Hello author, I would like to consult about the experiment of using DRAM in DPU-v2 #3

Closed: fotuodunwu closed this issue 1 month ago

fotuodunwu commented 1 month ago

Hello author, I read your DAG paper, and it is excellent research. I would like to ask about running experiments that use DRAM with DPU-v2. Since the crossbar (64 x 64 x 32) is difficult to implement on an FPGA, it seems impossible to verify a DPU-v2 SoC system on an FPGA. Is there any way to complete this experiment? Would gem5 work? Thank you very much.

nimish15shah commented 1 month ago

Hello,

Some suggestions:

1) DPU-v2's code is configurable. The number of PE trees can be reduced so that the design uses an 8x8x32 crossbar or smaller, instead of the 64x64x32 crossbar.

2) You could use a DRAM model and simulate DPU-v2 with a standard SystemVerilog simulator. Porting the design to gem5 is possible but would need a lot of work.

All the best with the experiment! I look forward to the results.
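To give a rough sense of how much suggestion 1 shrinks the crossbar, the sketch below counts crosspoint bits for the two sizes mentioned in this thread. It assumes the notation AxBxC means A inputs, B outputs, and C-bit data paths, and it uses inputs x outputs x width only as a crude size proxy, not as an FPGA resource estimate. The function name `crosspoint_bits` is hypothetical and not part of the DPU-v2 code.

```python
def crosspoint_bits(inputs: int, outputs: int, width: int) -> int:
    """Crude size proxy for a full crossbar.

    Assumes every input can reach every output over a width-bit path,
    so the count grows as inputs * outputs * width.
    """
    if min(inputs, outputs, width) <= 0:
        raise ValueError("all dimensions must be positive")
    return inputs * outputs * width


# Sizes quoted in the thread (interpretation of the notation is an assumption).
full_size = crosspoint_bits(64, 64, 32)
reduced_size = crosspoint_bits(8, 8, 32)
reduction_factor = full_size // reduced_size

print(f"64x64x32: {full_size} crosspoint bits")
print(f"8x8x32:   {reduced_size} crosspoint bits")
print(f"reduction factor: {reduction_factor}x")
```

Under this proxy the port count dominates: cutting the ports by 8x on each side shrinks the crossbar by 64x, which is why reducing the number of PE trees can make an FPGA implementation feasible.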

fotuodunwu commented 1 month ago

Hello,

My research is mainly on hardware acceleration for circuit simulation. Previously I focused on algorithm optimization; recently I have wanted to explore hardware acceleration for SpTRSV. However, circuit simulation requires other software-level tasks besides SpTRSV, so I need a CPU to build an SoC system and evaluate the performance gains brought by the accelerator. Although small-scale crossbars can be implemented on an FPGA, the peak throughput of the accelerator is reduced and, as you mentioned in your paper, bank conflicts increase. (I could perhaps extrapolate from small-scale experiments to large-scale designs, but I'm not sure this would be convincing.) I've been learning gem5 recently but found it quite complex. Could you tell me more about the standard SystemVerilog simulators? What specific tools are available?

Thank you!
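The reason a full SoC evaluation matters here can be illustrated with Amdahl's law: if only the SpTRSV part of the circuit simulation is accelerated, the remaining software tasks on the CPU bound the end-to-end gain. The sketch below is a minimal illustration; the function name `overall_speedup` and the example numbers are hypothetical placeholders, not measurements from DPU-v2 or from any circuit simulator.

```python
def overall_speedup(accel_fraction: float, accel_speedup: float) -> float:
    """Amdahl's law: end-to-end speedup when only part of the runtime is accelerated.

    accel_fraction: share of the original runtime spent in the accelerated
                    kernel (here, SpTRSV), between 0 and 1.
    accel_speedup:  speedup of that kernel on the accelerator (> 0).
    """
    if not 0.0 <= accel_fraction <= 1.0:
        raise ValueError("accel_fraction must be in [0, 1]")
    if accel_speedup <= 0.0:
        raise ValueError("accel_speedup must be positive")
    remaining = 1.0 - accel_fraction           # time still spent on the CPU
    accelerated = accel_fraction / accel_speedup  # time spent on the accelerator
    return 1.0 / (remaining + accelerated)


# Hypothetical inputs for illustration only.
for fraction in (0.3, 0.6, 0.9):
    gain = overall_speedup(fraction, accel_speedup=10.0)
    print(f"SpTRSV share {fraction:.0%}, kernel 10x faster -> overall {gain:.2f}x")
```

The same formula also shows why extrapolating from a small crossbar is delicate: the kernel speedup itself would change with the configuration, and the formula only captures how that speedup propagates to the whole application.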

nimish15shah commented 1 month ago

A SystemVerilog simulator simulates the hardware at the RTL level, so it will be slower than a gem5 simulation. Simulating a DRAM at that level would be much slower still, but I am sure fast DRAM models are available for RTL simulation.

In the repo, I've provided scripts for the Synopsys VCS simulator. Xilinx also provides Verilog simulators with its tool suite. Verilator is an open-source alternative, but it has some limitations, and my RTL code might need some tweaks to work with it.

fotuodunwu commented 1 month ago

Thank you very much!