vortexgpgpu / vortex

https://vortex.cc.gatech.edu/
Apache License 2.0

LSU duplicate address detection mechanism #153

Closed DrCilicon closed 1 month ago

DrCilicon commented 1 month ago

Hi developers, thanks for this wonderful project. I am evaluating matrix multiplication performance on the Vortex architecture. The hardware simulation is run with VCS.

I noticed that in Release V2.x the LSU has the duplicate address detection mechanism shown below. However, in the newest release, V2.1, this mechanism no longer exists in the LSU.

image

The simulation results show that the bottleneck is local memory access, where different threads access duplicate addresses. Because matrix multiplication involves many duplicate address accesses due to data reuse, the duplicate address detection mechanism should reduce redundant memory accesses and thus increase computation throughput.
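To make the data reuse point concrete, here is a minimal Python sketch of duplicate address detection across the threads of one warp. It is only a software model of the idea, not the Vortex RTL; the warp size, the matrix layout, the base address, and the helper name `dedup_requests` are all assumptions chosen for illustration.

```python
def dedup_requests(addresses):
    """Return unique addresses in first-seen order plus a thread-to-request map."""
    unique = []
    slot_of = {}
    thread_to_slot = []
    for addr in addresses:
        if addr not in slot_of:
            slot_of[addr] = len(unique)
            unique.append(addr)
        # Each thread remembers which unique request carries its data.
        thread_to_slot.append(slot_of[addr])
    return unique, thread_to_slot


# Hypothetical warp of 4 threads, each computing C[row][col] for col = 0..3.
# Every thread loads the same element A[row][k], so all addresses match.
WORD_BYTES = 4
K = 8            # assumed inner dimension of the multiplication
a_base = 0x1000  # assumed base address of matrix A
row, k = 2, 5
a_addrs = [a_base + (row * K + k) * WORD_BYTES for _col in range(4)]

unique, thread_to_slot = dedup_requests(a_addrs)
print(len(a_addrs), "thread requests ->", len(unique), "memory request(s)")
```

In this model, the four loads of `A[row][k]` collapse into a single request whose response is broadcast back to every thread through `thread_to_slot`.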

I wonder why this mechanism was removed, and whether there is any chance it could be brought back in an upcoming update.

troibe commented 1 month ago

@tinebp Maybe you could provide some more insight regarding the design decisions here.

tinebp commented 1 month ago

Version 2.0 of the GPU introduced a memory coalescing unit that merges requests targeting the same cache line before they enter the data cache. We removed the old duplicate address optimization because the new coalescing hardware should technically cover that feature. Let us know if this has introduced a performance regression in your case: you can pass the command line option "--perf=2" to blackbox to report the memory system stats. If that is the case, please re-open the bug with a "run.log" trace for us to investigate.
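As a rough illustration of why merging by cache line should also cover the duplicate address case, the Python sketch below groups per-thread addresses by the line they fall in. It models only the same-cache-line merging described above, not the actual Vortex coalescing unit; the 64-byte line size, the example addresses, and the helper name `coalesce_by_line` are assumptions.

```python
LINE_BYTES = 64  # assumed cache line size, not taken from the Vortex source


def coalesce_by_line(addresses, line_bytes=LINE_BYTES):
    """Group per-thread byte addresses by the cache line they fall in."""
    lines = {}
    for thread_id, addr in enumerate(addresses):
        line_base = addr - (addr % line_bytes)
        lines.setdefault(line_base, []).append(thread_id)
    return lines


# Three hypothetical access patterns for a warp of 4 threads.
same_addr = [0x1054] * 4                        # all threads hit one word
neighbors = [0x1040 + 4 * t for t in range(4)]  # consecutive words, one line
strided = [0x1000 + 64 * t for t in range(4)]   # one line per thread

for name, addrs in [
    ("same address", same_addr),
    ("consecutive words", neighbors),
    ("line-sized stride", strided),
]:
    print(name, "->", len(coalesce_by_line(addrs)), "line request(s)")
```

Under these assumptions, identical addresses are just a special case of addresses sharing a line, so both the first and second patterns merge into one request, while the strided pattern still needs one request per thread.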