Could you please add BabelStream examples to the repository so that people can understand how the CPU and GPU select DDR or HBM accesses, as described in the paper? Thanks.
I may do that later, but here is the data I measured on Grace Hopper (120 GB LPDDR5X model). If you'd like to measure the GPU-to-LPDDR5X bandwidth, use this code: https://github.com/nicejunjie/CUDA-Cpp-STREAM