vetter / shoc

The SHOC Benchmark Suite

Performance drop observed in SHOC DeviceMemory local-memory tests on the ROCm stack #59

Open · rpathani opened this issue 7 years ago

rpathani commented 7 years ago

@vetter @Finomnis

According to the AMD developer who debugged the issue, the test generates its kernels based on the device capabilities reported by OpenCL (OCL). On the hybrid stack (Orca), the OpenCL runtime reports 32 KB of local device memory, while the ROCm stack reports 64 KB. The test uses half of the reported amount for a local array in the kernel. As a result, ROCm ends up with higher LDS usage per work-group, which lowers wave occupancy and therefore performance. The issue should be reported to devrel so that the test logic can be replaced.
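To make the occupancy argument concrete, here is a minimal Python sketch of the arithmetic. It assumes, as described above, that the local array is exactly half of the reported local memory size (`CL_DEVICE_LOCAL_MEM_SIZE`), and it also assumes 64 KB of physical LDS per compute unit, which is typical of GCN-class AMD GPUs but is not stated in the issue. The helper names are hypothetical and are not part of SHOC; the model treats LDS as the only occupancy limiter and ignores registers and wave slots.

```python
# Hypothetical model of the LDS/occupancy effect; not SHOC's actual code.
LDS_PER_CU_BYTES = 64 * 1024  # assumption: physical LDS per compute unit (GCN-class GPU)


def local_array_bytes(reported_local_mem: int) -> int:
    """Local array size the test allocates: half of the reported local memory."""
    return reported_local_mem // 2


def workgroups_per_cu(lds_per_workgroup: int) -> int:
    """Work-groups that fit on one CU when LDS is the only limiting resource."""
    if lds_per_workgroup <= 0:
        raise ValueError("LDS per work-group must be positive")
    return LDS_PER_CU_BYTES // lds_per_workgroup


if __name__ == "__main__":
    for stack, reported in (("Orca (hybrid)", 32 * 1024), ("ROCm", 64 * 1024)):
        lds = local_array_bytes(reported)
        print(f"{stack}: reports {reported // 1024} KB, kernel uses {lds // 1024} KB, "
              f"{workgroups_per_cu(lds)} work-groups/CU")
```

Under these assumptions the same kernel fits 4 work-groups per CU on Orca but only 2 on ROCm, which is the halving of LDS-limited occupancy the AMD developer describes.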