Closed ysh329 closed 6 years ago
My bandwidth results are below (more detailed logs are here):
half1: 5.16 GB/s
half2: 4.71 GB/s
half4: 5.14 GB/s
half8: 5.50 GB/s
half16: 4.98 GB/s
half1-A53: 2.10 GB/s
half1-A72: 3.91 GB/s
short1: 5.29 GB/s
short2: 4.71 GB/s
short4: 5.07 GB/s
short8: 5.52 GB/s
short16: 5.00 GB/s
short1-A53: 2.26 GB/s
short1-A72: 4.51 GB/s
int1: 5.26 GB/s
int2: 5.49 GB/s
int4: 6.13 GB/s
int8: 5.49 GB/s
int16: 5.28 GB/s
int-a53: 2.25 GB/s
int-a72: 4.53 GB/s
float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float-a53: 2.15 GB/s
float-a72: 4.04 GB/s
double1: 4.49 GB/s
double2: 6.39 GB/s
double4: 5.58 GB/s
double8: 5.40 GB/s
double16: 5.51 GB/s
double1-A53: 2.29 GB/s
double1-A72: 4.58 GB/s
clpeak reports higher bandwidth than my code measures. I think the gap is because the clpeak kernel only performs reads, while my kernel function performs both a read and a write.
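When comparing a read-only kernel against a read-plus-write kernel, it also helps to be explicit about how many bytes each one is credited with moving. The following is a minimal sketch of that accounting, not the code from clpeak or from my benchmark; the function name, buffer size, and elapsed time are all hypothetical values chosen for illustration.

```python
def effective_bandwidth_gbs(num_elements, bytes_per_element, elapsed_s,
                            reads_per_element=1, writes_per_element=1):
    """Effective bandwidth in GB/s (1 GB = 1e9 bytes).

    Every element is assumed to be read `reads_per_element` times and
    written `writes_per_element` times during the timed kernel run.
    """
    if elapsed_s <= 0:
        raise ValueError("elapsed_s must be positive")
    accesses = reads_per_element + writes_per_element
    bytes_moved = num_elements * bytes_per_element * accesses
    return bytes_moved / elapsed_s / 1e9


# Hypothetical example values, not measurements from this issue.
N_FLOATS = 16 * 1024 * 1024   # number of float elements in the buffer
ELAPSED_S = 0.025             # timed kernel duration in seconds

# Kernel that only reads each float once.
read_only = effective_bandwidth_gbs(N_FLOATS, 4, ELAPSED_S,
                                    reads_per_element=1, writes_per_element=0)
# Kernel that reads each float once and writes it once (a copy).
read_write = effective_bandwidth_gbs(N_FLOATS, 4, ELAPSED_S)

print(f"read only : {read_only:.2f} GB/s")
print(f"read+write: {read_write:.2f} GB/s")
```

For the same elapsed time, the read-plus-write figure is exactly twice the read-only one, so two tools are only comparable if they count traffic the same way.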
Kernel function is here.
Global memory bandwidth (GBPS)
float : 3.84
float2 : 6.00
float4 : 7.33
float8 : 6.01
float16 : 5.78
Kernel function is here.
float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float-a53: 2.15 GB/s
float-a72: 4.04 GB/s
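To put the two float lists side by side, here is a small sketch that computes the clpeak-to-mine ratio per vector width. The numbers are copied from the results above; the dictionary names are just for illustration, and I am assuming clpeak's "float" row corresponds to my "float1".

```python
# Float global-memory bandwidth in GB/s, keyed by vector width.
# Values copied from the clpeak output and from my results above.
clpeak_float = {1: 3.84, 2: 6.00, 4: 7.33, 8: 6.01, 16: 5.78}
my_float = {1: 4.83, 2: 4.72, 4: 5.39, 8: 4.72, 16: 4.52}

# Ratio > 1 means clpeak reports more bandwidth than my kernel.
ratios = {width: clpeak_float[width] / my_float[width]
          for width in sorted(clpeak_float)}

for width, ratio in ratios.items():
    print(f"float{width}: clpeak/mine = {ratio:.2f}")
```

With these figures clpeak is ahead for widths 2 through 16, while for the scalar case its number (3.84) is below mine (4.83), so the gap is not uniform across vector widths.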
Before running, I set the GPU and CPU to their maximum frequencies using the scripts under tools in this repo.
clpeak: