ysh329 / OpenCL-101

Learn OpenCL step by step.

[bandwidth] Bandwidth for typeN and compare with clpeak result #7

Closed ysh329 closed 6 years ago

ysh329 commented 6 years ago

Before measuring, set the GPU and CPU to their maximum frequencies using the scripts in the tools directory of this repo.

  1. Calculate bandwidth for typeN: intN, floatN, halfN;
  2. Compare with clpeak result.
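
Step 1 comes down to dividing the bytes the kernel moves by its elapsed time. A minimal sketch in C, assuming the buffer holds `num_vectors` elements of a vector type with `lanes` components of `scalar_bytes` bytes each, and that the elapsed time is measured separately (for example with OpenCL event profiling). The function name and parameters are illustrative and not taken from this repo:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: effective bandwidth in GB/s (1 GB = 1e9 bytes).
 * passes = 1 if the kernel only reads the buffer,
 * passes = 2 if it reads the buffer and writes one of the same size. */
double bandwidth_gbps(size_t num_vectors, size_t lanes, size_t scalar_bytes,
                      int passes, double seconds)
{
    assert(seconds > 0.0 && passes > 0);
    double bytes = (double)num_vectors * (double)lanes *
                   (double)scalar_bytes * (double)passes;
    return bytes / seconds / 1e9;
}
```

For instance, `bandwidth_gbps(1u << 20, 4, sizeof(float), 1, 0.004)` describes 2^20 float4 reads finishing in 4 ms, which works out to about 4.19 GB/s. Those input numbers are made up for illustration and are not measurements from this thread.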

clpeak:

Platform: ARM Platform
  Device: Mali-T860
    Driver version  : 1.2 (Linux ARM64)
    Compute units   : 4
    Clock frequency : 800 MHz

    Global memory bandwidth (GBPS)
      float   : 3.84
      float2  : 6.00
      float4  : 7.33
      float8  : 6.01
      float16 : 5.78

    Single-precision compute (GFLOPS)
      float   : 22.86
      float2  : 44.68
      float4  : 44.51
      float8  : 41.46
      float16 : 46.16

    half-precision compute (GFLOPS)
      half   : 22.83
      half2  : 46.46
      half4  : 93.96
      half8  : 92.44
      half16 : 69.40

    Double-precision compute (GFLOPS)
      double   : 3.60
      double2  : 3.54
      double4  : 20.92
      double8  : 20.60
      double16 : 20.35

    Integer compute (GIOPS)
      int   : 20.26
      int2  : 49.72
      int4  : 47.51
      int8  : 48.96
      int16 : 41.47

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 4.06
      enqueueReadBuffer          : 2.17
      enqueueMapBuffer(for read) : 2015.28
        memcpy from mapped ptr   : 2.18
      enqueueUnmap(after write)  : 5406.56
        memcpy to mapped ptr     : 2.23

    Kernel launch latency : 78.36 us

ysh329 commented 6 years ago

My bandwidth results are below (more detailed logs are here):

half1: 5.16 GB/s
half2: 4.71 GB/s
half4: 5.14 GB/s
half8: 5.50 GB/s
half16: 4.98 GB/s
half1-A53: 2.10 GB/s
half1-A72: 3.91 GB/s

short1: 5.29 GB/s
short2: 4.71 GB/s
short4: 5.07 GB/s
short8: 5.52 GB/s
short16: 5.00 GB/s
short1-A53: 2.26 GB/s
short1-A72: 4.51 GB/s

int1: 5.26 GB/s
int2: 5.49 GB/s
int4: 6.13 GB/s
int8: 5.49 GB/s
int16: 5.28 GB/s
int1-A53: 2.25 GB/s
int1-A72: 4.53 GB/s

float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s

double1: 4.49 GB/s
double2: 6.39 GB/s
double4: 5.58 GB/s
double8: 5.40 GB/s
double16: 5.51 GB/s
double1-A53: 2.29 GB/s
double1-A72: 4.58 GB/s

ysh329 commented 6 years ago

clpeak reports higher bandwidth than my code for every float width except float1. The gap exists because the clpeak kernel only reads from global memory, whereas my kernel both reads and writes.
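
The two access patterns can be pictured with a plain C analogy. This is only a sketch of the shapes being compared, not the actual clpeak kernel or the kernel from this repo, and all function names are hypothetical. Whether the GB/s figures above count bytes in both directions is not stated in this thread, so the byte counts below are an assumption about how one might account for traffic:

```c
#include <assert.h>
#include <stddef.h>

/* Read-only pattern: every element is loaded once and folded into a sum,
 * so memory traffic is n * sizeof(float) bytes of reads. */
float read_only_pass(const float *in, size_t n)
{
    assert(in != NULL);
    float acc = 0.0f;
    for (size_t i = 0; i < n; ++i)
        acc += in[i];
    return acc;
}

/* Copy pattern: one load and one store per element,
 * so memory traffic is 2 * n * sizeof(float) bytes. */
void copy_pass(const float *in, float *out, size_t n)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < n; ++i)
        out[i] = in[i];
}

/* Bytes moved by each pattern for n float elements. */
size_t read_only_bytes(size_t n) { return n * sizeof(float); }
size_t copy_bytes(size_t n)      { return 2 * n * sizeof(float); }
```

If a copy-style benchmark divides only the input size by the elapsed time, it reports half the traffic it actually generated, which is one way two tools can disagree on the same hardware.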

clpeak

Kernel function is here.

    Global memory bandwidth (GBPS)
      float   : 3.84
      float2  : 6.00
      float4  : 7.33
      float8  : 6.01
      float16 : 5.78

my bandwidth

Kernel function is here.

float1: 4.83 GB/s
float2: 4.72 GB/s
float4: 5.39 GB/s
float8: 4.72 GB/s
float16: 4.52 GB/s
float1-A53: 2.15 GB/s
float1-A72: 4.04 GB/s
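
To put the two float tables side by side, the ratio of my result to the clpeak result can be computed per vector width. A small sketch in C using the GPU numbers quoted above; the array and function names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

#define NUM_WIDTHS 5

/* Global memory bandwidth in GB/s, copied from the two tables above. */
static const char *const kLabels[NUM_WIDTHS] =
    {"float", "float2", "float4", "float8", "float16"};
static const double kClpeak[NUM_WIDTHS] = {3.84, 6.00, 7.33, 6.01, 5.78};
static const double kMine[NUM_WIDTHS]   = {4.83, 4.72, 5.39, 4.72, 4.52};

/* Ratio of my measurement to clpeak's for the i-th vector width. */
double relative_to_clpeak(size_t i)
{
    assert(i < NUM_WIDTHS);
    return kMine[i] / kClpeak[i];
}

/* Print one comparison line per vector width. */
void print_comparison(void)
{
    for (size_t i = 0; i < NUM_WIDTHS; ++i)
        printf("%-8s clpeak %.2f  mine %.2f  ratio %.2f\n",
               kLabels[i], kClpeak[i], kMine[i], relative_to_clpeak(i));
}
```

With these inputs, scalar float comes out at roughly 1.26 times the clpeak figure, while float2 through float16 land between about 0.74 and 0.79 of it.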