krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

Questions about Global Memory Bandwidth #40

Closed by fchen7i 7 years ago

fchen7i commented 7 years ago

Hello,

I have a question about global memory bandwidth. On most devices I have tried, the measured global memory bandwidth drops for float8 and float16 compared with float, float2, and float4. What causes this drop? Below is the log from my MacPro.

Platform: Apple
  Device: Intel(R) Iris(TM) Graphics 6100
    Driver version  : 1.2(Apr 11 2017 16:38:15) (Macintosh)
    Compute units   : 48
    Clock frequency : 1050 MHz

    Global memory bandwidth (GBPS)
      float   : 13.71
      float2  : 14.28
      float4  : 14.75
      float8  : 7.58
      float16 : 3.97

    Single-precision compute (GFLOPS)
      float   : 631.50
      float2  : 641.30
      float4  : 640.04
      float8  : 638.97
      float16 : 634.22

    No half precision support! Skipped

    No double precision support! Skipped

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer         : 4.81
      enqueueReadBuffer          : 5.41
      enqueueMapBuffer(for read) : 447.69
        memcpy from mapped ptr   : 4.64
      enqueueUnmap(after write)  : 6376.14
        memcpy to mapped ptr     : 5.27

    Kernel launch latency : 71.60 us

Thanks a lot.

krrishnarraj commented 7 years ago

I don't know the exact reason, so I can only speculate. It might have to do with the memory bus width. The spec says the device has a 128-bit-wide bus (i.e. 4 floats). My guess is that float8/float16 loads don't fit the cache line and cause heavy thrashing.
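To make the arithmetic behind this guess concrete, here is a minimal Python sketch. It takes the bandwidth figures from the log above and the 128-bit bus width quoted in this comment (an assumption taken from the spec, not something clpeak measures), then compares each vector type's size with the bus width. The helper names `bus_chunks` and `relative_to_float4` are made up for illustration and are not part of clpeak.

```python
# Bandwidth figures (GBPS) copied from the clpeak log earlier in this thread.
measured = {
    "float": 13.71,
    "float2": 14.28,
    "float4": 14.75,
    "float8": 7.58,
    "float16": 3.97,
}

# Number of 32-bit lanes in each OpenCL vector type.
lanes = {"float": 1, "float2": 2, "float4": 4, "float8": 8, "float16": 16}

BUS_BITS = 128   # assumed bus width, as quoted from the spec; not measured
FLOAT_BITS = 32  # size of one single-precision float


def bus_chunks(name):
    """Number of bus-width chunks needed to cover one element, rounded up."""
    bits = lanes[name] * FLOAT_BITS
    return -(-bits // BUS_BITS)  # ceiling division


def relative_to_float4(name):
    """Measured bandwidth as a fraction of the float4 result."""
    return measured[name] / measured["float4"]


if __name__ == "__main__":
    for name in measured:
        size_bytes = lanes[name] * FLOAT_BITS // 8
        print(
            f"{name:8s} {size_bytes:3d} bytes  "
            f"{bus_chunks(name)} chunk(s)  "
            f"{relative_to_float4(name):.2f}x float4"
        )
```

With these numbers, float8 spans 2 bus-width chunks and reaches about 0.51x the float4 bandwidth, while float16 spans 4 chunks and reaches about 0.27x. That pattern is consistent with the bus-width guess, but it does not prove it; other causes such as the access pattern or the compiler's code generation could produce the same result.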

fchen7i commented 7 years ago

Thanks. I will close the issue.