krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

results for AMD Radeon VII connected via Thunderbolt 3 in Windows 11 #103

Open acollaborator opened 1 year ago


clpeak 1.1.2

Platform: AMD Accelerated Parallel Processing
  Device: gfx906
    Driver version  : 3516.0 (PAL,HSAIL) (Win64)
    Compute units   : 60
    Clock frequency : 1801 MHz

Global memory bandwidth (GBPS)
  float   : 822.46
  float2  : 840.00
  float4  : 840.63
  float8  : 782.67
  float16 : 692.59

Single-precision compute (GFLOPS)
  float   : 13705.64
  float2  : 13681.48
  float4  : 13649.84
  float8  : 13563.53
  float16 : 13370.20

Half-precision compute (GFLOPS)
  half   : 9133.37
  half2  : 26691.18
  half4  : 26193.26
  half8  : 25393.31
  half16 : 23530.63

Double-precision compute (GFLOPS)
  double   : 3434.87
  double2  : 3430.05
  double4  : 3416.19
  double8  : 3410.32
  double16 : 3364.18

Integer compute (GIOPS)
  int   : 4536.94
  int2  : 4494.69
  int4  : 4507.57
  int8  : 4501.87
  int16 : 4504.83

Integer compute Fast 24bit (GIOPS)
  int   : 13274.74
  int2  : 12844.57
  int4  : 12779.77
  int8  : 12491.75
  int16 : 12344.62

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 16.78
  enqueueReadBuffer               : 17.05
  enqueueWriteBuffer non-blocking : 17.08
  enqueueReadBuffer non-blocking  : 17.12
  enqueueMapBuffer(for read)      : 383479.22
    memcpy from mapped ptr        : 17.18
  enqueueUnmap(after write)       : 1867377.12
    memcpy to mapped ptr          : 17.15

Kernel launch latency : 57.15 us
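
As a rough sanity check on the compute figures above, the reported compute units and clock can be turned into a theoretical single-precision peak and compared with the measured numbers. This is only a sketch: the 64 FP32 lanes per compute unit and the 2 FLOPs per lane per clock (one fused multiply-add) are assumptions about the hardware, not values clpeak prints, and the helper name `theoretical_peak_gflops` is made up for illustration.

```python
# Values copied from the clpeak device header and results above.
COMPUTE_UNITS = 60
CLOCK_MHZ = 1801

# Assumptions (not reported by clpeak): 64 FP32 lanes per compute unit,
# each retiring one fused multiply-add (counted as 2 FLOPs) per clock.
LANES_PER_CU = 64
FLOPS_PER_LANE_PER_CLOCK = 2


def theoretical_peak_gflops(compute_units, clock_mhz,
                            lanes_per_cu=LANES_PER_CU,
                            flops_per_lane=FLOPS_PER_LANE_PER_CLOCK):
    """Return the theoretical FP32 peak in GFLOPS for the given device figures."""
    return compute_units * lanes_per_cu * flops_per_lane * clock_mhz / 1000.0


# Scalar or best-case measured results taken from the output above (GFLOPS).
measured = {"float": 13705.64, "half2": 26691.18, "double": 3434.87}

peak = theoretical_peak_gflops(COMPUTE_UNITS, CLOCK_MHZ)
print(f"theoretical FP32 peak: {peak:.2f} GFLOPS")
for name, gflops in measured.items():
    # Express each measured figure as a multiple of the FP32 peak.
    print(f"{name:7s} {gflops:10.2f} GFLOPS = {gflops / peak:.2f}x FP32 peak")
```

Under those assumptions the measured `float` result sits at about 0.99x of the computed peak, with `half2` near 1.93x and `double` near 0.25x, so the compute numbers look internally consistent. The check says nothing about the transfer or map/unmap figures.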