krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0

Results for AMD Radeon RX 6900 XT connected via Thunderbolt 3 in Ubuntu 22.04 / Linux 5.15.0-60 #102

Open acollaborator opened 1 year ago

acollaborator commented 1 year ago

Results for an AMD Radeon RX 6900 XT connected via Thunderbolt 3, running Ubuntu 22.04 / Linux 5.15.0-60, with clpeak version 1.1.2:

./clpeak -dn gfx1030

Platform: AMD Accelerated Parallel Processing
  Device: gfx1030
    Driver version  : 3513.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 40
    Clock frequency : 2660 MHz

Global memory bandwidth (GBPS)
  float   : 423.79
  float2  : 441.29
  float4  : 446.95
  float8  : 456.39
  float16 : 477.69

Single-precision compute (GFLOPS)
  float   : 24483.21
  float2  : 22691.08
  float4  : 23121.26
  float8  : 22931.97
  float16 : 22159.95

Half-precision compute (GFLOPS)
  half   : 23514.74
  half2  : 46242.11
  half4  : 45222.88
  half8  : 42697.05
  half16 : 42386.64

Double-precision compute (GFLOPS)
  double   : 1581.64
  double2  : 1581.80
  double4  : 1578.37
  double8  : 1565.53
  double16 : 1536.39

Integer compute (GIOPS)
  int   : 6056.93
  int2  : 5219.01
  int4  : 5554.51
  int8  : 5781.25
  int16 : 5647.09

Integer compute Fast 24bit (GIOPS)
  int   : 19833.61
  int2  : 19475.66
  int4  : 19342.05
  int8  : 18979.65
  int16 : 19366.18

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 19.48
  enqueueReadBuffer               : 20.26
  enqueueWriteBuffer non-blocking : 20.46
  enqueueReadBuffer non-blocking  : 20.72
  enqueueMapBuffer(for read)      : 517465.91
    memcpy from mapped ptr        : 20.57
  enqueueUnmap(after write)       : 1022611.31
    memcpy to mapped ptr          : 20.02

Kernel launch latency : 20.53 us
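As a rough sanity check on the single-precision numbers, the sketch below estimates a theoretical FP32 peak from the device header and compares it with the best measured `float` result. It assumes that each of the 40 "compute units" clpeak reports is an RDNA2 work-group processor made of two CUs with 64 FP32 lanes each (matching the 80 CUs AMD lists for this card), that a fused multiply-add counts as two operations, and that the reported 2660 MHz is a maximum clock rather than the clock sustained during the run. The constant names and the `theoretical_fp32_gflops` helper are hypothetical, not part of clpeak.

```python
# Back-of-envelope FP32 peak for the gfx1030 header above (a sketch, not clpeak's method).
REPORTED_COMPUTE_UNITS = 40        # "Compute units : 40" as printed by clpeak
CLOCK_MHZ = 2660                   # "Clock frequency : 2660 MHz" (max clock, assumed)
FP32_LANES_PER_REPORTED_CU = 128   # assumption: 1 reported unit = 1 WGP = 2 CUs x 64 lanes
FLOPS_PER_LANE_PER_CLOCK = 2       # assumption: one FMA per lane per clock = 2 FLOPs

MEASURED_FP32_GFLOPS = 24483.21    # best "float" result from the Thunderbolt 3 run


def theoretical_fp32_gflops(units: int, lanes: int, flops_per_clock: int, clock_mhz: float) -> float:
    """Peak GFLOPS = units * lanes * FLOPs per clock * clock (MHz -> GHz)."""
    return units * lanes * flops_per_clock * clock_mhz / 1000.0


def main() -> None:
    peak = theoretical_fp32_gflops(
        REPORTED_COMPUTE_UNITS, FP32_LANES_PER_REPORTED_CU, FLOPS_PER_LANE_PER_CLOCK, CLOCK_MHZ
    )
    print(f"theoretical FP32 peak : {peak:.1f} GFLOPS")
    print(f"measured / peak       : {MEASURED_FP32_GFLOPS / peak:.1%}")


if __name__ == "__main__":
    main()
```

Under these assumptions the estimate is 27238.4 GFLOPS, so the measured 24483.21 GFLOPS is roughly 90% of that upper bound; if the card runs below 2660 MHz under load, the real fraction is higher.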
acollaborator commented 1 year ago

Results with the same card connected to an internal PCIe slot:

Platform: AMD Accelerated Parallel Processing
  Device: gfx1030
    Driver version  : 3513.0 (HSA1.1,LC) (Linux x64)
    Compute units   : 40
    Clock frequency : 2660 MHz

Global memory bandwidth (GBPS)
  float   : 433.01
  float2  : 462.90
  float4  : 479.86
  float8  : 483.06
  float16 : 484.81

Single-precision compute (GFLOPS)
  float   : 24465.54
  float2  : 23491.38
  float4  : 23244.73
  float8  : 22507.95
  float16 : 22336.47

Half-precision compute (GFLOPS)
  half   : 23904.53
  half2  : 46377.77
  half4  : 45958.41
  half8  : 43198.06
  half16 : 42349.71

Double-precision compute (GFLOPS)
  double   : 1592.18
  double2  : 1592.25
  double4  : 1597.43
  double8  : 1587.96
  double16 : 1562.70

Integer compute (GIOPS)
  int   : 6120.95
  int2  : 5323.04
  int4  : 5622.27
  int8  : 5804.56
  int16 : 5689.05

Integer compute Fast 24bit (GIOPS)
  int   : 20319.02
  int2  : 19875.83
  int4  : 19409.06
  int8  : 19537.97
  int16 : 18721.80

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 19.39
  enqueueReadBuffer               : 19.82
  enqueueWriteBuffer non-blocking : 19.01
  enqueueReadBuffer non-blocking  : 19.76
  enqueueMapBuffer(for read)      : 604925.00
    memcpy from mapped ptr        : 19.76
  enqueueUnmap(after write)       : 1193046.50
    memcpy to mapped ptr          : 19.95

Kernel launch latency : 7.49 us
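To make the Thunderbolt 3 vs. internal PCIe comparison easier to read, here is a small Python sketch that tabulates a few headline numbers copied from the two runs above and prints the percent change of the Thunderbolt 3 result relative to PCIe. The choice of metrics and the `relative_change` helper are illustrative only; they are not part of clpeak.

```python
# Selected clpeak 1.1.2 results for the same RX 6900 XT (gfx1030),
# copied from the two runs above as (Thunderbolt 3, internal PCIe slot).
RESULTS = {
    "Global memory bandwidth float16 (GBPS)": (477.69, 484.81),
    "Single-precision compute float (GFLOPS)": (24483.21, 24465.54),
    "Half-precision compute half2 (GFLOPS)": (46242.11, 46377.77),
    "Double-precision compute double (GFLOPS)": (1581.64, 1592.18),
    "enqueueWriteBuffer (GBPS)": (19.48, 19.39),
    "enqueueReadBuffer (GBPS)": (20.26, 19.82),
    "Kernel launch latency (us)": (20.53, 7.49),  # lower is better for this row
}


def relative_change(thunderbolt: float, pcie: float) -> float:
    """Percent change of the Thunderbolt 3 result relative to the PCIe result."""
    return (thunderbolt - pcie) / pcie * 100.0


def main() -> None:
    for name, (tb, pcie) in RESULTS.items():
        change = relative_change(tb, pcie)
        print(f"{name:45s} TB3 {tb:>10.2f}  PCIe {pcie:>10.2f}  {change:+7.1f}%")


if __name__ == "__main__":
    main()
```

In these two runs, on-device compute and memory bandwidth stay within a couple of percent of each other, and host transfer bandwidth is about 20 GBPS in both setups, while kernel launch latency is roughly 2.7 times higher over Thunderbolt 3 (20.53 us vs. 7.49 us, about +174%).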