krrishnarraj / clpeak

A tool which profiles OpenCL devices to find their peak capacities
Apache License 2.0
results for AMD Radeon 6900XT connected via Thunderbolt 3 in Windows 11 #101

Open acollaborator opened 1 year ago

acollaborator commented 1 year ago

results for AMD Radeon 6900XT connected via Thunderbolt 3 in Windows

Platform: AMD Accelerated Parallel Processing
  Device: gfx1030
    Driver version  : 3516.0 (PAL,LC) (Win64)
    Compute units   : 40
    Clock frequency : 2015 MHz

Global memory bandwidth (GBPS)
  float   : 446.26
  float2  : 479.50
  float4  : 489.94
  float8  : 497.28
  float16 : 500.41

Single-precision compute (GFLOPS)
  float   : 25405.74
  float2  : 24505.80
  float4  : 24389.60
  float8  : 23754.91
  float16 : 23152.01

Half-precision compute (GFLOPS)
  half   : 24808.24
  half2  : 49398.67
  half4  : 48301.48
  half8  : 46714.04
  half16 : 45684.66

Double-precision compute (GFLOPS)
  double   : 1609.84
  double2  : 1608.84
  double4  : 1605.60
  double8  : 1598.52
  double16 : 1585.71

Integer compute (GIOPS)
  int   : 5089.61
  int2  : 5043.19
  int4  : 5027.10
  int8  : 5001.98
  int16 : 4962.62

Integer compute Fast 24bit (GIOPS)
  int   : 21753.65
  int2  : 21389.28
  int4  : 21226.49
  int8  : 20617.82
  int16 : 18573.64

Transfer bandwidth (GBPS)
  enqueueWriteBuffer              : 16.95
  enqueueReadBuffer               : 17.03
  enqueueWriteBuffer non-blocking : 17.14
  enqueueReadBuffer non-blocking  : 17.20
  enqueueMapBuffer(for read)      : 313501.28
    memcpy from mapped ptr        : 17.20
  enqueueUnmap(after write)       : 1651910.50
    memcpy to mapped ptr          : 17.20

Kernel launch latency : 54.70 us
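
For anyone comparing these figures with another card, a quick way to read them is as ratios between data types. The snippet below is only a rough sketch: the numbers are copied by hand from the output above, and the dictionary keys and the `ratios` name are made up for illustration, not part of clpeak.

```python
# Peak figures copied by hand from the clpeak output above.
# Key names are illustrative only; clpeak does not emit this structure.
peaks = {
    "float": 25405.74,   # single-precision, scalar (GFLOPS)
    "half2": 49398.67,   # half-precision, best vector width (GFLOPS)
    "double": 1609.84,   # double-precision, scalar (GFLOPS)
    "int": 5089.61,      # 32-bit integer, scalar (GIOPS)
    "int24": 21753.65,   # fast 24-bit integer, scalar (GIOPS)
}

# Relative throughput between data types on this device.
ratios = {
    "half2 / float": peaks["half2"] / peaks["float"],
    "float / double": peaks["float"] / peaks["double"],
    "int24 / int": peaks["int24"] / peaks["int"],
}

for name, value in ratios.items():
    print(f"{name}: {value:.2f}x")
```

The ratios depend only on the measured values listed in this report, so they say nothing about theoretical peaks or about other driver versions.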