acollaborator opened 1 year ago
Results when connected in an internal PCIe slot:
Platform: AMD Accelerated Parallel Processing
Device: gfx1030
Driver version : 3513.0 (HSA1.1,LC) (Linux x64)
Compute units : 40
Clock frequency : 2660 MHz
Global memory bandwidth (GBPS)
float : 433.01
float2 : 462.90
float4 : 479.86
float8 : 483.06
float16 : 484.81
Single-precision compute (GFLOPS)
float : 24465.54
float2 : 23491.38
float4 : 23244.73
float8 : 22507.95
float16 : 22336.47
Half-precision compute (GFLOPS)
half : 23904.53
half2 : 46377.77
half4 : 45958.41
half8 : 43198.06
half16 : 42349.71
Double-precision compute (GFLOPS)
double : 1592.18
double2 : 1592.25
double4 : 1597.43
double8 : 1587.96
double16 : 1562.70
Integer compute (GIOPS)
int : 6120.95
int2 : 5323.04
int4 : 5622.27
int8 : 5804.56
int16 : 5689.05
Integer compute Fast 24bit (GIOPS)
int : 20319.02
int2 : 19875.83
int4 : 19409.06
int8 : 19537.97
int16 : 18721.80
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 19.39
enqueueReadBuffer : 19.82
enqueueWriteBuffer non-blocking : 19.01
enqueueReadBuffer non-blocking : 19.76
enqueueMapBuffer(for read) : 604925.00
memcpy from mapped ptr : 19.76
enqueueUnmap(after write) : 1193046.50
memcpy to mapped ptr : 19.95
Kernel launch latency : 7.49 us
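For anyone who wants to compare runs like this one side by side, the text output can be turned into numbers with a few lines of Python. This is only a rough sketch: `parse_clpeak` is a hypothetical helper name, and it assumes the layout seen in the paste above, that is, section headers ending in a parenthesized unit such as `(GBPS)` followed by `name : value` lines. Lines with a trailing unit (for example the kernel launch latency) and the platform/device header lines are skipped.

```python
import re

# Section headers end in a parenthesized unit, e.g. "Transfer bandwidth (GBPS)".
SECTION_RE = re.compile(r"^.+\([A-Za-z]+\)$")
# Result lines look like "enqueueReadBuffer : 19.82" (number at end of line).
METRIC_RE = re.compile(r"^(?P<name>[^:]+?)\s*:\s*(?P<value>\d+(?:\.\d+)?)$")


def parse_clpeak(text):
    """Return {section header: {test name: value}} from clpeak-style text.

    Assumes the plain-text layout shown in this issue; anything that is
    neither a section header nor a "name : number" line closes the
    current section and is ignored.
    """
    results = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if SECTION_RE.match(line):
            section = line
            results[section] = {}
            continue
        metric = METRIC_RE.match(line)
        if metric and section is not None:
            results[section][metric["name"]] = float(metric["value"])
        else:
            # Header lines, "7.49 us" style lines, etc. end the section.
            section = None
    return results


sample = """\
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 19.39
enqueueReadBuffer : 19.82
Kernel launch latency : 7.49 us
"""

parsed = parse_clpeak(sample)
print(parsed["Transfer bandwidth (GBPS)"]["enqueueReadBuffer"])
```

Running the same function over the output of each run gives two dictionaries that can be compared key by key, for example to see how the `enqueueWriteBuffer` and `enqueueReadBuffer` figures differ between the two connections.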
Results for AMD Radeon RX 6900 XT connected via Thunderbolt 3 on Ubuntu 22.04 / Linux 5.15.0-60:
clpeak version: 1.1.2
./clpeak -dn gfx1030
Platform: AMD Accelerated Parallel Processing
Device: gfx1030
Driver version : 3513.0 (HSA1.1,LC) (Linux x64)
Compute units : 40
Clock frequency : 2660 MHz