Open oscarbg opened 1 year ago
My results with 6.2.1 kernel for Arc A770:
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Graphics [0x56a0]
Driver version : 22.49.25018.24 (Linux x64)
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 397.92
float2 : 403.43
float4 : 407.01
float8 : 417.52
float16 : 421.01
Single-precision compute (GFLOPS)
float : 13018.01
float2 : 11137.58
float4 : 10403.04
float8 : 10026.99
float16 : 9701.60
Half-precision compute (GFLOPS)
half : 19552.90
half2 : 19493.52
half4 : 19526.21
half8 : 19459.81
half16 : 19340.77
No double precision support! Skipped
Integer compute (GIOPS)
int : 4765.67
int2 : 4773.43
int4 : 4789.65
int8 : 4644.51
int16 : 5455.67
Integer compute Fast 24bit (GIOPS)
int : 4755.75
int2 : 4768.87
int4 : 4786.68
int8 : 4642.19
int16 : 5455.34
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 2.64
enqueueReadBuffer : 2.43
enqueueWriteBuffer non-blocking : 2.85
enqueueReadBuffer non-blocking : 2.63
enqueueMapBuffer(for read) : 2.83
memcpy from mapped ptr : 14.38
enqueueUnmap(after write) : 2.91
memcpy to mapped ptr : 14.01
Kernel launch latency : 36.30 us
@al42and nice, thanks for sharing. It would be good to have Windows results as well, if you also have Windows installed, to see whether they diverge much.
Don't have Windows :(
Kernel latency seems worse on Windows.
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Arc(TM) A770 Graphics
Driver version : 31.0.101.4255 (Win64)
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 396.30
float2 : 403.57
float4 : 409.15
float8 : 419.49
float16 : 423.01
Single-precision compute (GFLOPS)
float : 13346.34
float2 : 11416.61
float4 : 10663.24
float8 : 10299.98
float16 : 9975.71
Half-precision compute (GFLOPS)
half : 20033.96
half2 : 19979.07
half4 : 19969.53
half8 : 19922.98
half16 : 19841.67
No double precision support! Skipped
Integer compute (GIOPS)
int : 4830.21
int2 : 4857.29
int4 : 4846.14
int8 : 4724.30
int16 : 5532.68
Integer compute Fast 24bit (GIOPS)
int : 4824.44
int2 : 4850.69
int4 : 4829.88
int8 : 4694.66
int16 : 5510.71
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 11.21
enqueueReadBuffer : 5.33
enqueueWriteBuffer non-blocking : 15.99
enqueueReadBuffer non-blocking : 6.21
enqueueMapBuffer(for read) : 19.14
memcpy from mapped ptr : 19.38
enqueueUnmap(after write) : 17.15
memcpy to mapped ptr : 19.76
Kernel launch latency : 78.90 us
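To put a number on "seems worse", here is a minimal sketch that compares a few figures from the Linux (22.49.25018.24) and Windows (31.0.101.4255) listings above. The values are copied by hand from those two posts; the dictionary keys and the `ratios` name are only illustrative and are not part of clpeak.

```python
# Values copied from the two clpeak listings above (Arc A770).
# Keys are illustrative labels, not clpeak identifiers.
linux = {
    "kernel launch latency (us)": 36.30,
    "single-precision float (GFLOPS)": 13018.01,
    "enqueueWriteBuffer (GBPS)": 2.64,
}
windows = {
    "kernel launch latency (us)": 78.90,
    "single-precision float (GFLOPS)": 13346.34,
    "enqueueWriteBuffer (GBPS)": 11.21,
}

# Windows value divided by Linux value for each metric.
ratios = {name: windows[name] / linux[name] for name in linux}

for name, r in ratios.items():
    print(f"{name:34s} Windows/Linux = {r:.2f}x")
```

For latency a ratio above 1 means Windows is slower; for the GFLOPS and GBPS rows a ratio above 1 means Windows is faster. These are single runs on different driver versions, so small differences should not be read too closely.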
Kernel 5.17.0-1020-oem with intel-i915-dkms 1.23.3.19.230122.18.5.17.0.1020+i38-1, but transfer bandwidth is capped by PCIe 3.0:
Platform: Intel(R) OpenCL HD Graphics
Device: Intel(R) Arc(TM) A770 Graphics
Driver version : 23.05.25593.18 (Linux x64)
Compute units : 512
Clock frequency : 2400 MHz
Global memory bandwidth (GBPS)
float : 399.42
float2 : 403.78
float4 : 408.53
float8 : 418.51
float16 : 422.97
Single-precision compute (GFLOPS)
float : 13000.09
float2 : 11134.71
float4 : 10402.13
float8 : 10024.48
float16 : 9706.12
Half-precision compute (GFLOPS)
half : 19552.26
half2 : 19500.15
half4 : 19505.83
half8 : 19463.29
half16 : 19341.72
No double precision support! Skipped
Integer compute (GIOPS)
int : 4311.91
int2 : 4322.29
int4 : 4339.57
int8 : 4212.78
int16 : 4920.77
Integer compute Fast 24bit (GIOPS)
int : 4307.33
int2 : 4327.73
int4 : 4341.63
int8 : 4203.23
int16 : 4906.83
Transfer bandwidth (GBPS)
enqueueWriteBuffer : 9.47
enqueueReadBuffer : 4.50
enqueueWriteBuffer non-blocking : 11.07
enqueueReadBuffer non-blocking : 4.86
enqueueMapBuffer(for read) : 10.10
memcpy from mapped ptr : 4.80
enqueueUnmap(after write) : 11.38
memcpy to mapped ptr : 15.45
Kernel launch latency : 9.05 us
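The PCIe 3.0 cap mentioned with these results can be sanity-checked with some arithmetic. The sketch below assumes an x16 link (the post does not state the link width) and uses the standard PCIe 3.0 figures of 8 GT/s per lane with 128b/130b encoding. The function name and the `measured` dictionary are hypothetical helpers, and the result is a raw link ceiling before packet and protocol overhead.

```python
def pcie_bandwidth_gbps(gt_per_s, lanes, enc_num=128, enc_den=130):
    """Theoretical one-direction link bandwidth in GB/s, before protocol overhead."""
    bits_per_lane = gt_per_s * enc_num / enc_den  # usable Gbit/s per lane
    return bits_per_lane / 8 * lanes              # bits -> bytes, times lane count


# Assumption: PCIe 3.0 with 16 lanes (link width is not given in the post).
PCIE3_X16 = pcie_bandwidth_gbps(8.0, 16)

# Transfer numbers copied from the listing above (driver 23.05.25593.18).
measured = {
    "enqueueWriteBuffer": 9.47,
    "enqueueWriteBuffer non-blocking": 11.07,
    "enqueueUnmap(after write)": 11.38,
    "memcpy to mapped ptr": 15.45,
}

print(f"PCIe 3.0 x16 ceiling: {PCIE3_X16:.2f} GB/s")
for name, gbps in measured.items():
    print(f"{name:33s} {gbps:6.2f} GB/s = {gbps / PCIE3_X16:.0%} of ceiling")
```

Under that assumption the ceiling comes out just under 16 GB/s, so the write-side numbers in this run sit below it, while the Windows run above (15.99 to 19.76 GBPS on several rows) would not fit inside a PCIe 3.0 x16 link.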
Hi,
The title says it all.
I'd like to see results for the new NVIDIA 40x0 series, AMD RDNA3 and Intel DG2.
I hope people with the needed hardware can submit them.
Thanks.