Open KatyushaScarlet opened 7 months ago
clpeak version: 1.1.2
Platform: Moore Threads OpenCL
  Device: MUSA GEN1-104
    Driver version  : 20230926_develop-36-g6d1e11a670da-dirty release (Linux x64)
    Compute units   : 32
    Clock frequency : 1800 MHz

    Global memory bandwidth (GBPS)
      float   : 269.70
      float2  : 373.22
      float4  : 381.20
      float8  : 389.35
      float16 : 397.03

    Single-precision compute (GFLOPS)
      float   : 14190.62
      float2  : 13320.74
      float4  : 13418.10
      float8  : 13379.27
      float16 : 13307.41

    Half-precision compute (GFLOPS)
      half   : 13300.11
      half2  : 13353.18
      half4  : 13422.47
      half8  : 13452.69
      half16 : 13320.31

    Double-precision compute (GFLOPS)
      double   : 35.60
      double2  : 30.08
      double4  : 22.52
      double8  : 13.69
      double16 : 6.99

    Integer compute (GIOPS)
      int   : 2095.66
      int2  : 2091.45
      int4  : 2094.30
      int8  : 2095.51
      int16 : 2096.50

    Integer compute Fast 24bit (GIOPS)
      int   : 2095.58
      int2  : 2092.48
      int4  : 2094.36
      int8  : 2094.98
      int16 : 2097.07

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 5.52
      enqueueReadBuffer               : 4.25
      enqueueWriteBuffer non-blocking : 5.61
      enqueueReadBuffer non-blocking  : 4.15
      enqueueMapBuffer(for read)      : 5263.44
        memcpy from mapped ptr        : 0.02
      enqueueUnmap(after write)       : 6515.42
        memcpy to mapped ptr          : 5.60

    Kernel launch latency : 31.03 us
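
To make the compute figures easier to compare across devices, they can be normalized to operations per compute unit per clock cycle. The following is a minimal sketch, not part of clpeak: the helper name `ops_per_cu_per_clock` is hypothetical, the input values are copied from the scalar rows of the report above, and it assumes the device actually ran at the reported 1800 MHz during the benchmark, which the output does not confirm.

```python
# Values copied from the clpeak report above (scalar rows only).
COMPUTE_UNITS = 32
CLOCK_MHZ = 1800  # assumption: the reported clock was sustained during the run

peak_gops = {
    "float": 14190.62,   # GFLOPS
    "half": 13300.11,    # GFLOPS
    "double": 35.60,     # GFLOPS
    "int": 2095.66,      # GIOPS
}


def ops_per_cu_per_clock(gops: float, compute_units: int, clock_mhz: float) -> float:
    """Convert a GFLOPS/GIOPS figure into operations per compute unit per cycle."""
    clock_ghz = clock_mhz / 1000.0
    return gops / (compute_units * clock_ghz)


# Normalize every entry with the same device parameters.
per_cu = {
    name: ops_per_cu_per_clock(value, COMPUTE_UNITS, CLOCK_MHZ)
    for name, value in peak_gops.items()
}

for name, value in per_cu.items():
    print(f"{name:>6}: {value:8.2f} ops/CU/clock")
```

The same normalization applies to the vector-width rows; only the scalar rows are shown here to keep the sketch short.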
Impressive. Please raise a PR with these details.