Int8 & Int64 support?

nihui / vkpeak

A tool which profiles Vulkan devices to find their peak capacities

MIT License

Hi, nice benchmark! Below are my Titan V and RX Vega results on Windows. AFAIK the Vulkan spec also supports int8 (via the shaderInt8 feature of VK_KHR_shader_float16_int8) and int64 (via the shaderInt64 feature). Any plans to support benchmarking int8/int64 throughput? Thanks!

Results:

device = NVIDIA TITAN V

fp32-scalar = 17230.91 GFLOPS
fp32-vec4 = 16898.01 GFLOPS

fp16-scalar = 16781.96 GFLOPS
fp16-vec4 = 32568.21 GFLOPS

fp64-scalar = 7664.02 GFLOPS
fp64-vec4 = 7677.14 GFLOPS

int32-scalar = 14464.71 GIOPS
int32-vec4 = 14755.26 GIOPS

int16-scalar = 9727.97 GIOPS
int16-vec4 = 11768.93 GIOPS

device = Radeon RX Vega

fp32-scalar = 11453.46 GFLOPS
fp32-vec4 = 11010.15 GFLOPS

fp16-scalar = 10388.36 GFLOPS
fp16-vec4 = 17744.94 GFLOPS

fp64-scalar = 686.59 GFLOPS
fp64-vec4 = 686.31 GFLOPS

int32-scalar = 2188.62 GIOPS
int32-vec4 = 2170.05 GIOPS

int16-scalar = 10013.59 GIOPS
int16-vec4 = 9885.89 GIOPS

Leaving results from my Vega on macOS too:

device = AMD Radeon RX Vega 64

fp32-scalar = 11544.38 GFLOPS
fp32-vec4 = 10986.99 GFLOPS

fp16-scalar = 10465.67 GFLOPS
fp16-vec4 = 21179.84 GFLOPS

fp64-scalar = 0.00 GFLOPS
fp64-vec4 = 0.00 GFLOPS

int32-scalar = 2207.93 GIOPS
int32-vec4 = 2199.83 GIOPS

int16-scalar = 10626.61 GIOPS
int16-vec4 = 18983.10 GIOPS
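To put numbers on the vec4-vs-scalar claims in the comments, here is a small Python sketch that computes the vec4/scalar ratio for each posted run. The values are copied from the results in this thread; the names `RESULTS` and `vec4_speedups` are illustrative only and are not part of vkpeak.

```python
# Peak numbers copied from the vkpeak output posted in this thread.
# fp* values are GFLOPS, int* values are GIOPS; each entry is (scalar, vec4).
RESULTS = {
    "NVIDIA TITAN V": {
        "fp32": (17230.91, 16898.01),
        "fp16": (16781.96, 32568.21),
        "fp64": (7664.02, 7677.14),
        "int32": (14464.71, 14755.26),
        "int16": (9727.97, 11768.93),
    },
    "Radeon RX Vega (Windows)": {
        "fp32": (11453.46, 11010.15),
        "fp16": (10388.36, 17744.94),
        "fp64": (686.59, 686.31),
        "int32": (2188.62, 2170.05),
        "int16": (10013.59, 9885.89),
    },
    "AMD Radeon RX Vega 64 (macOS)": {
        "fp32": (11544.38, 10986.99),
        "fp16": (10465.67, 21179.84),
        "fp64": (0.0, 0.0),  # doubles unsupported on Metal
        "int32": (2207.93, 2199.83),
        "int16": (10626.61, 18983.10),
    },
}


def vec4_speedups(device_results):
    """Return the vec4/scalar throughput ratio for each precision.

    A precision whose scalar result is 0.00 (fp64 on macOS) maps to None
    instead of raising ZeroDivisionError.
    """
    ratios = {}
    for precision, (scalar, vec4) in device_results.items():
        ratios[precision] = vec4 / scalar if scalar > 0 else None
    return ratios


if __name__ == "__main__":
    for device, device_results in RESULTS.items():
        print(device)
        for precision, ratio in vec4_speedups(device_results).items():
            shown = "n/a" if ratio is None else f"{ratio:.2f}x"
            print(f"  {precision:6s} vec4/scalar = {shown}")
```

Run as a script, it shows about 2.02x (fp16) and 1.79x (int16) for the Vega on macOS, versus about 1.71x and 0.99x for the same precisions on Windows; the Titan V's fp16 ratio comes out near 1.94x. These are single runs, so small differences may just be run-to-run noise.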

Comments:

Surprise: under Metal, fp16-vec4 is 2x faster than scalar, and faster than on Windows (21 TFLOPS vs 17.7 TFLOPS).
int16-vec4 is also nearly 2x faster than scalar on macOS (about 1.8x) and much faster than on Windows (19 TIOPS vs 10 TIOPS).
Doubles are not supported on Metal/macOS, hence fp64 = 0.00 GFLOPS.

So, briefly: on macOS, 16-bit int and float vec4 run roughly 2x faster than scalar (AMD Rapid Packed Math :-)).
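The macOS-vs-Windows comparison above can be checked directly from the posted numbers. This Python sketch divides the macOS results by the Windows results for the Vega, assuming (as the comments imply) that both runs used the same card; `WINDOWS`, `MACOS` and `platform_ratio` are hypothetical names for illustration. fp64 is left out because the macOS run reports 0.00 GFLOPS.

```python
# Vega numbers from the Windows and macOS runs posted in this thread.
# fp* values are GFLOPS, int* values are GIOPS. fp64 is omitted because
# the macOS run reports 0.00 (doubles are unsupported on Metal).
WINDOWS = {
    "fp32-scalar": 11453.46, "fp32-vec4": 11010.15,
    "fp16-scalar": 10388.36, "fp16-vec4": 17744.94,
    "int32-scalar": 2188.62, "int32-vec4": 2170.05,
    "int16-scalar": 10013.59, "int16-vec4": 9885.89,
}
MACOS = {
    "fp32-scalar": 11544.38, "fp32-vec4": 10986.99,
    "fp16-scalar": 10465.67, "fp16-vec4": 21179.84,
    "int32-scalar": 2207.93, "int32-vec4": 2199.83,
    "int16-scalar": 10626.61, "int16-vec4": 18983.10,
}


def platform_ratio(numerator, denominator):
    """Return numerator/denominator for every metric both runs report.

    Metrics with a zero denominator are skipped to avoid dividing by zero.
    """
    shared = numerator.keys() & denominator.keys()
    return {m: numerator[m] / denominator[m] for m in shared if denominator[m] > 0}


if __name__ == "__main__":
    ratios = platform_ratio(MACOS, WINDOWS)
    for metric in sorted(ratios):
        # Divide by 1000 to show tera-ops, matching the TFLOPS/TIOPS figures above.
        mac_t = MACOS[metric] / 1000
        win_t = WINDOWS[metric] / 1000
        print(f"{metric:13s} macOS {mac_t:6.2f}T  Windows {win_t:6.2f}T  ratio {ratios[metric]:.2f}x")
```

It reports roughly 1.19x for fp16-vec4 and 1.92x for int16-vec4, while the fp32 and int32 rows (scalar and vec4) stay within about 1.5%. If both runs really used the same card, that pattern points to a software-side difference (driver or shader compilation) rather than the hardware, though two single runs cannot settle that.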

Int8 & Int64 support? #1