nihui / vkpeak

A tool which profiles Vulkan devices to find their peak capacities
MIT License

Int8 & Int64 support? #1

Open oscarbg opened 3 years ago

oscarbg commented 3 years ago

Hi, nice benchmark! Below are my Titan V and RX Vega results on Windows. AFAIK the Vulkan spec also supports int8 (via VK_KHR_shader_float16_int8, the shaderInt8 feature) and int64 (the shaderInt64 feature). Any plans to support benchmarking int8/int64 throughput? Thanks.

Results:

device = NVIDIA TITAN V

fp32-scalar = 17230.91 GFLOPS fp32-vec4 = 16898.01 GFLOPS

fp16-scalar = 16781.96 GFLOPS fp16-vec4 = 32568.21 GFLOPS

fp64-scalar = 7664.02 GFLOPS fp64-vec4 = 7677.14 GFLOPS

int32-scalar = 14464.71 GIOPS int32-vec4 = 14755.26 GIOPS

int16-scalar = 9727.97 GIOPS int16-vec4 = 11768.93 GIOPS

device = Radeon RX Vega

fp32-scalar = 11453.46 GFLOPS fp32-vec4 = 11010.15 GFLOPS

fp16-scalar = 10388.36 GFLOPS fp16-vec4 = 17744.94 GFLOPS

fp64-scalar = 686.59 GFLOPS fp64-vec4 = 686.31 GFLOPS

int32-scalar = 2188.62 GIOPS int32-vec4 = 2170.05 GIOPS

int16-scalar = 10013.59 GIOPS int16-vec4 = 9885.89 GIOPS

oscarbg commented 3 years ago

Leaving the results from my Vega on macOS:

device = AMD Radeon RX Vega 64

fp32-scalar = 11544.38 GFLOPS fp32-vec4 = 10986.99 GFLOPS

fp16-scalar = 10465.67 GFLOPS fp16-vec4 = 21179.84 GFLOPS

fp64-scalar = 0.00 GFLOPS fp64-vec4 = 0.00 GFLOPS

int32-scalar = 2207.93 GIOPS int32-vec4 = 2199.83 GIOPS

int16-scalar = 10626.61 GIOPS int16-vec4 = 18983.10 GIOPS

comments:

So, briefly: on macOS, 16-bit int and float vec4 run close to 2x faster than scalar (AMD Rapid Packed Math :-)).
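To put numbers on that observation, here is a minimal Python sketch that computes the vec4/scalar ratio from the macOS figures posted above. The dictionary layout and the `vec4_speedup` helper are hypothetical names for illustration and are not part of vkpeak. fp64 is left out because it was reported as 0.00 on macOS, which would mean dividing by zero.

```python
# Figures copied from the macOS (AMD Radeon RX Vega 64) results above.
# Each entry is (scalar, vec4) in GFLOPS or GIOPS.
# fp64 is omitted: it was reported as 0.00 on macOS.
macos_vega64 = {
    "fp32": (11544.38, 10986.99),
    "fp16": (10465.67, 21179.84),
    "int32": (2207.93, 2199.83),
    "int16": (10626.61, 18983.10),
}


def vec4_speedup(results):
    """Return {type: vec4 / scalar} for each (scalar, vec4) pair."""
    return {name: vec4 / scalar for name, (scalar, vec4) in results.items()}


speedups = vec4_speedup(macos_vega64)
for name, ratio in speedups.items():
    print(f"{name}: vec4 is {ratio:.2f}x scalar")
```

With these inputs the 16-bit types come out at about 2.02x (fp16) and 1.79x (int16), while the 32-bit types stay near 1x.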