moyang opened 1 year ago
Platform: NVIDIA CUDA
  Device: NVIDIA GeForce RTX 4090
    Driver version  : 531.61 (Win64)
    Compute units   : 128
    Clock frequency : 2520 MHz

    Global memory bandwidth (GBPS)
      float   : 866.65
      float2  : 888.99
      float4  : 909.81
      float8  : 920.69
      float16 : 921.32

    Single-precision compute (GFLOPS)
      float   : 71356.09
      float2  : 75607.30
      float4  : 76967.14
      float8  : 71584.66
      float16 : 70986.91

    No half precision support! Skipped

    Double-precision compute (GFLOPS)
      double   : 1289.16
      double2  : 1310.90
      double4  : 1364.63
      double8  : 1311.27
      double16 : 1356.09

    Integer compute (GIOPS)
      int   : 40810.12
      int2  : 35957.76
      int4  : 35848.03
      int8  : 35623.48
      int16 : 35670.32

    Integer compute Fast 24bit (GIOPS)
      int   : 36497.91
      int2  : 35032.96
      int4  : 35321.97
      int8  : 35034.14
      int16 : 35219.38

    Transfer bandwidth (GBPS)
      enqueueWriteBuffer              : 20.93
      enqueueReadBuffer               : 20.06
      enqueueWriteBuffer non-blocking : 20.93
      enqueueReadBuffer non-blocking  : 20.06
      enqueueMapBuffer(for read)      : 10.78
        memcpy from mapped ptr        : 28.55
      enqueueUnmap(after write)       : 26.87
        memcpy to mapped ptr          : 28.09

    Kernel launch latency : 8.36 us
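For context, the best FP32 and memory-bandwidth results can be compared with rough theoretical peaks. This is only a back-of-the-envelope sketch under assumptions that do not come from the clpeak output: 128 FP32 lanes per SM (Ada Lovelace), a fused multiply-add counted as 2 FLOPs, and a 384-bit GDDR6X bus at 21 Gbps per pin. It also takes the reported 2520 MHz at face value, although the actual clock during the run may have differed. The helper names are hypothetical.

```python
# Hypothetical back-of-the-envelope check of the clpeak numbers above.
# Assumptions (NOT reported by clpeak): 128 FP32 lanes per compute unit (SM),
# FMA counted as 2 FLOPs, 384-bit memory bus at 21 Gbps per pin.


def peak_fp32_gflops(compute_units: int, clock_mhz: float, lanes_per_cu: int = 128) -> float:
    """Theoretical FP32 peak in GFLOPS: units * lanes * 2 FLOPs (FMA) * clock."""
    return compute_units * lanes_per_cu * 2 * clock_mhz / 1000.0


def peak_bandwidth_gbps(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical DRAM bandwidth in GB/s: bus width in bytes * per-pin data rate."""
    return bus_width_bits / 8 * data_rate_gbps


fp32_peak = peak_fp32_gflops(128, 2520)   # compute units and clock from the output
fp32_best = 76967.14                      # best single-precision result (float4)
bw_peak = peak_bandwidth_gbps(384, 21.0)  # assumed bus width and data rate
bw_best = 921.32                          # best global memory result (float16)

print(f"FP32: {fp32_best:.0f} / {fp32_peak:.0f} GFLOPS = {fp32_best / fp32_peak:.1%}")
print(f"DRAM: {bw_best:.0f} / {bw_peak:.0f} GB/s = {bw_best / bw_peak:.1%}")
```

Under these assumptions the measured float4 throughput comes out at roughly 93% of the FP32 peak, and the float16 bandwidth result at roughly 91% of the DRAM peak.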