BA8F0D39 opened this issue 1 year ago
NVIDIA Ampere GPUs (except the A100) and Ada GPUs have no native half-precision support: their half-precision performance is the same as their single-precision performance.
@moyang The RTX 3090 does have native FP16 support in its Tensor Cores; see the NVIDIA Ampere GA102 GPU architecture whitepaper: https://www.nvidia.com/content/PDF/nvidia-ampere-ga-102-gpu-architecture-whitepaper-v2.pdf
According to the whitepaper, each SM performs 512 FP16 FMA operations (128 per Tensor Core), and the RTX 3090 has 82 SMs and 328 Tensor Cores.
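The whitepaper figures quoted above are internally consistent: 82 SMs × 512 FMA gives the same total as 328 Tensor Cores × 128 FMA. The minimal Python sketch below checks that arithmetic and turns it into a rough peak-throughput estimate. It rests on two assumptions that do not come from this thread. First, the FMA counts are treated as per clock cycle. Second, each FMA counts as two floating-point operations. The clock frequency is a parameter you supply.

```python
# Figures quoted from the GA102 whitepaper in this thread.
SMS = 82
TENSOR_CORES = 328
FP16_FMA_PER_SM = 512
FP16_FMA_PER_TENSOR_CORE = 128


def fma_per_clock_from_sms() -> int:
    """Total FP16 FMA per clock, counted per SM (assumes the figures are per clock)."""
    return SMS * FP16_FMA_PER_SM


def fma_per_clock_from_tensor_cores() -> int:
    """Same total, counted per Tensor Core; should match the per-SM count."""
    return TENSOR_CORES * FP16_FMA_PER_TENSOR_CORE


def peak_tflops(clock_ghz: float) -> float:
    """Rough peak FP16 Tensor Core throughput in TFLOPS at the given clock.

    Assumption: one FMA = 2 floating-point operations (multiply + add).
    """
    flops_per_clock = fma_per_clock_from_sms() * 2
    # flops_per_clock * clock_ghz * 1e9 FLOP/s, divided by 1e12 to get TFLOPS
    return flops_per_clock * clock_ghz / 1e3


if __name__ == "__main__":
    print(fma_per_clock_from_sms(), fma_per_clock_from_tensor_cores())  # 41984 41984
    # 1.695 GHz is an assumed boost clock, used here only for illustration.
    print(round(peak_tflops(1.695), 1))  # 142.3
```

Under these assumptions, the estimate scales linearly with the clock you pass in. Sustained clocks under load may differ from the value used here.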
@BA8F0D39 This seems to be a problem with NVIDIA's OpenCL implementation. When applications such as clpeak query the device capabilities, the driver reports "no half-precision support". I observed the same issue with other benchmarks, such as SiSoftware Sandra.
clpeak version: 1.1.2
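In OpenCL, half-precision support is normally advertised through the `cl_khr_fp16` extension in the device's extension string. The sketch below checks that string. It assumes, rather than confirms, that tools like clpeak decide "half-precision support" this way. `supports_fp16` is a hypothetical helper. The live query uses the third-party `pyopencl` package if it is installed and skips the query if it is not.

```python
"""Check whether an OpenCL device advertises half-precision (FP16) support.

Assumption: tools such as clpeak treat a device as FP16-capable when its
extension string (CL_DEVICE_EXTENSIONS) contains cl_khr_fp16.
"""


def supports_fp16(extensions: str) -> bool:
    """Return True if the space-separated extension string lists cl_khr_fp16."""
    # Compare whole tokens so a longer, similarly named extension does not match.
    return "cl_khr_fp16" in extensions.split()


def main() -> None:
    try:
        import pyopencl as cl  # optional third-party dependency
    except ImportError:
        print("pyopencl is not installed; skipping the live device query")
        return
    try:
        platforms = cl.get_platforms()
    except cl.Error as exc:  # e.g. no OpenCL platform/ICD available
        print(f"could not enumerate OpenCL platforms: {exc}")
        return
    for platform in platforms:
        for device in platform.get_devices():
            status = "yes" if supports_fp16(device.extensions) else "no"
            print(f"{platform.name} / {device.name}: cl_khr_fp16 = {status}")


if __name__ == "__main__":
    main()
```

Suppose a device's driver omits `cl_khr_fp16`, as the report above suggests for NVIDIA's driver. Such a check would print "no" even if the hardware has FP16 Tensor Cores. The reason is that the check reflects what the driver exposes through OpenCL, not the silicon.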