The 26.5 TFLOPS per GCD figure is outdated. Back in 2022 we lowered the GPU's compute frequency to 1700 MHz, which brought the peak from 26.5 TFLOPS down to 23.9 TFLOPS. I've also updated the language to specify that the Matrix cores have a peak FLOP rate 2x higher than that of the vector units. This is documented in the CU diagram and in the roofline profiling section.
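As a quick sanity check on the frequency-to-TFLOPS arithmetic, here is a minimal Python sketch. It assumes an MI250X-style GCD with 110 CUs and 128 FP64 vector FLOPs per clock per CU (64 lanes x 2 FLOPs per FMA). Those hardware parameters and the helper name `peak_tflops` are illustrative assumptions, not taken from the text above. Theoretical peak is CUs x FLOPs per clock per CU x clock, and the matrix peak applies the 2x factor stated above.

```python
# Assumed per-GCD parameters for an MI250X-class GCD (illustrative, not from the comment).
CUS_PER_GCD = 110                          # assumed: 220 CUs per MI250X, split across 2 GCDs
FP64_VECTOR_FLOPS_PER_CLOCK_PER_CU = 128   # assumed: 64 lanes x 2 FLOPs per FMA
MATRIX_TO_VECTOR_RATIO = 2                 # from the comment: Matrix cores peak 2x the vector units


def peak_tflops(clock_mhz: float,
                cus: int = CUS_PER_GCD,
                flops_per_clock_per_cu: int = FP64_VECTOR_FLOPS_PER_CLOCK_PER_CU) -> float:
    """Hypothetical helper: theoretical peak = CUs x FLOPs/clock/CU x clock, in TFLOPS."""
    return cus * flops_per_clock_per_cu * clock_mhz * 1e6 / 1e12


# Vector peak at the lowered 1700 MHz clock, and the matrix peak at 2x that rate.
vector = peak_tflops(1700)
matrix = MATRIX_TO_VECTOR_RATIO * vector
print(f"vector: {vector:.1f} TFLOPS, matrix: {matrix:.1f} TFLOPS")
```

Under these assumptions it prints `vector: 23.9 TFLOPS, matrix: 47.9 TFLOPS`, which matches the updated per-GCD vector figure.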