mli / transformers-benchmarks

real Transformer TeraFLOPS on various GPUs
Apache License 2.0

Actual value seems to be the same as theoretical value (instead of much smaller than it) #6

Open fzyzcjy opened 4 weeks ago

fzyzcjy commented 4 weeks ago

Hi, thank you for the great benchmark, as well as the videos explaining papers!

Currently the table and explanations say that the actual matrix multiplication throughput is much smaller than the theoretical one. But after I asked in https://forums.developer.nvidia.com/t/why-is-matrix-multiplication-quite-slow-and-all-hardware-seems-to-be-only-half-used/312013, Curefab explained that the two are in fact almost the same. So I am creating this small issue as a reference.
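
For anyone who wants to redo the comparison, here is a minimal sketch of the arithmetic, assuming the usual convention of counting 2·m·n·k floating-point operations for an (m×k) by (k×n) matrix product. The function names are my own, and the timing and peak numbers are hypothetical placeholders, not measurements and not taken from any datasheet; substitute your own measured time and the dense peak figure for the precision you are actually running.

```python
def matmul_flops(m: int, n: int, k: int) -> int:
    """FLOPs for an (m x k) @ (k x n) product: one multiply and one add per term."""
    return 2 * m * n * k


def achieved_tflops(m: int, n: int, k: int, seconds: float) -> float:
    """Achieved throughput in TFLOPS given the measured wall time of one matmul."""
    return matmul_flops(m, n, k) / seconds / 1e12


def utilization(achieved: float, peak: float) -> float:
    """Fraction of the theoretical peak that was actually reached."""
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    n = 8192             # square matrix size
    elapsed_s = 0.004    # placeholder: measured time for one n x n matmul
    peak_tflops = 300.0  # placeholder: theoretical dense peak for your GPU and dtype

    achieved = achieved_tflops(n, n, n, elapsed_s)
    print(f"achieved: {achieved:.1f} TFLOPS")
    print(f"utilization: {utilization(achieved, peak_tflops):.1%}")
```

The result depends heavily on which theoretical figure is used as `peak_tflops`, so it is worth double-checking that the peak matches the data type and mode being benchmarked.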