Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.
MIT License, 275 stars, 31 forks
I am not sure why the OPs for softmax are "softmax_OPs = bsz * n_heads * seqlen * 1 * 5" #11
Why multiply by 5?
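For context, here is a minimal sketch of how the quoted formula evaluates. It assumes that the `1` is the query length during decoding (one new token attending to `seqlen` cached positions) and that the `5` is an approximate per-element operation count. Neither reading is confirmed by the repository. The function name `softmax_ops`, the parameters `q_len` and `ops_per_element`, and the breakdown in `ASSUMED_BREAKDOWN` are all hypothetical and only illustrate one way a factor of 5 could arise.

```python
# Hypothetical breakdown of per-element softmax cost (an assumption,
# not taken from the repository): one op each for the running max,
# the max subtraction, the exponential, the sum accumulation, and
# the final division.
ASSUMED_BREAKDOWN = {
    "max_compare": 1,
    "subtract_max": 1,
    "exp": 1,
    "sum_accumulate": 1,
    "divide": 1,
}


def softmax_ops(bsz, n_heads, seqlen, q_len=1, ops_per_element=5):
    """Evaluate softmax_OPs = bsz * n_heads * seqlen * q_len * ops_per_element.

    q_len=1 mirrors the literal `1` in the quoted formula; reading it as
    the decode-stage query length is an assumption.
    """
    n_elements = bsz * n_heads * seqlen * q_len  # attention scores to normalize
    return n_elements * ops_per_element


if __name__ == "__main__":
    # Example values chosen for illustration only.
    print(softmax_ops(bsz=1, n_heads=32, seqlen=1024))
    print(sum(ASSUMED_BREAKDOWN.values()))
```

Under this reading, the factor of 5 is a constant estimate of the work per attention score rather than a dimension of the tensor, so changing it rescales the softmax cost without affecting the other terms.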