I am not sure why OPs for softmax is "softmax_OPs = bsz * n_heads * seqlen * 1 * 5"

hahnyuan / LLM-Viewer

Analyze the inference of Large Language Models (LLMs). Analyze aspects like computation, storage, transmission, and hardware roofline model in a user-friendly interface.

MIT License


I am not sure why OPs for softmax is "softmax_OPs = bsz * n_heads * seqlen * 1 * 5" #11

Closed: erxiong0 closed this issue 6 days ago

erxiong0 commented 1 month ago

Why is the count multiplied by 5?

hahnyuan commented 1 month ago

This is because the softmax operation takes five steps, and each step is counted as one operation per element:

max_x=max(x)
x=x-max_x
x_exp=exp(x)
sum_x_exp=sum(x_exp)
y=x_exp/sum_x_exp
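
The five steps can be sketched in plain Python to show where the factor of 5 comes from. This is only an illustrative sketch, not code from LLM-Viewer: the helper names `softmax_five_steps` and `softmax_ops` are hypothetical, and reading the `1` in the formula as the query length (one new token per decoding step) is an assumption. Each step is treated as roughly one operation per element, which is an approximation for the `max` and `sum` reductions.

```python
import math


def softmax_five_steps(x):
    """Numerically stable softmax over a list of scores, one line per step."""
    max_x = max(x)                           # step 1: max_x = max(x)
    shifted = [v - max_x for v in x]         # step 2: x = x - max_x
    x_exp = [math.exp(v) for v in shifted]   # step 3: x_exp = exp(x)
    sum_x_exp = sum(x_exp)                   # step 4: sum_x_exp = sum(x_exp)
    return [e / sum_x_exp for e in x_exp]    # step 5: y = x_exp / sum_x_exp


def softmax_ops(bsz, n_heads, seqlen, q_len=1, steps_per_element=5):
    """Approximate op count: number of score elements times five steps.

    q_len=1 is an assumed reading of the literal `1` in the original formula.
    """
    return bsz * n_heads * seqlen * q_len * steps_per_element


row = [1.0, 2.0, 3.0]
probs = softmax_five_steps(row)

# Hypothetical sizes, chosen only to make the arithmetic concrete.
total_ops = softmax_ops(bsz=1, n_heads=32, seqlen=1024)
print(probs, total_ops)
```

With these example sizes there are 1 * 32 * 1024 * 1 = 32768 attention scores, so the estimate is 32768 * 5 = 163840 operations.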