Hi, I was looking through the code and noticed something strange.
This function is supposed to implement RMSNorm from Zhang, Biao, and Rico Sennrich. "Root mean square layer normalization." Advances in Neural Information Processing Systems 32 (2019).
But instead of dividing by the appropriate coefficient, it multiplies:
https://github.com/meta-llama/llama/blob/8fac8befd776bc03242fe7bc2236cdb41b6c609c/llama/model.py#L52-L63
If the mean of the squared entries is already 1 (that is, the sum of squares equals n, so RMS(x) = 1), multiplying and dividing give the same result. For any other value, though, multiplying makes larger vectors larger and smaller vectors smaller, moving them away from unit RMS, which is the opposite of the intended normalization.
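For reference, here is a minimal sketch of what the paper's definition would look like, dividing by RMS(x) = sqrt(mean(x_i^2)). The function name, the `eps` stabilizer, and the learned gain follow common convention and are illustrative, not taken from the repo:

```python
import torch

def rmsnorm_reference(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # RMS over the last dimension: sqrt(mean(x_i^2)), per Zhang & Sennrich (2019).
    # eps is a small constant for numerical stability (conventional, not from the paper).
    rms = torch.sqrt(x.pow(2).mean(-1, keepdim=True) + eps)
    # Divide by RMS so the output has approximately unit root mean square,
    # then apply the learned per-dimension gain.
    return (x / rms) * weight
```

With this version, a vector whose RMS is above 1 gets scaled down and one below 1 gets scaled up, which is the behavior I'd expect from the paper.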