xiaoxin83121 closed this issue 1 year ago.
Hi, I have seen LLaMA use RMSNorm in a pre-norm manner. However, I read your paper a long time ago, and I realize that the forward propagation is nearly the same. Am I misunderstanding something?
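For comparison, here is a minimal sketch of the forward passes involved. It shows RMSNorm, which rescales by the root mean square with a learned gain and does no re-centering. It shows LayerNorm for contrast, and it shows a pre-norm residual block of the form `x + sublayer(norm(x))`. This is only an illustration under stated assumptions. It uses plain Python lists rather than tensors, places `eps` inside the square root (a common implementation choice, assumed here), and uses hypothetical helper names (`rms_norm`, `layer_norm`, `pre_norm_block`). It is not LLaMA's or the paper's actual code.

```python
import math
from typing import Callable, List

Vector = List[float]


def rms_norm(x: Vector, gain: Vector, eps: float = 1e-6) -> Vector:
    """RMSNorm sketch: divide by the root mean square, then apply a learned gain.
    No mean subtraction and no bias (eps placement is an assumption)."""
    rms = math.sqrt(sum(v * v for v in x) / len(x) + eps)
    return [g * v / rms for g, v in zip(gain, x)]


def layer_norm(x: Vector, gain: Vector, bias: Vector, eps: float = 1e-6) -> Vector:
    """LayerNorm for contrast: subtract the mean, divide by the standard deviation,
    then apply gain and bias."""
    mu = sum(x) / len(x)
    var = sum((v - mu) ** 2 for v in x) / len(x)
    std = math.sqrt(var + eps)
    return [g * (v - mu) / std + b for g, v, b in zip(gain, x, bias)]


def pre_norm_block(
    x: Vector,
    sublayer: Callable[[Vector], Vector],
    norm: Callable[[Vector], Vector],
) -> Vector:
    """Pre-norm residual: normalize the sublayer input, keep the residual path raw.
    (A post-norm block would instead compute norm(x + sublayer(x)).)"""
    return [a + b for a, b in zip(x, sublayer(norm(x)))]


if __name__ == "__main__":
    ones, zeros = [1.0] * 4, [0.0] * 4
    x = [1.0, -2.0, 3.0, -2.0]  # zero-mean input: both norms give the same result
    print(rms_norm(x, ones))
    print(layer_norm(x, ones, zeros))
    x2 = [1.0, 2.0, 3.0, 4.0]  # nonzero mean: the outputs differ
    print(rms_norm(x2, ones))
    print(layer_norm(x2, ones, zeros))
```

When the input has zero mean and the LayerNorm bias is zero, the two normalizations produce identical outputs. This is one concrete sense in which their forward passes are close: RMSNorm simply skips the re-centering step. The pre-norm versus post-norm choice is a separate decision about *where* the norm sits in the residual block, and it applies to either normalization.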