microsoft / torchscale

Foundation Architecture for (M)LLMs
https://aka.ms/GeneralAI
MIT License
2.98k stars, 201 forks

Wrong Rnm Normalization. #86

Open pdradx opened 7 months ago

pdradx commented 7 months ago

The fix to the Rnm normalization is totally wrong. The max value added to the clamp is only needed because the abs() operation is in the wrong place. I put a more thorough explanation here: https://github.com/microsoft/torchscale/commit/fdd8838a756c7c435d7f8a1e4303e150dfac7442#commitcomment-134758047

The commit I commented on breaks the only place where it was right!
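
For readers following the abs() placement point above, here is a minimal plain-Python sketch of why the position of abs() relative to the row sum can matter. It is only one possible reading of the comment: the helper names (`clamp_min`, `normalize_abs_of_sum`, `normalize_sum_of_abs`) and the sample row are hypothetical, and this is neither torchscale's code nor necessarily the argument made in the linked commit comment.

```python
def clamp_min(x, lo=1.0):
    # Scalar stand-in for a clamp with only a lower bound (hypothetical helper).
    return max(x, lo)


def normalize_abs_of_sum(row):
    # abs() applied AFTER summing: opposite-sign entries cancel first,
    # so the denominator can stay small even when entries are large.
    denom = clamp_min(abs(sum(row)))
    return [v / denom for v in row]


def normalize_sum_of_abs(row):
    # abs() applied BEFORE summing: the denominator grows with the
    # magnitude of the entries regardless of their signs.
    denom = clamp_min(sum(abs(v) for v in row))
    return [v / denom for v in row]


# Illustrative row with large entries of opposite sign (made-up values).
row = [100.0, -99.5]
after_sum = normalize_abs_of_sum(row)
before_sum = normalize_sum_of_abs(row)
print(after_sum)
print(before_sum)
```

In this toy case the first variant leaves the large entries untouched because the signed sum nearly cancels, while the second rescales the row so its absolute values sum to 1. Whether this is the exact failure mode the issue describes is an assumption.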

pdradx commented 7 months ago

But if it is right, the paper needs a fix in the corresponding section describing the normalization tricks.