Open pdradx opened 7 months ago
The fix to the Rnm normalization is completely wrong. The max value added to clamp is only needed because the abs() operation is in the wrong place. I put a more thorough explanation here: https://github.com/microsoft/torchscale/commit/fdd8838a756c7c435d7f8a1e4303e150dfac7442#commitcomment-134758047
The commit I commented on breaks the only place where it was right!
But if the commit is right, then the paper needs a fix in the corresponding section describing the normalization tricks.
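
To make the abs() placement point concrete, here is a minimal sketch in plain Python. It is not the torchscale code: the function names, the lower bound of 1, and the sample row are all assumptions chosen for illustration. It only shows that taking abs() after the row sum and taking it before the sum give different denominators, and that the sum of absolute values can grow much larger, which is presumably why an upper clamp bound becomes tempting.

```python
# Hypothetical illustration only; names and values are not from torchscale.

def abs_of_sum(row):
    """abs() applied after summing the row."""
    return abs(sum(row))

def sum_of_abs(row):
    """abs() applied to each element before summing."""
    return sum(abs(x) for x in row)

def normalize(row, denom_fn, lower=1.0):
    """Divide a row by its denominator, clamped from below (assumed bound)."""
    denom = max(denom_fn(row), lower)
    return [x / denom for x in row]

row = [3.0, -3.0, 2.0]
a = normalize(row, abs_of_sum)  # denominator is max(|2|, 1) = 2
b = normalize(row, sum_of_abs)  # denominator is max(8, 1) = 8
print(a, b)
```

With mixed-sign entries the two variants scale the same row differently, so they cannot both match a single formula in the paper.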