I noticed that you use Scale(2.74) many times in your code. However, I couldn't find any description, or similar code by others, that explains it. Is it a specific feature, or did I miss the relevant description? Could you explain it? I'd appreciate it.
This is the scale for scaled weight standardization, which NF-ResNet uses in place of batch normalization. Please refer to Sec. 4.4 and Appendix C.1 in the paper for details.
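For readers unfamiliar with the idea, here is a minimal sketch in plain Python of what scaled weight standardization does, following the NF-ResNet formulation: each output unit's weights are shifted to zero mean and rescaled to variance 1/fan_in, and a fixed constant then rescales the signal. The function names `standardize_weights` and `scale` are hypothetical and are not taken from this repository. The sketch also assumes that Scale simply multiplies its input by a constant, and it takes 2.74 from the question above without deriving it (the paper sections cited above cover that).

```python
import math


def standardize_weights(row, eps=1e-8):
    """Standardize one output unit's weights (hypothetical helper).

    The result has zero mean and variance 1/fan_in, where fan_in is the
    number of incoming weights for this output unit.
    """
    fan_in = len(row)
    mean = sum(row) / fan_in
    var = sum((w - mean) ** 2 for w in row) / fan_in
    # Dividing by sqrt(var * fan_in) gives variance 1/fan_in.
    denom = math.sqrt(var * fan_in) + eps
    return [(w - mean) / denom for w in row]


def scale(values, gamma=2.74):
    """Multiply by a fixed constant (assumed behaviour of Scale)."""
    return [gamma * v for v in values]


row = [0.5, -1.0, 2.0, 0.1]          # made-up weights for one output unit
w_hat = standardize_weights(row)     # zero mean, variance 1/fan_in
scaled = scale(w_hat)                # rescaled by the fixed constant
```

Because the constant is fixed rather than learned or computed from batch statistics, it can stand in for the rescaling that batch normalization would otherwise provide.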