Hi, the spectral norm ||W||_2 is the maximal singular value of the matrix W, and it is absolutely homogeneous: ||c W||_2 = |c| ||W||_2. The reparameterization W = W / ||W||_2 strictly restricts ||W||_2 = 1. We instead use a learnable \alpha (clipped to [-1, 1]) and set W = \alpha W / ||W||_2, so the effective spectral norm is ||W||_2 = |\alpha| <= 1, which also allows values smaller than 1. This enlarges the range of W from the sphere ||W||_2 = 1 to the ball ||W||_2 <= 1, so the constraint is still achieved. Weight Normalization, in contrast, reparameterizes the Frobenius norm rather than the spectral norm, i.e. W = \alpha W / ||W||_F, and since \alpha is unconstrained there, it does not restrict the norm.
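For concreteness, here is a minimal sketch of what this reparameterization computes (illustrative only, not the repo's code; the actual implementation is in `modules/optimizations.py`, and it uses `torch.linalg.matrix_norm` here only as a stand-in for the spectral norm computation):

```python
import torch

def spectral_reparam(weight: torch.Tensor, alpha: torch.Tensor) -> torch.Tensor:
    """Return alpha * W / ||W||_2 with alpha clipped to [-1, 1].

    The spectral norm of the returned matrix is |alpha| <= 1, so it lies in
    the ball {W : ||W||_2 <= 1} rather than on the sphere ||W||_2 = 1.
    """
    alpha = alpha.clamp(-1.0, 1.0)                   # learnable scale, clipped
    sigma = torch.linalg.matrix_norm(weight, ord=2)  # largest singular value
    return alpha * weight / sigma

# quick check: the effective spectral norm equals |alpha|
W = torch.randn(128, 128)
alpha = torch.tensor(0.7)
W_hat = spectral_reparam(W, alpha)
print(torch.linalg.matrix_norm(W_hat, ord=2))  # ~0.7
```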
Thanks for the prompt reply, @pkuxmq! So given the constraint ||W||_2 <= \gamma * Vth and the way the spectral norm constraint is implemented, W = \alpha W / ||W||_2 with \alpha in [-1, 1], can we say the product \gamma * Vth is assumed to be (at least) 1 (since \gamma < 1 and the Vth used here is 2)?
Also, a quick question: in `models/snn_ide_fc.py`, `class SNNIDEFCNet(nn.Module)`:

```python
def __init__(self, cfg, **kwargs):
    super(SNNIDEFCNet, self).__init__()
    self.parse_cfg(cfg)
    self.network_x = SNNFC(self.dim_in, self.dim_hidden, bias=True, dropout=self.dropout, BN=True)
    self.network_s = SNNFC(self.dim_hidden, self.dim_hidden, bias=False, dropout=self.dropout, BN=False)
    if self.leaky == None:
        self.snn_func = SNNIFFunc(self.network_s, self.network_x, vth=self.vth)
    else:
        self.snn_func = SNNLIFFunc(self.network_s, self.network_x, vth=self.vth, leaky=self.leaky)
    self.snn_func_copy = copy.deepcopy(self.snn_func)
    self.network_s._wnorm(norm_range=1.)  # <-- the spectral norm restriction
```
Why is the weight spectral norm only called once, in the init function? From my understanding, the weight is supposed to be constrained every time after it is updated, which means every epoch. Thanks!
Hi, in the implementation we take Vth = 2 and \alpha in [-1, 1], so that ||W||_2 <= 1 = 0.5 * Vth and the assumption is satisfied. As for the code, the norm restriction follows the PyTorch implementations of spectral normalization and weight normalization: the reparameterization is registered only once, and the normalized weight matrix is re-computed during each forward pass. The detailed implementation is in `modules/optimizations.py`.
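To illustrate the register-once / recompute-each-forward pattern described above, here is a simplified, hypothetical sketch (it is not the code in `modules/optimizations.py`, and it computes the exact spectral norm rather than using power iteration as PyTorch's `spectral_norm` does):

```python
import torch
import torch.nn as nn

class SpectralReparam(nn.Module):
    """Wraps a Linear layer: the raw weight is stored once, and the
    normalized weight alpha * W / ||W||_2 is re-computed on every forward,
    so the constraint tracks the weight as the optimizer updates it."""

    def __init__(self, linear: nn.Linear, norm_range: float = 1.0):
        super().__init__()
        self.linear = linear
        # learnable scale, clipped to [-norm_range, norm_range] at use time
        self.alpha = nn.Parameter(torch.tensor(norm_range))
        self.norm_range = norm_range

    def forward(self, x):
        alpha = self.alpha.clamp(-self.norm_range, self.norm_range)
        sigma = torch.linalg.matrix_norm(self.linear.weight, ord=2)
        w = alpha * self.linear.weight / sigma
        return nn.functional.linear(x, w, self.linear.bias)

# set up once (analogous to calling _wnorm in __init__); the normalization
# itself happens inside every forward call, not only at registration time
layer = SpectralReparam(nn.Linear(64, 64))
y = layer(torch.randn(8, 64))
```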
Hi there @pkuxmq, thanks for your impressive work!
While reading Appendix F.1, I am a little confused about the restriction on ||W||.

The appendix proves that ||W|| <= \gamma * Vth guarantees that the average firing rate converges. From [1], I understand that the spectral normalization

W = W / ||W||    (1)

makes ||W|| = 1. However, after adding a learnable parameter \alpha, clipped to [-c, c], the reparameterization becomes

W = \alpha W / ||W||    (2)

My questions are:
1) Can we still achieve the constraint ||W|| <= \gamma * Vth by using (2)?
2) If yes, could you give some hints on how to prove that the constraint still holds under (2)?
3) From [2], weight normalization looks similar to the approach you adopt, but it makes ||W|| = \alpha; is it related to (2)?
[1] Miyato, Takeru, et al. "Spectral normalization for generative adversarial networks." arXiv preprint arXiv:1802.05957 (2018).
[2] Salimans, Tim, and Durk P. Kingma. "Weight normalization: A simple reparameterization to accelerate training of deep neural networks." Advances in Neural Information Processing Systems 29 (2016).
Please correct me if I have misunderstood anything here; thanks in advance for your response! @pkuxmq