@takerum
I have a question about a batch normalization layer with spectral normalization (SN).
The following discussion ignores bias parameters, as they do not affect the Lipschitz constant.
Generally speaking, batch normalization can be regarded as a linear transformation with a diagonal matrix W, where W_{i, i} = w_i = gamma_i / sqrt(sigma_i^2 + epsilon) > 0, and gamma_i, sigma_i^2, and epsilon are the scaling parameter, the running average of the variance of input x_i, and a small constant, respectively.
Hence, the spectral norm of this diagonal matrix W is its maximum diagonal element, i.e. max(w_i).
Naively adapting SN to a batch normalization layer yields W' = W / max(w_i) as the spectrally normalized batch normalization matrix, but W' then seems to fail to actually batch-normalize the inputs.
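For concreteness, here is a minimal NumPy sketch of the observation above (the gamma/variance values are made up, and the mean/beta terms are ignored as stated):

```python
import numpy as np

gamma = np.array([2.0, 0.5, 1.5])         # example scaling parameters
running_var = np.array([4.0, 1.0, 0.25])  # example running variances
eps = 1e-5

# Diagonal entries of W: w_i = gamma_i / sqrt(sigma_i^2 + eps)
w = gamma / np.sqrt(running_var + eps)
spectral_norm = w.max()                   # largest singular value of diag(w)

# Naive spectral normalization of the BN matrix: W' = W / max(w_i)
w_sn = w / spectral_norm

# W' no longer rescales each feature to the intended scale gamma_i,
# so it does not batch-normalize the inputs in the usual sense.
x = np.random.randn(8, 3) * np.sqrt(running_var)  # toy batch (mean assumed 0)
x_hat = x / np.sqrt(running_var + eps)             # standardized inputs
y_bn = w * x_hat                                   # ordinary BN output, per-feature var ~ gamma_i^2
y_sn = w_sn * x_hat                                # naively normalized output, per-feature var <= 1
print(y_bn.var(axis=0), y_sn.var(axis=0))
```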
What is the most reasonable way to adapt SN to a batch normalization layer in your framework?