Batch normalization with SN

@takerum I have a question about combining a batch normalization layer with spectral normalization (SN).

The following discussion ignores bias parameters, since they do not affect the Lipschitz constant. Generally speaking, batch normalization can be regarded as a linear transformation with a diagonal matrix W, where W_{i, i} = w_i = gamma_i / sqrt(sigma_i ^ 2 + epsilon) > 0. Here gamma_i, sigma_i ^ 2 and epsilon are the scaling parameter, the running average of the variance of input x_i, and a small constant, respectively. Hence, the spectral norm of this diagonal matrix W is its largest diagonal element, i.e. max(w_i).
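To make the claim about the spectral norm concrete, here is a minimal sketch in plain Python. The values of gamma and the running variance are made up for illustration, and `bn_scales` and `power_iteration_diag` are hypothetical helper names, not functions from this repository. It compares the closed form max(w_i) with a power-iteration estimate of the largest singular value of diag(w).

```python
import math

def bn_scales(gamma, running_var, eps=1e-5):
    """Per-channel scale w_i = gamma_i / sqrt(sigma_i^2 + eps)."""
    return [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]

def power_iteration_diag(w, n_iter=200):
    """Estimate the spectral norm of diag(w) by power iteration."""
    u = [1.0] * len(w)
    for _ in range(n_iter):
        # For a diagonal W, (W^T W u)_i is simply w_i^2 * u_i.
        u = [wi * wi * ui for wi, ui in zip(w, u)]
        norm = math.sqrt(sum(x * x for x in u))
        u = [x / norm for x in u]
    # The spectral norm estimate is ||W u|| for the unit vector u.
    return math.sqrt(sum((wi * ui) ** 2 for wi, ui in zip(w, u)))

# Hypothetical parameters, chosen only for illustration.
gamma = [1.0, 0.5, 2.0]
running_var = [4.0, 0.25, 1.0]

w = bn_scales(gamma, running_var)
sigma_closed_form = max(abs(x) for x in w)   # max(w_i), since every w_i > 0
sigma_estimated = power_iteration_diag(w)
print(sigma_closed_form, sigma_estimated)
```

Under these assumptions the two values should agree to within floating-point error, which is what the diagonal structure of W predicts.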

Naively applying SN to a batch normalization layer gives W' = W / max(w_i) as the spectrally normalized batch normalization matrix, but W' no longer seems to batch-normalize its inputs. What is the most reasonable way to apply SN to a batch normalization layer in your framework?
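A small numerical sketch of the concern, again in plain Python with made-up values and hypothetical helper names (`bn_scales`, `spectrally_normalize`). It assumes gamma_i = 1 for every channel and that channel i of the input has standard deviation sqrt(sigma_i ^ 2), so that plain batch normalization would give unit-variance outputs.

```python
import math

def bn_scales(gamma, running_var, eps=1e-5):
    """Per-channel scale w_i = gamma_i / sqrt(sigma_i^2 + eps)."""
    return [g / math.sqrt(v + eps) for g, v in zip(gamma, running_var)]

def spectrally_normalize(w):
    """Naive SN for a diagonal matrix: divide by the largest |w_i|."""
    sigma = max(abs(x) for x in w)
    return [x / sigma for x in w]

# Hypothetical parameters, chosen only for illustration.
gamma = [1.0, 1.0, 1.0]
running_var = [4.0, 0.25, 1.0]

w = bn_scales(gamma, running_var)
w_sn = spectrally_normalize(w)

# Output standard deviation per channel when the input channel i
# has standard deviation sqrt(running_var[i]).
out_std_bn = [wi * math.sqrt(v) for wi, v in zip(w, running_var)]
out_std_sn = [wi * math.sqrt(v) for wi, v in zip(w_sn, running_var)]
print(out_std_bn)
print(out_std_sn)
```

With these values, plain batch normalization yields output standard deviations close to 1, whereas W' yields roughly 0.5 for every channel: all outputs are shrunk by the common factor 1 / max(w_i), which depends on the smallest running variance, so they are no longer normalized to unit variance.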

pfnet-research / sngan_projection

Batch normalization with SN #38