It is not clear why you use batch_dot for computing the covariance matrix, but it seems like you forgot to divide by the number of samples in the batch.
You can check my implementation: https://github.com/AliaksandrSiarohin/gan/blob/4a3253c1f077ce97a806d59f86f2c7b961fe5a56/conditional_layers.py#L606. It is a bit messy, but it seems to work.
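For reference, a minimal sketch of a normalized covariance computation in plain TensorFlow (the reshape convention and the name `x` are assumptions here, not code from either implementation):

```python
import tensorflow as tf

def covariance(x):
    # x: activations reshaped to (channels, batch_size * height * width),
    # so each column is one sample
    m = tf.cast(tf.shape(x)[1], x.dtype)            # number of samples
    mean = tf.reduce_mean(x, axis=1, keepdims=True)
    centered = x - mean
    # dividing by m is the step that was missing
    return tf.matmul(centered, centered, transpose_b=True) / m
```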
Slow SVD decomposition is a well-known problem in TensorFlow; see for example https://github.com/tensorflow/tensorflow/issues/13222. You can try to run it on the CPU:
```python
with tf.device('cpu'):
    Lambda, D = tf.self_adjoint_eig(covar)
```
But it will still be slow.
Thank you for your help. I misunderstood batch_dot() in Keras. I was wondering whether you managed to replicate the performance of the paper with your implementation? Even with scaling and group normalization I have not been able to achieve better performance than standard batch norm.
The improvement is very marginal. You will not be able to see it unless you average over 5 runs. For CIFAR-10 classification I only tried whitening based on the Cholesky decomposition. It gives the same marginal improvement, e.g. 7.3 (whitening) vs 7.0 (batch norm) for res32.
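As an illustration of the Cholesky route, a minimal sketch (the function name and the epsilon value are assumptions; TF 1.x API):

```python
import tensorflow as tf

def cholesky_whiten(x, eps=1e-5):
    # x: centered activations of shape (channels, n_samples)
    c = tf.shape(x)[0]
    n = tf.cast(tf.shape(x)[1], x.dtype)
    covar = tf.matmul(x, x, transpose_b=True) / n + eps * tf.eye(c)
    # Sigma = L L^T; applying L^{-1} yields identity covariance
    L = tf.cholesky(covar)
    return tf.matrix_triangular_solve(L, x, lower=True)
```

Unlike ZCA whitening, this avoids the eigendecomposition entirely, which also sidesteps the speed problem discussed above.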
I would be cautious about using self_adjoint_eig, since there is an error when using tf eig: https://github.com/tensorflow/tensorflow/issues/16115
And it makes sense: if the cost function is not set, there is no derivative with respect to the eigenvalues and eigenvectors.
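A related, well-known failure mode: the gradient of self_adjoint_eig contains 1/(λi − λj) terms, so it blows up to NaN when eigenvalues are (nearly) repeated. A minimal sketch of how one might reproduce this (hypothetical snippet, TF 1.x graph mode):

```python
import tensorflow as tf

x = tf.eye(3)                    # all eigenvalues equal: a degenerate case
e, v = tf.self_adjoint_eig(x)
loss = tf.reduce_sum(v)          # a cost that depends on the eigenvectors
grads = tf.gradients(loss, [x])  # gradient can come out as NaN/inf here
```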
Hi, I reproduced this layer in Keras, but I am getting the wrong result. I thought that my implementation looked fine... does anyone see any obvious issues? Additionally, the eigenvalue decomposition is very slow. Do you have advice on speeding it up? The input to the layer is a tensor of dimension (batch_size, height, width, channels). Here are the equations:
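(For reference, the standard ZCA whitening steps this kind of layer implements, a reconstruction with Λ and D named as in the eig snippet above:)

$$\mu = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \Sigma = \frac{1}{m}(X - \mu)(X - \mu)^\top + \epsilon I$$

$$\Sigma = D \Lambda D^\top, \qquad \hat{X} = D \Lambda^{-1/2} D^\top (X - \mu)$$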