The algorithm is in Section 3.2 of "An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation".
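For reference, my reading of the reverse-mode result there for a real square A = U Σ Vᵀ (bars denote adjoints, i.e. incoming gradients; the paper also covers the rectangular case, so treat this as a paraphrase rather than a quote):

```latex
\bar{A} = U \Big[ \big(F \circ (U^{T}\bar{U} - \bar{U}^{T}U)\big)\,\Sigma
        + \operatorname{diag}(\bar{s})
        + \Sigma\,\big(F \circ (V^{T}\bar{V} - \bar{V}^{T}V)\big) \Big] V^{T},
\qquad
F_{ij} =
\begin{cases}
  1/(s_{j}^{2} - s_{i}^{2}) & i \neq j,\\
  0 & i = j.
\end{cases}
```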
We are currently working on this internally and I've heard it may be close. @rmlarsen knows more.
@aselle @rmlarsen Any update on this?
As mentioned, this is underway internally. I believe both the person working on it and I are just back from vacation. I assume this will be available within the coming month.
Hi @rmlarsen, I am looking through the rc-v1.0 and I don't see the `SvdGrad` op registered here. Is there somewhere else I should look for it?
Yes, this would help with experimenting on spectral methods. @rmlarsen @aselle @yaroslavvb Is this still in active development?
Is there any update on this? It would be super useful for our work!
@rmlarsen What is the update on this?
There is already a partial implementation, but it is not yet complete. I was busy with other projects, but I will try to come back to this next week.
Hello, any news on this feature? It would be very helpful to me too.
Hello, I was also wondering if there was any progress on this?
Would be great to have this functionality.
```python
import tensorflow as tf

def svd(A, full_matrices=False, compute_uv=True, name=None):
    # Since dA = dU S Vt + U dS Vt + U S dVt, we can simply recompute
    # each factor from A = U S Vt while blocking gradients to the
    # original op.
    _, M, N = A.get_shape().as_list()
    P = min(M, N)
    S0, U0, V0 = map(tf.stop_gradient,
                     tf.svd(A, full_matrices=True, name=name))
    Ui, Vti = map(tf.matrix_inverse, [U0, tf.transpose(V0, (0, 2, 1))])
    # A = U S Vt  =>  S = Ui A Vti
    S = tf.matmul(Ui, tf.matmul(A, Vti))
    S = tf.matrix_diag_part(S)
    if not compute_uv:
        return S
    Si = tf.pad(tf.matrix_diag(1 / S0), [[0, 0], [0, N - P], [0, M - P]])
    # U = A Vti Si
    U = tf.matmul(A, tf.matmul(Vti, Si))
    U = U if full_matrices else U[:, :M, :P]
    # Vt = Si Ui A
    V = tf.transpose(tf.matmul(Si, tf.matmul(Ui, A)), (0, 2, 1))
    V = V if full_matrices else V[:, :N, :P]
    return S, U, V
```
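If it helps anyone, a minimal usage sketch of the workaround above (the placeholder shape and the loss are just for illustration):

```python
import numpy as np
import tensorflow as tf

# Example: differentiate a nuclear-norm loss through the recomputed SVD.
# `svd` is the workaround defined above; gradients flow through the
# matmuls against A, not through the original Svd op.
A = tf.placeholder(tf.float32, [1, 4, 4])
S = svd(A, compute_uv=False)
loss = tf.reduce_sum(S)            # nuclear norm of each matrix
grad_A, = tf.gradients(loss, A)

with tf.Session() as sess:
    a = np.random.randn(1, 4, 4).astype(np.float32)
    print(sess.run(grad_A, feed_dict={A: a}).shape)  # (1, 4, 4)
```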
Hi, @aselle @rmlarsen Any news?
Hi, I have composed a gradient function based on the Matrix Backpropagation paper. Hope it helps.
```python
import tensorflow as tf

def matrix_symmetric(x):
    """Symmetrize the last two dimensions of a batch of matrices."""
    return (x + tf.transpose(x, [0, 2, 1])) / 2

def get_eigen_K(x, square=False):
    """
    Get K = 1 / (sigma_i - sigma_j) for i != j, 0 otherwise.

    Parameters
    ----------
    x : tf.Tensor with shape [..., dim]

    Returns
    -------
    tf.Tensor with shape [..., dim, dim], zero on the diagonal
    """
    if square:
        x = tf.square(x)
    res = tf.expand_dims(x, 1) - tf.expand_dims(x, 2)
    res += tf.eye(tf.shape(res)[1])
    res = 1 / res
    res -= tf.eye(tf.shape(res)[1])
    # Keep the results clean: zero out NaN/Inf from repeated singular values.
    res = tf.where(tf.is_nan(res), tf.zeros_like(res), res)
    res = tf.where(tf.is_inf(res), tf.zeros_like(res), res)
    return res

@tf.RegisterGradient('Svd')
def gradient_svd(op, grad_s, grad_u, grad_v):
    """
    Define the gradient for SVD.

    References
    ----------
    Ionescu, C., et al., Matrix Backpropagation for Deep Networks
    with Structured Layers

    Parameters
    ----------
    op : the Svd op
    grad_s, grad_u, grad_v : gradients w.r.t. the op's outputs

    Returns
    -------
    tf.Tensor, the gradient w.r.t. the op's input
    """
    s, u, v = op.outputs
    v_t = tf.transpose(v, [0, 2, 1])

    with tf.name_scope('K'):
        K = get_eigen_K(s, True)

    inner = matrix_symmetric(K * tf.matmul(v_t, grad_v))

    # Create the shapes accordingly.
    u_shape = u.get_shape()[1].value
    v_shape = v.get_shape()[1].value

    # Recover the complete S matrix and its gradient.
    eye_mat = tf.eye(v_shape, u_shape)
    realS = tf.matmul(tf.reshape(tf.matrix_diag(s), [-1, v_shape]), eye_mat)
    realS = tf.transpose(tf.reshape(realS, [-1, v_shape, u_shape]), [0, 2, 1])

    real_grad_S = tf.matmul(tf.reshape(tf.matrix_diag(grad_s), [-1, v_shape]),
                            eye_mat)
    real_grad_S = tf.transpose(tf.reshape(real_grad_S, [-1, v_shape, u_shape]),
                               [0, 2, 1])

    dxdz = tf.matmul(u, tf.matmul(2 * tf.matmul(realS, inner) + real_grad_S, v_t))
    return dxdz
```
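A note on usage, in case it's not obvious: importing a module containing this snippet registers the gradient globally, so `tf.gradients` through `tf.svd` should then just work (only on builds where 'Svd' has no gradient yet; otherwise `RegisterGradient` raises). A quick smoke test along these lines:

```python
import numpy as np
import tensorflow as tf

# Smoke test for the gradient registered above (names are mine).
x = tf.constant(np.random.randn(1, 5, 5), dtype=tf.float32)
s, u, v = tf.svd(x)           # the 'Svd' op, now differentiable
loss = tf.reduce_sum(s)       # nuclear norm per batch element
dx, = tf.gradients(loss, x)

with tf.Session() as sess:
    print(sess.run(dx).shape)  # (1, 5, 5)
```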
@kcyu2014 Why don't you make a PR?
@kcyu2014 Thanks for the code, but it is missing the `get_eigen_K` and `matrix_symmetric` implementations. Could you post them?
@albertpumarola Sorry, I forgot it; it's updated now :)
+1 would be very useful :) @rmlarsen
Yes, please add this feature, super helpful for matrix nuclear norm. @rmlarsen
Have you tested the code @kcyu2014 contributed? Does it work? @albertpumarola
I tried it and it didn't work for me :/ I can find logs later if that would be helpful.
The implementation by @kcyu2014 does not have gradients for U, only for S and V (those seem to agree with numerical gradients though).
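In case it helps others reproduce that check, here is a rough sketch (TF 1.x graph mode, assuming @kcyu2014's gradient has been registered by importing the snippet above) comparing analytic and finite-difference gradients:

```python
import numpy as np
import tensorflow as tf

# Finite-difference check in float64. A large error for the U-based loss
# is consistent with the missing dL/dU term; the S- and V-based losses
# should be close to the numerical gradients.
shape = (1, 4, 4)
x_val = np.random.randn(*shape)
x = tf.placeholder(tf.float64, shape)
s, u, v = tf.svd(x)
with tf.Session():
    for name, y in [('S', s), ('U', u), ('V', v)]:
        loss = tf.reduce_sum(tf.square(y))
        err = tf.test.compute_gradient_error(x, shape, loss, (),
                                             x_init_value=x_val)
        print(name, 'max error:', err)
```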
I need this feature badly. Could someone get it done fast?
Hi, any update on this? I tried the code by @kcyu2014 but unfortunately it didn't work properly.
@rmlarsen Any update?
Sorry for the lack of progress on this. I will try to set aside a few days to get this in now. Especially now that we have GPU support for all the linear algebra ops (minus complex SVD), this is a gaping hole.
Here is an implementation that should work for square matrices.
@psycharo It seems to work, but sometimes the loss goes to NaN when using the SVD in the loss (nuclear norm). That might be a problem with my architecture rather than the SVD backprop code, though.
@hicham-eyeem -- TensorFlow SVD has some bugs that cause NaNs sometimes -- https://github.com/tensorflow/tensorflow/issues/9234 , you could double check if this is fixed using numpy version
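One way to run that comparison, assuming you can dump the offending batch to a numpy array (the function name is mine):

```python
import numpy as np

# If np.linalg.svd is NaN-free on the exact batch where tf.svd produced
# NaNs, you are likely hitting the linked TF bug rather than a problem
# in your own architecture.
def numpy_svd_is_clean(batch):
    u, s, vt = np.linalg.svd(batch)  # batched SVD, numpy >= 1.8
    return not any(np.isnan(m).any() for m in (u, s, vt))
```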
@yaroslavvb Ah ok, thank you for pointing that out. Actually, even adding some regularization doesn't help. Do you know why it would give NaNs sometimes? I guess we could also avoid the SVD altogether by using a matrix factorization formulation when it appears in the loss function, since that only requires matmul and transpose ops (plus some constraints that can be linearized in a proximal form); see the sketch below.
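To make the factorization idea concrete: one standard trick (my own sketch, not from this thread's code) is the variational form of the nuclear norm, ||A||_* = min over A = L Rᵀ of ½(||L||_F² + ||R||_F²), which needs only matmuls and Frobenius norms:

```python
import tensorflow as tf

# Sketch of an SVD-free nuclear-norm surrogate via the variational form
#   ||A||_* = min_{A = L @ R^T} 0.5 * (||L||_F^2 + ||R||_F^2).
# The rank bound k and the variable names L, R are my own choices.
m, n, k = 32, 32, 8
L = tf.get_variable('L', [m, k])
R = tf.get_variable('R', [n, k])
A_hat = tf.matmul(L, R, transpose_b=True)
surrogate = 0.5 * (tf.reduce_sum(tf.square(L)) + tf.reduce_sum(tf.square(R)))
# Use `surrogate` in the loss in place of the nuclear norm of A_hat,
# together with a data-fit term tying A_hat to the target matrix.
```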
FYI: I have an initial version of this out for review internally.
@rmlarsen great, thank you, can't wait to try it out :)
The code was submitted and should appear on github within a day or so. There are certain restrictions for the gradient computation that I welcome contributions to lift:
"This initial version has the following restrictions: Only supports statically known inner matrix dimensions m and n.
Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions: a) Only supports real tensors. b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1. c) full_matrices must be true also. This does not currently have severe implications, given the restriction in b)."
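For anyone who wants to try it, a minimal sketch that stays inside those restrictions as I read them (real, square, statically shaped, full_matrices=True):

```python
import tensorflow as tf

# Minimal sketch within the stated restrictions: real dtype, square
# matrices, statically known shape, full_matrices=True.
a = tf.random_normal([3, 7, 7], dtype=tf.float32)
s, u, v = tf.svd(a, full_matrices=True, compute_uv=True)
loss = tf.reduce_sum(s) + tf.reduce_sum(tf.square(u))
grad_a, = tf.gradients(loss, a)

with tf.Session() as sess:
    print(sess.run(grad_a).shape)  # (3, 7, 7)
```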
Let me close this and open a new issue for extending support for more general matrices.
@caisq thanks for the quick push!
Followup issue is https://github.com/tensorflow/tensorflow/issues/13641
@rmlarsen Was the formula from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf used, or was a different formula used?
(Quoting @kcyu2014's gradient function from earlier in the thread.)
This is very useful. Note that it assumes the U matrix is not used after decomposing the original matrix A into U S Vᵀ, since the derivative with respect to U is not calculated anywhere.
The gradient for the SVD op would be very useful so that it could be used in networks and cost functions. Currently, when trying to use SVD, I get the following:

```
LookupError: No gradient defined for operation 'Svd' (op type: Svd)
```

So my request is for the gradient for the SVD op.
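For completeness, a minimal reproduction on a build without the gradient:

```python
import tensorflow as tf

# Any gradient through tf.svd raises LookupError on builds where the
# 'Svd' op has no registered gradient.
a = tf.random_normal([5, 5])
s = tf.svd(a, compute_uv=False)
nuclear_norm = tf.reduce_sum(s)
grad = tf.gradients(nuclear_norm, a)
# LookupError: No gradient defined for operation 'Svd' (op type: Svd)
```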