tensorflow / tensorflow

An Open Source Machine Learning Framework for Everyone
https://tensorflow.org
Apache License 2.0

Feature Request: Gradient for SVD op #6503

Closed: kstant0725 closed this issue 7 years ago

kstant0725 commented 7 years ago

A gradient for the SVD op would be very useful so that it could be used in networks and cost functions. Currently, when trying to use SVD, I get the following:

LookupError: No gradient defined for operation 'Svd' (op type: Svd)

So my request is for a gradient for the SVD op.
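
For reference, a minimal snippet that reproduces this (a sketch; the shape is arbitrary):

import tensorflow as tf

A = tf.placeholder(tf.float32, [4, 4])
s, u, v = tf.svd(A)
# Raises: LookupError: No gradient defined for operation 'Svd' (op type: Svd)
grads = tf.gradients(tf.reduce_sum(s), A)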

yaroslavvb commented 7 years ago

The algorithm is given in Section 3.2 of "An extended collection of matrix derivative results for forward and reverse mode algorithmic differentiation" (Giles, 2008).
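
For the common case where only the singular values are used downstream, the reverse-mode formula in that reference reduces to grad_A = U diag(grad_s) V^T. A minimal numpy sketch (the function name is illustrative; a real input with distinct singular values is assumed):

import numpy as np

def svd_values_grad(a, grad_s):
    # Reverse-mode gradient when only the singular values s are used downstream:
    # grad_A = U * diag(grad_s) * V^T
    u, s, vt = np.linalg.svd(a, full_matrices=False)
    return u.dot(np.diag(grad_s)).dot(vt)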

aselle commented 7 years ago

We are currently working on this internally and I've heard it may be close. @rmlarsen knows more.

ddetone commented 7 years ago

@aselle @rmlarsen Any update on this?

rmlarsen commented 7 years ago

As mentioned, this is underway internally. I believe both the person working on it and I are just back from vacation. I assume this will be available within the coming month.

ddetone commented 7 years ago

Hi @rmlarsen, I am looking through the rc-v1.0 and I don't see the SvdGrad op registered here. Is there somewhere else I should look for it?

satyam-cyc commented 7 years ago

Yes, this would help with experimenting with spectral methods. @rmlarsen @aselle @yaroslavvb Is this still in active development?

shariharan99 commented 7 years ago

Is there any update on this? It would be super useful for our work!

shariharan99 commented 7 years ago

@rmlarsen what is the update on this?

cdiwork commented 7 years ago

There is already a partial implementation, but it is not yet complete. I was busy with other projects, but I will try to come back to this next week.

mlhengin commented 7 years ago

Hello, any news on this feature? It would be very helpful to me too.

kstant0725 commented 7 years ago

Hello, I was also wondering if there was any progress on this?

schmiflo commented 7 years ago

Would be great to have this functionality.

kofd commented 7 years ago

import tensorflow as tf

def svd(A, full_matrices=False, compute_uv=True, name=None):
  # since dA = dUSVt + UdSVt + USdVt
  # we can simply recompute each matrix using A = USVt
  # while blocking gradients to the original op.
  _, M, N = A.get_shape().as_list()
  P = min(M, N)
  S0, U0, V0 = map(tf.stop_gradient, tf.svd(A, full_matrices=True, name=name))
  Ui, Vti = map(tf.matrix_inverse, [U0, tf.transpose(V0, (0, 2, 1))])
  # A = USVt
  # S = UiAVti
  S = tf.matmul(Ui, tf.matmul(A, Vti))
  S = tf.matrix_diag_part(S)
  if not compute_uv:
    return S
  Si = tf.pad(tf.matrix_diag(1/S0), [[0,0], [0,N-P], [0,M-P]])
  # U = AVtiSi
  U = tf.matmul(A, tf.matmul(Vti, Si))
  U = U if full_matrices else U[:, :M, :P]
  # Vt = SiUiA
  V = tf.transpose(tf.matmul(Si, tf.matmul(Ui, A)), (0, 2, 1))
  V = V if full_matrices else V[:, :N, :P]
  return S, U, V
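
A hedged usage sketch of the wrapper above: because S0, U0 and V0 are wrapped in tf.stop_gradient, gradients flow through the matmul / matrix_inverse reconstruction rather than through the (gradient-less) Svd op, so tf.gradients no longer raises. The shapes are illustrative:

A = tf.placeholder(tf.float32, [None, 4, 4])
s, u, v = svd(A, full_matrices=False)   # the wrapper defined above
loss = tf.reduce_sum(s)                 # e.g. a nuclear-norm style penalty
grads = tf.gradients(loss, A)           # no LookupError here
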
albertpumarola commented 7 years ago

Hi, @aselle @rmlarsen Any news?

kcyu2014 commented 7 years ago

Hi, I have written a gradient function based on the Matrix Backpropagation paper. Hope it helps.


import tensorflow as tf

def matrix_symmetric(x):
    return (x + tf.transpose(x, [0,2,1])) / 2

def get_eigen_K(x, square=False):
    """
    Get K = 1 / (sigma_i - sigma_j) for i != j, 0 otherwise

    Parameters
    ----------
    x : tf.Tensor with shape [..., dim]

    Returns
    -------
    tf.Tensor with shape [..., dim, dim]
    """
    if square:
        x = tf.square(x)
    res = tf.expand_dims(x, 1) - tf.expand_dims(x, 2)
    res += tf.eye(tf.shape(res)[1])
    res = 1 / res
    res -= tf.eye(tf.shape(res)[1])

    # Keep the results clean
    res = tf.where(tf.is_nan(res), tf.zeros_like(res), res)
    res = tf.where(tf.is_inf(res), tf.zeros_like(res), res)
    return res

@tf.RegisterGradient('Svd')
def gradient_svd(op, grad_s, grad_u, grad_v):
    """
    Define the gradient for SVD
    References
        Ionescu, C., et al, Matrix Backpropagation for Deep Networks with Structured Layers

    Parameters
    ----------
    op : the Svd op; its outputs are (s, u, v)
    grad_s : gradient of the loss with respect to s
    grad_u : gradient of the loss with respect to u (not used by this implementation)
    grad_v : gradient of the loss with respect to v

    Returns
    -------
    Gradient of the loss with respect to the op's input matrix.
    """
    s, u, v = op.outputs
    v_t = tf.transpose(v, [0,2,1])

    with tf.name_scope('K'):
        K = get_eigen_K(s, True)
    inner = matrix_symmetric(K * tf.matmul(v_t, grad_v))

    # Create the shape accordingly.
    u_shape = u.get_shape()[1].value
    v_shape = v.get_shape()[1].value

    # Recover the complete S matrices and its gradient
    eye_mat = tf.eye(v_shape, u_shape)
    realS = tf.matmul(tf.reshape(tf.matrix_diag(s), [-1, v_shape]), eye_mat)
    realS = tf.transpose(tf.reshape(realS, [-1, v_shape, u_shape]), [0, 2, 1])

    real_grad_S = tf.matmul(tf.reshape(tf.matrix_diag(grad_s), [-1, v_shape]), eye_mat)
    real_grad_S = tf.transpose(tf.reshape(real_grad_S, [-1, v_shape, u_shape]), [0, 2, 1])

    dxdz = tf.matmul(u, tf.matmul(2 * tf.matmul(realS, inner) + real_grad_S, v_t))
    return dxdz
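
A hedged way to sanity-check a registered 'Svd' gradient like the one above against numerical gradients (TF 1.x; the loss below only touches s and v, since grad_u is ignored by this implementation):

import numpy as np

A = tf.placeholder(tf.float32, [1, 5, 5])
s, u, v = tf.svd(A)
loss = tf.reduce_sum(s) + tf.reduce_sum(v)
A0 = np.random.randn(1, 5, 5).astype(np.float32)
with tf.Session():
    err = tf.test.compute_gradient_error(A, [1, 5, 5], loss, [], x_init_value=A0)
    print(err)  # expect a small value (float32 tolerance) if the s and v gradients are correct
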
kmyi commented 7 years ago

@kcyu2014 Why don't you make a PR?

albertpumarola commented 7 years ago

@kcyu2014 Thx for the code, but it is missing the get_eigen_K and matrix_symmetric implementations. Could you post them?

kcyu2014 commented 7 years ago

@albertpumarola Sorry, I forgot it, and it's updated now :)

smilli commented 7 years ago

+1 would be very useful :) @rmlarsen

JasZhanAva commented 7 years ago

Yes, please add this feature; it would be super helpful for the matrix nuclear norm. @rmlarsen
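
For the nuclear-norm use case specifically, only the singular values are needed, so only the gradient for the s output of Svd is required. A sketch (the shape is illustrative):

A = tf.placeholder(tf.float32, [8, 8])
s = tf.svd(A, compute_uv=False)
nuclear_norm = tf.reduce_sum(s)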

JasZhanAva commented 7 years ago

Have you tested the code @kcyu2014 contributed? Does it work? @albertpumarola

smilli commented 7 years ago

I tried it and it didn't work for me :/ I can find the logs later if that's helpful for people.

psycharo-zz commented 7 years ago

The implementation by @kcyu2014 does not have gradients for U, only for S and V (those seem to agree with numerical gradients though).

LionSR commented 7 years ago

I need this feature badly. Could someone get it done fast?

hicham-eyeem commented 7 years ago

Hi, any update about this? I tried the code by @kcyu2014 but it didn't work properly unfortunately.

aselle commented 7 years ago

@rmlarsen. Any update?

rmlarsen commented 7 years ago

Sorry for the lack of progress on this. I will try to set aside a few days to get this in now. Especially now that we have GPU support for all the linear algebra ops (minus complex SVD), this is a gaping hole.

psycharo-zz commented 7 years ago

Here is an implementation that should work for square matrices.
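
For readers who cannot follow the link, here is a numpy sketch of the square-matrix formula (derived from the Giles reference above; a real input with distinct singular values is assumed, and this is not necessarily identical to the linked code):

import numpy as np

def svd_grad_square(a, grad_s, grad_u, grad_v):
    u, s, vt = np.linalg.svd(a)
    v = vt.T
    # f[i, j] = 1 / (s[j]^2 - s[i]^2) off the diagonal, 0 on the diagonal
    f = s[None, :] ** 2 - s[:, None] ** 2
    np.fill_diagonal(f, np.inf)
    f = 1.0 / f
    sigma = np.diag(s)
    term_u = (f * (u.T.dot(grad_u) - grad_u.T.dot(u))).dot(sigma)
    term_v = sigma.dot(f * (v.T.dot(grad_v) - grad_v.T.dot(v)))
    return u.dot(term_u + term_v + np.diag(grad_s)).dot(v.T)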

hicham-eyeem commented 7 years ago

@psycharo It seems to work, but sometimes the loss goes to NaN when using SVD in the loss (nuclear norm). That might be a problem with my architecture rather than the SVD backprop code, though.

yaroslavvb commented 7 years ago

@hicham-eyeem -- TensorFlow's SVD has some bugs that sometimes cause NaNs -- https://github.com/tensorflow/tensorflow/issues/9234. You could double-check whether this is the cause by comparing against the numpy version.
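
A quick cross-check along those lines (a sketch): large discrepancies or NaNs in the comparison below point at the Svd kernel itself rather than the backprop code.

import numpy as np
import tensorflow as tf

a = np.random.randn(32, 32).astype(np.float32)
with tf.Session() as sess:
    s_tf = sess.run(tf.svd(tf.constant(a), compute_uv=False))
s_np = np.linalg.svd(a, compute_uv=False)
print(np.abs(s_tf - s_np).max())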

hicham-eyeem commented 7 years ago

@yaroslavvb Ah ok, thank you for pointing that out. Even adding some regularisation doesn't help, actually. Do you know why it gives NaNs sometimes? I guess we can also avoid SVD entirely when it is only used in the loss function, by using a matrix factorization formulation instead, since that requires only matmul and transpose ops (plus some constraints that can be handled in a proximal form); see the sketch below.
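
A sketch of that factorization idea (the sizes and the rank k are illustrative): for X = L R^T, the quantity 0.5 * (||L||_F^2 + ||R||_F^2) upper-bounds the nuclear norm of X, with equality at the optimal factorization, so it can serve as a penalty without calling tf.svd at all.

m, n, k = 64, 64, 16
L = tf.get_variable('L_factor', [m, k])
R = tf.get_variable('R_factor', [n, k])
X = tf.matmul(L, R, transpose_b=True)   # low-rank reconstruction
nuclear_surrogate = 0.5 * (tf.reduce_sum(tf.square(L)) + tf.reduce_sum(tf.square(R)))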

rmlarsen commented 7 years ago

FYI: I have an initial version of this out for review internally.

hicham-eyeem commented 7 years ago

@rmlarsen great, thank you, can't wait to try it out :)

rmlarsen commented 7 years ago

The code was submitted and should appear on github within a day or so. There are certain restrictions for the gradient computation that I welcome contributions to lift:

"This initial version has the following restrictions: Only supports statically known inner matrix dimensions m and n.

Backpropagating through U and V (i.e. backpropagating through SVD nodes with compute_uv=True) has further restrictions: a) Only supports real tensors. b) Only supports square and "almost square" matrices where the number of rows and columns differ by at most 1. c) full_matrices must be true also. This does not currently have severe implications, given the restriction in b)."
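
A minimal example within those restrictions (a sketch): a real, square matrix with a statically known shape, full_matrices=True.

A = tf.placeholder(tf.float32, [8, 8])
s, u, v = tf.svd(A, full_matrices=True, compute_uv=True)
loss = tf.reduce_sum(s) + tf.reduce_sum(tf.square(u)) + tf.reduce_sum(tf.square(v))
dA = tf.gradients(loss, A)[0]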

rmlarsen commented 7 years ago

Let me close this and open a new issue for extending support for more general matrices.

rmlarsen commented 7 years ago

@caisq thanks for the quick push!

rmlarsen commented 7 years ago

Followup issue is https://github.com/tensorflow/tensorflow/issues/13641

JaeDukSeo commented 6 years ago

@rmlarsen Was the formula from https://people.maths.ox.ac.uk/gilesm/files/NA-08-01.pdf used, or was a different formula used?

JaeDukSeo commented 5 years ago

(Quoting @kcyu2014's gradient function from the comment above.)

This is very useful. Note that it assumes the U matrix is not used downstream after decomposing the original matrix A into U, s, V, since the derivative with respect to U is not computed anywhere.