Closed awav closed 4 years ago
Okay, looks like it is my misunderstanding of the `@differentiable` attribute. The tuple returns a function for the gradient, and the first element is simply the forward step.
Feel free to close the issue if that's the case :)
> Okay, looks like it is my misunderstanding of the `@differentiable` attribute. The tuple returns a function for the gradient and the first element is simply the forward step.
This is accurate!
// Given a function type `(T0, ...) -> U`
// (where `T0`, ..., `U` conform to the `Differentiable` protocol †),
//
// The VJP function typing rules are:
//
// (T0, ...) -> (U, (U.TangentVector) -> (T0.TangentVector, ...))
// ^ ^ ^~~~~~~~~~~~~~~ ^~~~~~~~~~~~~~~~~~~~~
// original args result derivative wrt result derivative wrt args
//
// The derivative function returns a tuple of:
// - The original result of type `U`.
// - A "backpropagator" pullback function that takes the derivative with
//   respect to the result and returns derivatives with respect to the arguments.
//
// †: only "wrt" parameters need to conform to `Differentiable`.
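As a minimal sketch of this typing rule (the function `vjpSquare` below is a hypothetical example for illustration, not part of swift-apis):

```swift
// A hand-written VJP for `square: (Double) -> Double`, following the
// typing rule above: (Double) -> (Double, (Double) -> Double).
func vjpSquare(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let value = x * x
    // The pullback captures `x` and maps a cotangent of the result to a
    // cotangent of the argument: v ↦ 2 * x * v.
    return (value, { v in 2 * x * v })
}

let (y, pb) = vjpSquare(3.0)
// y == 9.0, pb(1.0) == 6.0
```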
Derivative functions are defined as VJP functions taking original arguments rather than just the "returned pullback" functions. This allows the pullback function to capture and use intermediate values computed in the original function.
That's exactly what happens for `cholesky`: `_vjpCholesky` computes `let decomposition = cholesky(x)`, and the pullback closure captures and uses `decomposition` rather than recomputing it.
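The same capture pattern can be sketched on a scalar function (the actual `_vjpCholesky` lives in swift-apis; `vjpSqrt` here is a hypothetical analogue for illustration). Since the gradient of `sqrt(x)` is `1 / (2 * sqrt(x))`, reusing the forward value avoids recomputing the square root in the backward pass:

```swift
// Hypothetical VJP following the same pattern as `_vjpCholesky`:
// compute the forward result once and let the pullback capture it.
func vjpSqrt(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let value = x.squareRoot()                // forward step, computed once
    return (value, { v in v / (2 * value) })  // pullback captures `value`
}

let (v, pb) = vjpSqrt(4.0)
// v == 2.0, pb(1.0) == 0.25
```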
Please read the "JVP and VJP functions" section from bit.ly/swift-autodiff-internals for more details!
gosh, I love that...
If you're interested in more cool differentiation stuff, check out our custom differentiation tutorial:
Other cool features are on the roadmap, e.g. registering derivatives for protocol requirements like `AdditiveArithmetic.+` or `ElementaryFunctions.exp`, which would then apply to all conforming types of those protocols!
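For a concrete flavor of custom derivative registration, here is a sketch using the `@derivative(of:)` attribute (this requires a toolchain with `import _Differentiation` support; `cube` and `vjpCube` are hypothetical names for illustration):

```swift
import _Differentiation

// Registering a custom derivative for an existing function.
func cube(_ x: Double) -> Double { x * x * x }

@derivative(of: cube)
func vjpCube(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    // d/dx x³ = 3x²; the pullback captures `x` from the forward pass.
    (cube(x), { v in 3 * x * x * v })
}

// gradient(of: cube)(2.0) == 12.0
```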
Hello everyone,
I have a concern about this line: https://github.com/tensorflow/swift-apis/blob/master/Sources/TensorFlow/Operators/Math.swift#L2712. In TensorFlow for Python, gradients have access to the input and output of the operation, so there is no need to recompute A = LLᵀ. The gradient signature normally looks like `gradient(op, grad)`, where `op` has attributes like `input` and `output`. But here, the vector-Jacobian product has to recompute the decomposition one more time. Can this be avoided somehow?
PS: Actually, there are numerous examples where the forward-step value is needed for the gradient, and reusing it rather than recomputing it matters for computationally efficient gradients in general.
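A classic instance of this point is `exp`, whose derivative equals its own forward value; a pullback that captures the forward result does no extra work (`vjpExp` below is a hypothetical sketch, not library code):

```swift
import Foundation

// exp is the canonical case: d/dx exp(x) = exp(x), so the pullback
// simply reuses the captured forward value instead of recomputing it.
func vjpExp(_ x: Double) -> (value: Double, pullback: (Double) -> Double) {
    let value = exp(x)                 // forward step, computed once
    return (value, { v in v * value }) // pullback captures `value`
}

let (v, pb) = vjpExp(0.0)
// v == 1.0, pb(1.0) == 1.0
```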