Closed XuehaiPan closed 1 year ago
I think the answer is: it really depends on the function and the sizes of the inputs. There's no simple answer so benchmarking and seeing which one is faster for your use case is our recommendation.
In terms of operator coverage, reverse-over-reverse is guaranteed to cover more PyTorch operations, because PyTorch's reverse-mode AD has been around a lot longer than PyTorch's forward-mode AD.
cc @soulitzer @albanD if you disagree or have more to add.
This makes sense to me. Thanks.
I'm using `functorch` to compute the Hessian-vector product (`hvp`) for my model. I have noticed that the Hessian matrix is symmetric, so `hvp` and `vhp` should be the transpose of each other. I'm wondering what's the best way to compute the `hvp` in practice (speed sensitive). The documentation (functorch Tutorials: Computing Hessian-vector products) says the reverse + forward approach is more memory efficient. How about the time performance? Which one is recommended? Many thanks!
A small snippet copied from Jacobians, Hessians, hvp, vhp, and more: composing functorch transforms:
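The snippet itself didn't survive the copy. A minimal sketch of the two compositions that tutorial compares (forward-over-reverse via `jvp`, reverse-over-reverse via `vjp`), assuming a recent PyTorch where the transforms live in `torch.func` (older installs ship them in the `functorch` package); the test function and sizes here are placeholders, not the tutorial's originals:

```python
import torch

try:
    # the functorch transforms moved into core PyTorch as torch.func (>= 2.0)
    from torch.func import grad, jvp, vjp
except ImportError:
    # older installs still ship them in the separate functorch package
    from functorch import grad, jvp, vjp

def f(x):
    # simple scalar-valued test function; its Hessian is diag(-sin(x))
    return x.sin().sum()

def hvp_fwd_over_rev(f, primals, tangents):
    # forward-over-reverse: push the tangent through grad(f) with jvp
    return jvp(grad(f), primals, tangents)[1]

def hvp_rev_over_rev(f, primals, tangents):
    # reverse-over-reverse: pull the cotangent back through grad(f) with vjp
    _, vjp_fn = vjp(grad(f), *primals)
    return vjp_fn(*tangents)[0]

x = torch.randn(2048)
v = torch.randn(2048)
hvp_fr = hvp_fwd_over_rev(f, (x,), (v,))
hvp_rr = hvp_rev_over_rev(f, (x,), (v,))
```

Since the Hessian is symmetric, both compositions should agree numerically; for this particular `f` the analytic product is `-x.sin() * v`.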
Result: the reverse-mode AD `vjp` is ~26% faster than the forward-mode AD `jvp` in my runs. I haven't tested the memory cost yet.
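A timing comparison along these lines can be sketched with `timeit`; the function, problem size, and iteration count below are placeholders, not the original benchmark, so the measured ratio will vary by machine and workload:

```python
import timeit
import torch

try:
    from torch.func import grad, jvp, vjp  # PyTorch >= 2.0
except ImportError:
    from functorch import grad, jvp, vjp  # older installs

def f(x):
    return x.sin().sum()

x = torch.randn(4096)
v = torch.randn(4096)

def hvp_fwd_over_rev():
    # forward-over-reverse hvp
    return jvp(grad(f), (x,), (v,))[1]

def hvp_rev_over_rev():
    # reverse-over-reverse hvp
    _, vjp_fn = vjp(grad(f), x)
    return vjp_fn(v)[0]

n = 100
t_fwd = timeit.timeit(hvp_fwd_over_rev, number=n)
t_rev = timeit.timeit(hvp_rev_over_rev, number=n)
print(f"forward-over-reverse: {t_fwd / n * 1e3:.3f} ms/iter")
print(f"reverse-over-reverse: {t_rev / n * 1e3:.3f} ms/iter")
```

As noted above, there's no universal winner: rerun this with your actual model and input sizes before choosing a composition.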