pytorch / functorch

functorch is JAX-like composable function transforms for PyTorch.
https://pytorch.org/functorch/
BSD 3-Clause "New" or "Revised" License

.item() error when computing Jacobian with vmap and `torch.autograd.set_detect_anomaly(True)` #1049

Closed. jotix16 closed this issue 1 year ago

jotix16 commented 1 year ago

Running the example from the official documentation here with torch.autograd.set_detect_anomaly(True) causes an error:

return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: vmap: It looks like you're calling .item() on a Tensor. We don't support vmap over calling .item() on a Tensor, please try to rewrite what you're doing with other operations. If error is occurring somewhere inside PyTorch internals, please file a bug report.
import torch
import functorch

# Setup
torch.autograd.set_detect_anomaly(True)  # enabling anomaly detection is what triggers the error
N = 5
f = lambda x: x ** 2
x = torch.randn(N, requires_grad=True)
y = f(x)
I_N = torch.eye(N)
# Sequential approach: one vjp per row of the identity matrix
jacobian_rows = [torch.autograd.grad(y, x, v, retain_graph=True)[0]
                 for v in I_N.unbind()]
jacobian = torch.stack(jacobian_rows)

# Vectorized approach: vmap the same vjp over all rows of the identity at once
def get_vjp(v):
    return torch.autograd.grad(y, x, v)

jacobian = functorch.vmap(get_vjp)(I_N)
jotix16 commented 1 year ago

Could pytorch/pytorch#124423 be related?

zou3519 commented 1 year ago

Thanks for reporting, @jotix16. This is a bug that we can fix. Out of curiosity, could you explain a bit about your use case in trying these two APIs together?

jotix16 commented 1 year ago

Hi @zou3519, this is purely the code from the example in the official documentation.

I am uncertain whether I understood your question right, but here is my use-case:

I have a learned dynamical system and want to parallelize the computation of the Jacobians for an N-step rollout.

I.e., I have N tuples $(\mathbf u_k, \mathbf x_k, \mathbf x_{k+1})$ for $k = 0, \dots, N-1$ with disconnected computational graphs, i.e., before each step I call u_k.detach().requires_grad_() and x_k.detach().requires_grad_(). Since I already have the computational graphs, I want to avoid torch.autograd.functional.jacobian and use vmap instead.

Currently, I am trying, without much success, to introduce two levels of parallelization (rough sketch below):

  1. parallelize the computation of Jacobians over all tuples
  2. parallelize the computation of Jacobians itself (row-wise)

I would appreciate any suggestion!
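
For concreteness, a rough sketch of the two levels with functorch (a toy dynamics function f and made-up shapes, not my actual model; it also recomputes f instead of reusing the existing graphs):

import torch
import functorch

# Toy stand-in for the learned dynamics x_{k+1} = f(x_k, u_k); shapes are made up.
def f(x, u):
    return torch.tanh(x + u)

N, D = 5, 3                      # rollout length, state/control dimension
xs = torch.randn(N, D)           # the N states x_k
us = torch.randn(N, D)           # the N controls u_k

# Level 2 (inner): jacrev builds each Jacobian row-wise via vmap internally.
# Level 1 (outer): vmap batches that computation over all N tuples.
jac_x = functorch.vmap(functorch.jacrev(f, argnums=0))(xs, us)  # (N, D, D), d x_{k+1} / d x_k
jac_u = functorch.vmap(functorch.jacrev(f, argnums=1))(xs, us)  # (N, D, D), d x_{k+1} / d u_k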

tklosek0 commented 1 year ago

Hi @zou3519,

I'm encountering the same problem as @jotix16.

Are there any plans to fix the internal .item() calls in PyTorch, as mentioned here: https://pytorch.org/functorch/nightly/ux_limitations.html#data-dependent-operations-item? I'm also using https://github.com/facebookresearch/theseus and can refer to the issue @luisenp linked in this thread.
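
For anyone landing here, a minimal illustration of the user-facing side of that limitation (the toy function scale_if_positive is just for illustration, not from my code):

import torch
import functorch

def scale_if_positive(x):
    # .item() needs a concrete Python scalar, which a vmapped (batched) tensor cannot provide.
    return x * 2 if x.sum().item() > 0 else x

x = torch.randn(4, 3)
functorch.vmap(scale_if_positive)(x)  # raises "vmap: It looks like you're calling .item() on a Tensor ..."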

zou3519 commented 1 year ago

Bumping the priority on this one

richardrl commented 1 year ago

Hi @zou3519 , @kshitij12345

Just experienced this error while trying to debug a NaN in my code with anomaly mode and jacrev.

Jeff09 commented 1 year ago

Hi @zou3519 @kshitij12345

Just had this error while trying to debug a gradient-computation RuntimeError. The exact error message is as follows:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.DoubleTensor [48]], which is output 0 of AsStridedBackward0, is at version 12; expected version 11 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Thank you for any advice.