pytorch / torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation #1962

Closed · yanboliang closed this issue 1 year ago

yanboliang commented 1 year ago

🐛 Describe the bug

This may be the same bug as pytorch/pytorch#93440, but we have a minimized repro here.

Minimized repro:

import torch
import torch._dynamo
import torch.nn as nn

class ConvBlock(nn.Module):

    def __init__(self):
        super().__init__()
        self.block = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.ReLU(inplace=True))  # ReLU followed by an in-place ReLU

    def forward(self, x):
        return self.block(x)

model = ConvBlock().eval().to("cuda")
opt_model = torch._dynamo.optimize("inductor")(model)
x = torch.rand([4, 4]).to("cuda")
print(opt_model(x))

Error logs

/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py:361: UserWarning: TensorFloat32 tensor cores for float32 matrix multiplication available but not enabled. Consider setting `torch.set_float32_matmul_precision('high')`
  warnings.warn(
/scratch/ybliang/work/repos/pytorch/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
  warnings.warn(message, UserWarning)
/scratch/ybliang/work/repos/pytorch/torch/storage.py:315: UserWarning: TypedStorage is deprecated. It will be removed in the future and UntypedStorage will be the only storage class. This should only matter to you if you are using storages directly.
  warnings.warn(message, UserWarning)
Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/output_graph.py", line 568, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.fake_example_inputs())
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/debug_utils.py", line 915, in debug_wrapper
    compiled_gm = compiler_fn(gm, example_inputs, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_inductor/compile_fx.py", line 394, in compile_fx
    return aot_autograd(
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/optimizations/training.py", line 80, in compiler_fn
    cg = aot_module_simplified(gm, example_inputs, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_functorch/aot_autograd.py", line 2093, in aot_module_simplified
    compiled_fn = create_aot_dispatcher_function(
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 90, in time_wrapper
    r = func(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_functorch/aot_autograd.py", line 1792, in create_aot_dispatcher_function
    compiled_fn = compiler_fn(flat_fn, fake_flat_tensor_args, aot_config)
  File "/scratch/ybliang/work/repos/pytorch/torch/_functorch/aot_autograd.py", line 1197, in aot_wrapper_dedupe
    return compiler_fn(flat_fn, leaf_flat_args, aot_config)
  File "/scratch/ybliang/work/repos/pytorch/torch/_functorch/aot_autograd.py", line 1377, in aot_dispatch_autograd
    fx_g = make_fx(
  File "/scratch/ybliang/work/repos/pytorch/torch/fx/experimental/proxy_tensor.py", line 683, in wrapped
    t = dispatch_trace(wrap_key(func, args, fx_tracer), tracer=fx_tracer, concrete_args=tuple(phs))
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/fx/experimental/proxy_tensor.py", line 441, in dispatch_trace
    graph = tracer.trace(root, concrete_args)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/fx/_symbolic_trace.py", line 739, in trace
    (self.create_arg(fn(*args)),),
  File "/scratch/ybliang/work/repos/pytorch/torch/fx/_symbolic_trace.py", line 614, in flatten_fn
    tree_out = root_fn(*tree_args)
  File "/scratch/ybliang/work/repos/pytorch/torch/fx/experimental/proxy_tensor.py", line 457, in wrapped
    out = f(*tensors)
  File "/scratch/ybliang/work/repos/pytorch/torch/_functorch/aot_autograd.py", line 746, in functionalized_joint
    outs = joint_forward_backward(f_primals, f_tangents)
  File "/scratch/ybliang/work/repos/pytorch/torch/_functorch/aot_autograd.py", line 713, in joint_forward_backward
    backward_out = torch.autograd.grad(
  File "/scratch/ybliang/work/repos/pytorch/torch/autograd/__init__.py", line 266, in grad
    return handle_torch_function(
  File "/scratch/ybliang/work/repos/pytorch/torch/overrides.py", line 1520, in handle_torch_function
    result = mode.__torch_function__(public_api, types, args, kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_inductor/overrides.py", line 37, in __torch_function__
    return func(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/autograd/__init__.py", line 300, in grad
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 2]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/scratch/ybliang/work/repos/pytorch/debug/debug1.py", line 22, in <module>
    print(opt_model(x))
  File "/scratch/ybliang/work/repos/pytorch/torch/nn/modules/module.py", line 1480, in _call_impl
    return forward_call(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 80, in forward
    return self.dynamo_ctx(self._orig_mod.forward)(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 209, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/eval_frame.py", line 329, in catch_errors
    return callback(frame, cache_size)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 468, in _convert_frame
    result = inner_convert(frame, cache_size)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 102, in _fn
    return fn(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/utils.py", line 90, in time_wrapper
    r = func(*args, **kwargs)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 339, in _convert_frame_assert
    return _compile(
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 395, in _compile
    out_code = transform_code_object(code, transform)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/bytecode_transformation.py", line 341, in transform_code_object
    transformations(instructions, code_options)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/convert_frame.py", line 382, in transform
    tracer.run()
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 1625, in run
    super().run()
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 484, in run
    and self.step()
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 454, in step
    getattr(self, inst.opname)(inst)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/symbolic_convert.py", line 1687, in RETURN_VALUE
    self.output.compile_subgraph(self)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/output_graph.py", line 428, in compile_subgraph
    self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/output_graph.py", line 499, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/scratch/ybliang/work/repos/pytorch/torch/_dynamo/output_graph.py", line 573, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e) from e
torch._dynamo.exc.BackendCompilerFailed: compile_fx raised RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 2]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

Set torch._dynamo.config.verbose=True for more information

You can suppress this exception and fall back to eager by setting:
    torch._dynamo.config.suppress_errors = True

Minified repro

No response

yanboliang commented 1 year ago

cc @bdhirsh @anijain2305

bdhirsh commented 1 year ago

I talked to Yanbo offline - the above repro actually fails in eager mode too. To get the error to show up in eager, though, you have to actually run out.sum().backward(), which gives:

RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.cuda.FloatTensor [4, 2]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: enable anomaly detection to find the operation that failed to compute its gradient, with torch.autograd.set_detect_anomaly(True).

What's slightly different when we use the compile stack is that we eagerly trace the forward and backward into a joint graph when the user calls their module's forward(), and tracing through the backward graph is what causes the error to show up.
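
For reference, a minimal sketch of the eager-mode check described above (hypothetical code, not taken verbatim from the issue; it rebuilds the same Sequential as the repro):

import torch
import torch.nn as nn

# Same layer stack as ConvBlock in the repro above
block = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.ReLU(inplace=True)).eval().to("cuda")
x = torch.rand([4, 4], device="cuda")

out = block(x)        # forward runs fine in eager mode
out.sum().backward()  # raises the same RuntimeError: the in-place ReLU mutated the
                      # first ReLU's output, which autograd had saved for the backward pass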

yanboliang commented 1 year ago

Thanks @bdhirsh for pointing out the real issue behind this. I checked several other failures of the same kind in the 7k GitHub models run, and all of them are caused by the reason @bdhirsh mentioned above. This is not a real error, so I'll close this.
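
For anyone hitting the same RuntimeError in their own model: since the root cause is in user code (an in-place op mutating a tensor that autograd saved for the backward pass), here is a hedged sketch of possible fixes, not taken from this thread:

import torch
import torch.nn as nn

# Option 1: drop inplace=True so the first ReLU's saved output is never mutated
block = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.ReLU())

# Option 2: if only inference is needed (the repro calls .eval()), disable autograd
# so no backward graph has to be traced or run at all
model = nn.Sequential(nn.Linear(4, 2), nn.ReLU(), nn.ReLU(inplace=True)).eval().to("cuda")
opt_model = torch._dynamo.optimize("inductor")(model)
x = torch.rand([4, 4], device="cuda")
with torch.no_grad():
    print(opt_model(x))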