Looks like a stride propagation error.
cc @dagitses for stride agnostic pytorch
I managed to get this more minimal repro; I haven't looked at it much yet. (Note: if you're trying to reproduce the original transformers issue, you need to run with a single GPU, or you'll hit a different FakeTensor issue.)
import torch

x = torch.rand((1, 12, 256*64), requires_grad=True)

def transpose_for_scores(x):
    new_x_shape = x.size()[:-1] + (256, -1)
    x = x.view(new_x_shape)
    return x.permute(0, 2, 1, 3)

def fn(x):
    scale_factor = 0.5
    x = x.relu()
    x = transpose_for_scores(x)
    x /= torch.sqrt(torch.tensor(x.size(-1), dtype=torch.float) * scale_factor)
    return x.transpose(-1, -2)

fn(x)
torch.compile(fn)(x)
Hmm, neither CrossRefFakeMode nor DebugInterpreter catches this.
Even aot_eager fails here.
import torch

x = torch.rand((1, 12, 256*64), requires_grad=True)

def transpose_for_scores(x):
    new_x_shape = x.size()[:-1] + (256, -1)
    x = x.view(new_x_shape)
    return x.permute(0, 2, 1, 3)

def fn(x):
    scale_factor = 0.5
    x = x.relu()
    x = transpose_for_scores(x)
    x /= torch.sqrt(torch.tensor(x.size(-1), dtype=torch.float) * scale_factor)
    return x.transpose(-1, -2)

fn(x)
torch.compile(fn, backend="aot_eager")(x)
cc @ezyang @bdhirsh to advise.
The minimal repros throw a different error ("one of the variables needed for gradient computation has been modified by an inplace operation"). The original view error is probably due to the copy_
decomposition producing wrong strides; @bdhirsh has a fix for this that is blocked by cpp codegen in fbcode.
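As background, here is an illustrative eager-mode sketch (not code from the thread) of why those strides matter: the strides in the error message further below are exactly what you get by permuting a contiguous (1, 256, 12, 64) tensor, and a view back to (1, 12, 16384) is then impossible without a copy.

import torch

# Contiguous (1, 256, 12, 64) has strides (196608, 768, 64, 1); permuting dims 1 and 2
# yields shape (1, 12, 256, 64) with strides (196608, 64, 768, 1) -- the strides from the error.
t = torch.rand(1, 256, 12, 64).permute(0, 2, 1, 3)
print(t.shape, t.stride())

try:
    t.view(1, 12, 16384)       # view() needs compatible strides and fails here
except RuntimeError as e:
    print("view failed:", e)

r = t.reshape(1, 12, 16384)    # reshape() falls back to a copy and succeeds
print(r.shape)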
That looks like something that should be fixed by this PR: https://github.com/pytorch/pytorch/issues/96456#issuecomment-1562284376. I can't test it at the moment (my allocation was nuked), but I can try to confirm later.
Unfortunately, even with the copy() decomp fix in inductor, the repro now gives this error for me:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 12, 16384]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead.
Actually, I realized that the small repro above is broken (that error also shows up if you run it in eager mode and actually call .backward()).
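For concreteness, a minimal sketch of that (reusing the fn and x defined in the repro above, with no torch.compile involved):

out = fn(x)           # the forward pass runs fine in eager mode
out.sum().backward()  # RuntimeError: one of the variables needed for gradient
                      # computation has been modified by an inplace operation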
I tried running the HuggingFace repro. On my 40GB machine I get an OOM; it would be great if someone could patch this PR in locally and try to repro! https://github.com/pytorch/pytorch/issues/96456.
@davidberard98's repro still fails for me in AOTAutograd https://github.com/pytorch/pytorch/issues/96456#issuecomment-1467355129
File "/data/users/ezyang/b/pytorch/torch/fx/experimental/proxy_tensor.py", line 532, in __torch_dispatch__
return self.inner_torch_dispatch(func, types, args, kwargs)
File "/data/users/ezyang/b/pytorch/torch/fx/experimental/proxy_tensor.py", line 557, in inner_torch_dispatch
return proxy_call(self, func, self.pre_dispatch, args, kwargs)
File "/data/users/ezyang/b/pytorch/torch/fx/experimental/proxy_tensor.py", line 367, in proxy_call
out = func(*args, **kwargs)
File "/data/users/ezyang/b/pytorch/torch/_ops.py", line 429, in __call__
return self._op(*args, **kwargs or {})
File "/data/users/ezyang/b/pytorch/torch/utils/_stats.py", line 20, in wrapper
return fn(*args, **kwargs)
File "/data/users/ezyang/b/pytorch/torch/_subclasses/fake_tensor.py", line 1160, in __torch_dispatch__
return self.dispatch(func, types, args, kwargs)
File "/data/users/ezyang/b/pytorch/torch/_subclasses/fake_tensor.py", line 1404, in dispatch
r = func(*args, **kwargs)
File "/data/users/ezyang/b/pytorch/torch/_ops.py", line 429, in __call__
return self._op(*args, **kwargs or {})
File "/data/users/ezyang/b/pytorch/torch/_refs/__init__.py", line 4138, in view
return _reshape_view_helper(a, *shape, allow_copy=False)
File "/data/users/ezyang/b/pytorch/torch/_refs/__init__.py", line 3352, in _reshape_view_helper
raise ValueError(msg)
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
ValueError: Cannot view a tensor with shape torch.Size([1, 12, 256, 64]) and strides (196608, 64, 768, 1) as a tensor with shape (1, 12, 16384)!
You can suppress this exception and fall back to eager by setting:
import torch._dynamo
torch._dynamo.config.suppress_errors = True
I get a different error when I try to run @davidberard98's repro today:
/data/users/williamwen/pytorch/torch/autograd/__init__.py:411: UserWarning: Error detected in ReluBackward0. Traceback of forward call that caused the error:
File "/data/users/williamwen/pytorch/playground5.py", line 12, in fn
x = x.relu()
(Triggered internally at /data/users/williamwen/pytorch/torch/csrc/autograd/python_anomaly_mode.cpp:113.)
result = Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
Traceback (most recent call last):
File "/data/users/williamwen/pytorch/playground5.py", line 18, in <module>
torch.compile(fn)(x)
File "/data/users/williamwen/pytorch/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/eval_frame.py", line 655, in catch_errors
return callback(frame, cache_entry, hooks, frame_state)
File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 721, in _convert_frame
result = inner_convert(frame, cache_entry, hooks, frame_state)
File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 383, in _convert_frame_assert
compiled_product = _compile(
File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 645, in _compile
guarded_code = compile_inner(code, one_graph, hooks, transform)
File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 562, in compile_inner
out_code = transform_code_object(code, transform)
File "/data/users/williamwen/pytorch/torch/_dynamo/bytecode_transformation.py", line 1033, in transform_code_object
transformations(instructions, code_options)
File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 151, in _fn
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/convert_frame.py", line 527, in transform
tracer.run()
File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2123, in run
super().run()
File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 818, in run
and self.step()
File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 781, in step
getattr(self, inst.opname)(inst)
File "/data/users/williamwen/pytorch/torch/_dynamo/symbolic_convert.py", line 2238, in RETURN_VALUE
self.output.compile_subgraph(
File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 912, in compile_subgraph
self.compile_and_call_fx_graph(tx, list(reversed(stack_values)), root)
File "/data/users/williamwen/py310-env/lib/python3.10/contextlib.py", line 79, in inner
return func(*args, **kwds)
File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 1080, in compile_and_call_fx_graph
compiled_fn = self.call_user_compiler(gm)
File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 1152, in call_user_compiler
raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
File "/data/users/williamwen/pytorch/torch/_dynamo/output_graph.py", line 1133, in call_user_compiler
compiled_fn = compiler_fn(gm, self.example_inputs())
File "/data/users/williamwen/pytorch/torch/_dynamo/repro/after_dynamo.py", line 117, in debug_wrapper
compiled_gm = compiler_fn(gm, example_inputs)
File "/data/users/williamwen/pytorch/torch/__init__.py", line 1657, in __call__
return compile_fx(model_, inputs_, config_patches=self.config)
File "/data/users/williamwen/pytorch/torch/_inductor/compile_fx.py", line 1168, in compile_fx
return aot_autograd(
File "/data/users/williamwen/pytorch/torch/_dynamo/backends/common.py", line 55, in compiler_fn
cg = aot_module_simplified(gm, example_inputs, **kwargs)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 4938, in aot_module_simplified
compiled_fn = create_aot_dispatcher_function(
File "/data/users/williamwen/pytorch/torch/_dynamo/utils.py", line 244, in time_wrapper
r = func(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 4478, in create_aot_dispatcher_function
compiled_fn = compiler_fn(flat_fn, fake_flat_args, aot_config, fw_metadata=fw_metadata)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 2813, in aot_wrapper_dedupe
return compiler_fn(flat_fn, leaf_flat_args, aot_config, fw_metadata=fw_metadata)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 2999, in aot_wrapper_synthetic_base
return compiler_fn(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 3700, in aot_dispatch_autograd
fx_g, joint_inputs, maybe_subclass_meta = aot_dispatch_autograd_graph(flat_fn, flat_args, aot_config, fw_metadata=fw_metadata)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 3680, in aot_dispatch_autograd_graph
fx_g = create_graph(joint_fn_to_trace, updated_joint_inputs, aot_config=aot_config)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 1943, in create_graph
fx_g = make_fx(f, decomposition_table=aot_config.decompositions)(*args)
File "/data/users/williamwen/pytorch/torch/fx/experimental/proxy_tensor.py", line 869, in wrapped
t = dispatch_trace(wrap_key(func, args, fx_tracer, pre_dispatch), tracer=fx_tracer, concrete_args=tuple(phs))
File "/data/users/williamwen/pytorch/torch/_compile.py", line 24, in inner
return torch._dynamo.disable(fn, recursive)(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/fx/experimental/proxy_tensor.py", line 481, in dispatch_trace
graph = tracer.trace(root, concrete_args)
File "/data/users/williamwen/pytorch/torch/_dynamo/eval_frame.py", line 489, in _fn
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/_dynamo/external_utils.py", line 17, in inner
return fn(*args, **kwargs)
File "/data/users/williamwen/pytorch/torch/fx/_symbolic_trace.py", line 821, in trace
(self.create_arg(fn(*args)),),
File "/data/users/williamwen/pytorch/torch/fx/_symbolic_trace.py", line 688, in flatten_fn
tree_out = root_fn(*tree_args)
File "/data/users/williamwen/pytorch/torch/fx/experimental/proxy_tensor.py", line 517, in wrapped
out = f(*tensors)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 1929, in joint_helper
return functionalized_f_helper(primals, tangents)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 1882, in functionalized_f_helper
f_outs = fn(*f_args)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 1850, in inner_fn_with_anomaly
return inner_fn(*args)
File "/data/users/williamwen/pytorch/torch/_functorch/aot_autograd.py", line 1833, in inner_fn
backward_out = torch.autograd.grad(
File "/data/users/williamwen/pytorch/torch/autograd/__init__.py", line 411, in grad
result = Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass
torch._dynamo.exc.BackendCompilerFailed: backend='inductor' raised:
RuntimeError: one of the variables needed for gradient computation has been modified by an inplace operation: [torch.FloatTensor [1, 12, 16384]], which is output 0 of ReluBackward0, is at version 1; expected version 0 instead. Hint: the backtrace further above shows the operation that failed to compute its gradient. The variable in question was changed in there or anywhere later. Good luck!
@Lokiiiiii can you re-open if you're still seeing an issue? David's smaller repro above no longer fails with the original error, as Yanbo pointed out. The new error is actually because the minimized repro isn't quite valid: even in eager mode, that code will fail if you call out.sum().backward(), because the repro code is mutating the output of relu(), which was saved for backward.
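For what it's worth, a possible fix to the minimized repro itself (a sketch, not something proposed in the thread, reusing transpose_for_scores from the repro above) is to make the division out-of-place, so the tensor saved for relu()'s backward is never mutated:

def fn(x):
    scale_factor = 0.5
    x = x.relu()
    x = transpose_for_scores(x)
    # Out-of-place division leaves relu()'s saved output untouched,
    # so out.sum().backward() no longer trips the version-counter check.
    x = x / torch.sqrt(torch.tensor(x.size(-1), dtype=torch.float) * scale_factor)
    return x.transpose(-1, -2)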
🐛 Describe the bug
When using HuggingFace's Trainer API, I noticed that PyTorch eager mode succeeds as expected, but inductor fails with a shape mismatch error. This only happens with the deberta-base model.
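For reference, a hypothetical repro sketch; the checkpoint name, batch, and loss computation below are assumptions, since the original script and error logs are not included here:

import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

# Hypothetical setup -- checkpoint and inputs are assumptions, not taken from the report.
tok = AutoTokenizer.from_pretrained("microsoft/deberta-base")
model = AutoModelForSequenceClassification.from_pretrained("microsoft/deberta-base")
model.train()

batch = tok(["a short example sentence"] * 4, return_tensors="pt", padding=True)
batch["labels"] = torch.zeros(4, dtype=torch.long)

compiled = torch.compile(model)  # inductor is the default backend
loss = compiled(**batch).loss    # presumably where the shape mismatch surfaces
loss.backward()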
Error logs
Minified repro
The minifier was unable to reproduce the error.
Versions
cc @ezyang @eellison @bdhirsh @msaroufim @wconstab @anijain2305 @zou3519 @ngimel @soumith