pytorch / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
https://pytorch.org

[moco][dynamo][DDPPartitioner] Bad partitioning #103385

Closed: anijain2305 closed this issue 1 year ago

anijain2305 commented 1 year ago

🐛 Describe the bug

Repro - python benchmarks/dynamo/torchbench.py --backend=eager --amp --training --device cuda --performance --only=moco

Note that you need the latest torchbench, specifically a version that includes this change: https://github.com/pytorch/benchmark/pull/1693

Error

  File "/scratch/anijain/work/pytorch/torch/_dynamo/output_graph.py", line 857, in compile_and_call_fx_graph
    compiled_fn = self.call_user_compiler(gm)
  File "/scratch/anijain/work/pytorch/torch/_dynamo/utils.py", line 180, in time_wrapper
    r = func(*args, **kwargs)
  File "/scratch/anijain/work/pytorch/torch/_dynamo/output_graph.py", line 913, in call_user_compiler
    raise BackendCompilerFailed(self.compiler_fn, e).with_traceback(
  File "/scratch/anijain/work/pytorch/torch/_dynamo/output_graph.py", line 909, in call_user_compiler
    compiled_fn = compiler_fn(gm, self.example_inputs())
  File "/scratch/anijain/work/pytorch/torch/_dynamo/backends/distributed.py", line 217, in compile_fn
    split_gm = fx.passes.split_module.split_module(
  File "/scratch/anijain/work/pytorch/torch/fx/passes/split_module.py", line 355, in split_module
    base_mod_env[list(partition.outputs)[0]] = output_val
torch._dynamo.exc.BackendCompilerFailed: backend='compile_fn' raised:
IndexError: list index out of range
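
For readers unfamiliar with this failure mode, here is a minimal sketch of what the last frame of the traceback suggests: `list(partition.outputs)[0]` raises `IndexError` when a partition has no outputs at all. `ToyPartition`, `first_output`, and `first_output_or_none` are hypothetical names made up for illustration. They are not the real `split_module` internals, and the guarded variant is not necessarily the fix that landed.

```python
from dataclasses import dataclass, field


@dataclass
class ToyPartition:
    """Hypothetical stand-in for a graph partition (not the real torch.fx class)."""
    name: str
    # Names of values produced in this partition and consumed outside it.
    outputs: dict = field(default_factory=dict)


def first_output(partition):
    """Mimic the failing access pattern: index the first output unconditionally."""
    return list(partition.outputs)[0]


def first_output_or_none(partition):
    """Guarded variant: return None when the partition exposes no outputs."""
    outputs = list(partition.outputs)
    return outputs[0] if outputs else None


with_output = ToyPartition("submod_0", outputs={"add": None})
# Assumption for illustration: a partition whose values are never used outside it.
no_output = ToyPartition("submod_1")

print(first_output(with_output))  # add
try:
    first_output(no_output)
except IndexError as exc:
    print(f"IndexError: {exc}")  # IndexError: list index out of range
print(first_output_or_none(no_output))  # None
```

The toy only shows why an empty output set trips the unconditional `[0]` access. Whether the partitioner should avoid creating such a partition in the first place, or `split_module` should tolerate it, is a separate design question that this sketch does not answer.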

One observation: if you disable the following function, the error goes away.

https://github.com/pytorch/benchmark/blob/main/torchbenchmark/models/moco/moco/builder.py#L44-L50

cc @ezyang @msaroufim @wconstab @bdhirsh @eellison

Versions

N/A

mrembalski commented 1 year ago

I'm encountering the same problem. Will there be a bug fix release soon?

wconstab commented 1 year ago

The fix for this one landed 3 days ago, but we won't make a special release for it. Unfortunately, you'd have to use the nightly build or wait for the next scheduled release.

mrembalski commented 1 year ago

Sure, thanks. I'll use the nightly build.