pytorch / torchdynamo

A Python-level JIT compiler designed to make unmodified PyTorch programs faster.
BSD 3-Clause "New" or "Revised" License

TorchBench training error: RuntimeError: Default process group has not been initialized, please make sure to call init_process_group. #1148

desertfire closed this issue 1 year ago

desertfire commented 2 years ago

Repro:

benchmarks/torchbench.py -d cuda --inductor --training --float32 --no-skip -k moco
benchmarks/torchbench.py -d cuda --inductor --training --float32 --no-skip -k mobilenet_v3_large

Error:

Traceback (most recent call last):
  File "benchmarks/torchbench.py", line 354, in <module>
    main(TorchBenchmarkRunner(), original_dir)
  File "/fsx/users/binbao/torchdynamo-tip/benchmarks/common.py", line 1854, in main
    device, name, model, example_inputs, batch_size = runner.load_model(
  File "benchmarks/torchbench.py", line 260, in load_model
    benchmark = benchmark_cls(
  File "/fsx/users/binbao/torchbenchmark/torchbenchmark/util/model.py", line 16, in __call__
    obj = type.__call__(cls, *args, **kwargs)
  File "/fsx/users/binbao/torchbenchmark/torchbenchmark/models/moco/__init__.py", line 68, in __init__
    self.model = torch.nn.parallel.DistributedDataParallel(
  File "/fsx/users/binbao/pytorch-release/torch/nn/parallel/distributed.py", line 601, in __init__
    self.process_group = _get_default_group()
  File "/fsx/users/binbao/pytorch-release/torch/distributed/distributed_c10d.py", line 493, in _get_default_group
    raise RuntimeError(
RuntimeError: Default process group has not been initialized, please make sure to call init_process_group.
desertfire commented 2 years ago

Assigning to @wconstab as it is DDP-related.

wconstab commented 2 years ago

This looks like a benchmark config issue to me. Can you shed any light on why it's popping up now? (Did we just start running this model for the first time, or did something else change?)

If we're trying to benchmark a model wrapped in DDP, we should init the process group first. Since we don't support DDP benchmarking in the torchdynamo/benchmarks/* infra, there isn't any code there that inits the process group.

It looks like this model got wrapped in DDP either intentionally or perhaps unwittingly?
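For reference, DDP's requirement can be satisfied in a single process before constructing the wrapper. This is only an illustrative sketch (the backend, address, and port are placeholder choices for a local CPU run), not how the torchbench infra does it:

```python
import os
import torch
import torch.distributed as dist

# Illustrative single-process setup: MASTER_ADDR/MASTER_PORT are placeholder
# values for a local run, and "gloo" is chosen so this works without a GPU.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

# With a default process group initialized, DDP construction succeeds
# instead of raising "Default process group has not been initialized".
model = torch.nn.Linear(8, 8)
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

dist.destroy_process_group()
```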

desertfire commented 2 years ago

For context, the issue was hidden behind some other bugs. We can skip this model for now (cc @anijain2305), but thought it would be an interesting use case for your study.

wconstab commented 2 years ago

@xuzhao9 Do you know why moco would be wrapped in DDP by default? I think for other models we avoided having the DDP wrapper in the normal (non-distributed) benchmarks. Should we also remove it from moco?

We can, separately, benchmark moco-ddp, but we should do that using the 'distributed' trainer in torchbench. That trainer applies the DDP wrapper itself, so we'd be double-wrapping in two layers of DDP if we used the already-DDP'd moco with the 'distributed' trainer.
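The double-wrap hazard described above can be guarded against with a type check before wrapping. `wrap_once` here is a hypothetical helper, not existing torchbench code, and the process-group setup is only so the sketch runs standalone:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_once(model):
    # Hypothetical helper: skip wrapping when the model already arrives
    # wrapped in DDP, so a trainer never stacks two layers of DDP.
    return model if isinstance(model, DDP) else DDP(model)

# Single-process "gloo" group just so DDP construction works in this sketch.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29501")
dist.init_process_group(backend="gloo", rank=0, world_size=1)

already_wrapped = DDP(torch.nn.Linear(4, 4))
same = wrap_once(already_wrapped)          # returned unchanged, no double wrap
fresh = wrap_once(torch.nn.Linear(4, 4))   # plain module gets wrapped once

dist.destroy_process_group()
```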

anijain2305 commented 1 year ago

Unable to repro this, so closing.