Closed — desertfire closed this 1 year ago
Assigning to @wconstab since it is DDP-related.
This looks like a benchmark config issue to me. Can you shed any light on why it's popping up now? (Did we just start running this model for the first time, or did something else change?)
If we're trying to benchmark a model wrapped in DDP, we need to initialize the process group first. Since we don't support DDP benchmarking in the torchdynamo/benchmarks/* infra, there is nothing there that initializes the process group.
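For illustration, a minimal sketch of what "init the process group first" means. This is not the benchmark infra's actual code; the single-process `gloo` group, port, and `wrap_in_ddp` helper are assumptions for a CPU-only smoke test:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def wrap_in_ddp(model: torch.nn.Module) -> DDP:
    # DDP's constructor requires an initialized process group;
    # a single-process "gloo" group is enough for a local sanity check.
    if not dist.is_initialized():
        os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
        os.environ.setdefault("MASTER_PORT", "29500")
        dist.init_process_group("gloo", rank=0, world_size=1)
    return DDP(model)

model = wrap_in_ddp(torch.nn.Linear(4, 2))
out = model(torch.randn(3, 4))
```

Constructing `DDP(model)` without the `init_process_group` call above is exactly what raises the "default process group has not been initialized" error.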
It looks like this model got wrapped in DDP, either intentionally or perhaps unwittingly?
For context, the issue was hidden behind some other bugs. We can skip this model for now (cc @anijain2305), but thought it would be an interesting use case for your study.
@xuzhao9 Do you know why moco would be wrapped in DDP by default? I think for other models we avoided having the DDP wrapper in the normal (non-distributed) benchmarks. Should we also remove it from moco?
We can, separately, benchmark moco-ddp, but we should do that using the 'distributed' trainer in torchbench. That trainer applies the DDP wrapper itself, so if we used the already-DDP'd moco with the 'distributed' trainer, we'd be double-wrapping the model in two layers of DDP.
Unable to repro this, so closing.
Repro:
Error: