Ghelfi opened 6 months ago
@Skylion007 do you think this is a torch error or something we can do differently?
@Ghelfi do you know if this works for you elsewhere, e.g. if you compile outside Composer? That will help us narrow down whether it's a Composer issue or a PyTorch issue, as the trace looks more like a PyTorch issue to me.
This is not clear to me. The provided example above works if you remove the `BlurPool` algorithm, which is only on the Composer side.
I'll try to redefine some model layers before feeding the model to the trainer, to mimic the behaviour outside of any Composer scope.
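For reference, a minimal sketch of that kind of check, using a hypothetical toy model (the actual model from the example above is not shown here): replace a layer by hand, the way Composer's module surgery would, then run `torch.compile` outside Composer. The `backend="eager"` choice is deliberate for debugging, since dynamo tracing errors surface regardless of the compile backend.

```python
# Hypothetical toy model standing in for the example's real model.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(8, 10),
)

# Module surgery outside any Composer scope: swap the first conv in place,
# a rough stand-in for what the BlurPool algorithm does to the model.
model[0] = nn.Conv2d(3, 8, kernel_size=3, padding=1, bias=False)

# backend="eager" skips inductor codegen but still runs dynamo tracing,
# which is where this kind of error would be raised.
compiled = torch.compile(model, backend="eager")
out = compiled(torch.randn(2, 3, 16, 16))
```

If this runs cleanly (and also under `DistributedDataParallel` on a 2-GPU setup), that would point back at the Composer side.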
On torch 2.3, adding `torch._dynamo.config.optimize_ddp = False` at the start of the file seems to fix it.
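For anyone hitting the same trace, a minimal sketch of where that workaround goes, assuming torch 2.x (the flag lives in `torch._dynamo.config`):

```python
# Workaround sketch (assumes torch >= 2.0): set the flag at the top of the
# training script, before the Trainer is constructed. This disables dynamo's
# DDP bucket-wise graph splitting, which appears to trigger the error here.
import torch._dynamo

torch._dynamo.config.optimize_ddp = False
```

Note this trades away the DDP-aware compile optimization, so it is a diagnostic workaround rather than a fix.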
I am having issues with DDP and `torch.compile` on other fronts as well. I'll keep investigating.
Training a toy example in DDP mode with the Composer runtime, using both `torch.compile` (through the Trainer's `compile_config={}`) and the `BlurPool` algorithm, raises a dynamo error.

**To reproduce**

From `develop`, on a 2-GPU environment.

Code: (see code above)

Steps to reproduce the behavior: `composer -n 2 example.py`

Dynamo Error: (see code above)

It works if I remove either DDP, `BlurPool`, or `torch.compile`.