There was a problem when restricting training to specific GPUs with the `CUDA_VISIBLE_DEVICES` environment variable. Because CUDA renumbers the visible devices starting from 0, we need to use `local_rank` rather than the actual device ID, so I replaced the device IDs with `local_rank` and removed `device_id`. Training with DDP now works in all cases.
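To illustrate why the physical device ID fails, here is a minimal, standard-library-only sketch of the renumbering. The helper `logical_to_physical` is hypothetical and not part of this PR; it only mimics how a process-local index maps onto the entries of `CUDA_VISIBLE_DEVICES`.

```python
def logical_to_physical(local_rank: int, visible_devices: str) -> str:
    """Return the CUDA_VISIBLE_DEVICES entry that logical index `local_rank` refers to.

    Hypothetical helper for illustration only: `visible_devices` is the raw
    value of CUDA_VISIBLE_DEVICES, e.g. "2,3".
    """
    # Split the comma-separated list and drop empty entries.
    ids = [d.strip() for d in visible_devices.split(",") if d.strip()]
    if not 0 <= local_rank < len(ids):
        # This is the failure mode of passing a physical ID: with "2,3" only
        # logical indices 0 and 1 exist, so index 2 is out of range.
        raise IndexError(
            f"device index {local_rank} is not visible (only {len(ids)} device(s))"
        )
    return ids[local_rank]


# With CUDA_VISIBLE_DEVICES="2,3", rank 0 runs on physical GPU 2, rank 1 on GPU 3.
print(logical_to_physical(0, "2,3"))  # prints 2
print(logical_to_physical(1, "2,3"))  # prints 3
```

In PyTorch, the logical index is the value that would be passed to `torch.cuda.set_device(local_rank)` and to `DistributedDataParallel(model, device_ids=[local_rank])`, assuming one process per GPU.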