Issue (closed) — opened by muditbac-curefit, closed 4 years ago
Thanks for pointing this out. Your solution seems right.
muditbac-curefit notifications@github.com wrote on Saturday, June 13, 2020 at 4:34 PM:
The variable engine.local_rank is not being set when training with a single GPU. I fixed that by also setting local_rank in the non-distributed case, like this:
    if self.distributed:
        self.local_rank = self.args.local_rank
        self.world_size = int(os.environ['WORLD_SIZE'])
        self.world_rank = int(os.environ['RANK'])
        torch.cuda.set_device(self.local_rank)
        dist.init_process_group(backend="nccl", init_method='env://')
        dist.barrier()
        self.devices = [i for i in range(self.world_size)]
    else:
        # todo: check non-distributed training
        self.local_rank = self.args.local_rank
        self.world_rank = 1
        self.devices = parse_torch_devices(self.args.devices)
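To illustrate why the fallback assignment matters, here is a minimal, torch-free sketch of the branching logic. The Engine stub, its argument names, and the default local_rank=0 are hypothetical simplifications; the real class also reads WORLD_SIZE/RANK from the environment and initializes the process group in the distributed branch.

```python
class Engine:
    """Hypothetical stub mirroring the distributed/non-distributed split."""

    def __init__(self, distributed, local_rank=0):
        self.distributed = distributed
        if self.distributed:
            # Real code would also set world_size/world_rank from the
            # WORLD_SIZE / RANK env vars, call torch.cuda.set_device,
            # and run dist.init_process_group(backend="nccl").
            self.local_rank = local_rank
        else:
            # The fix: assign local_rank here too, so single-GPU training
            # does not later fail with an AttributeError on engine.local_rank.
            self.local_rank = local_rank
            self.world_rank = 1


engine = Engine(distributed=False)
print(engine.local_rank)  # 0
```

Without the assignment in the else branch, any later access to engine.local_rank in single-GPU mode raises AttributeError, which matches the error described below.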
Can you please let me know if this is the right way to do it?
FYI, without this change I get an error saying that engine has no attribute named local_rank.
Closing this issue; I have created a pull request: https://github.com/megvii-detection/MSPN/pull/25
https://github.com/megvii-detection/MSPN/blob/a84f750aaa34e32ded49c44dda6e73a6538c4fde/cvpack/torch_modeling/engine/engine.py#L56