Open kikoaumond opened 4 years ago
Hi, I am running a model with FCOS as the detector using torch.distributed. I see in https://github.com/tianzhi0549/FCOS/blob/master/fcos_core/modeling/rpn/fcos/loss.py that torch.distributed.all_reduce is used to aggregate the centerness loss, at https://github.com/tianzhi0549/FCOS/blob/dd7bfba8c4269ce2930a4a588a907666b970690e/fcos_core/modeling/rpn/fcos/loss.py#L279
But I don't see the same being done for reg_loss and cls_loss. Shouldn't they also be aggregated with all_reduce when running in multi-process mode?
Thank you
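To make the question concrete, here is a minimal sketch of what aggregating a per-process loss with a sum all_reduce would produce. It is simulated in plain Python so it runs without a process group; the function name `simulated_all_reduce_sum` and the loss values are hypothetical and are not taken from the FCOS code. The real call is `torch.distributed.all_reduce`, which updates the tensor in place on every rank.

```python
# Plain-Python simulation of a sum all_reduce across processes (ranks).
# Hypothetical helper and values, for illustration only; this is not
# the FCOS implementation.

def simulated_all_reduce_sum(per_rank_values):
    """Return what each rank would hold after a sum all_reduce."""
    total = sum(per_rank_values)
    # After all_reduce, every rank holds the same reduced value.
    return [total for _ in per_rank_values]


# One made-up reg_loss value per process.
per_rank_reg_loss = [0.5, 1.5, 1.0, 1.0]
world_size = len(per_rank_reg_loss)

reduced = simulated_all_reduce_sum(per_rank_reg_loss)

# Dividing the summed value by the number of processes gives the mean
# loss across all of them.
mean_reg_loss = reduced[0] / world_size

print(reduced)
print(mean_reg_loss)
```

In this sketch each rank ends up with the same summed value, and dividing by the world size gives a single loss figure that is identical on every process.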