Closed RockyatASU closed 2 years ago
Thanks for pointing that out.
I usually train on a single GPU, so I commented out the DataParallel part of the original pipeline and switched it to AMP-style training.
https://github.com/huixiancheng/CENet/blob/a4abc7d71da150c1a1b11b34d4f636d6c07121fd/modules/trainer.py#L412-L421
You could modify this part to make sure AMP works with DataParallel. However, I think it is much better to use DDP for multi-GPU training.
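A minimal sketch of combining AMP's `GradScaler` with `DataParallel` (assumption: the module/variable names below are placeholders, not the actual identifiers in CENet's `trainer.py`). The key point is that `DataParallel` gathers one loss element per GPU, so the loss must be reduced to a scalar with `.mean()` before `backward()`:

```python
import torch
import torch.nn as nn

# Hedged sketch: wrapping the criterion inside the module means each
# DataParallel replica returns a 0-dim loss, which DataParallel then
# gathers into a 1-D tensor with one element per GPU.
class ModelWithLoss(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Linear(8, 2)          # placeholder network
        self.criterion = nn.CrossEntropyLoss()

    def forward(self, x, y):
        return self.criterion(self.net(x), y)

model = ModelWithLoss()
use_cuda = torch.cuda.is_available()
if use_cuda and torch.cuda.device_count() > 1:
    model = nn.DataParallel(model).cuda()

optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = torch.cuda.amp.GradScaler(enabled=use_cuda)

inputs = torch.randn(4, 8)
targets = torch.randint(0, 2, (4,))
if use_cuda:
    inputs, targets = inputs.cuda(), targets.cuda()

with torch.cuda.amp.autocast(enabled=use_cuda):
    loss = model(inputs, targets)

optimizer.zero_grad()
# .mean() reduces the per-GPU loss vector to a scalar; it is a no-op on
# the 0-dim loss produced by single-GPU training, so this line is safe
# in both settings.
scaler.scale(loss.mean()).backward()
scaler.step(optimizer)
scaler.update()
```

With `enabled=use_cuda`, the scaler and autocast degrade to plain FP32 no-ops on CPU, so the same loop runs unchanged on one GPU, several GPUs, or no GPU.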
Hi,
Thank you for your nice work. When I trained this model with 4 GPUs, loss backpropagation raised "RuntimeError: grad can be implicitly created only for scalar outputs", but the code works for single-GPU training. I noticed that the loss should be a tensor holding a single scalar, but with 4 GPUs it has four elements, one per GPU. I think there is a bug in multi-GPU training; please check it out. Thank you!
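The error above can be reproduced without any GPUs: calling `backward()` on any non-scalar tensor fails the same way. This sketch (the length-4 vector is just a stand-in for the per-GPU losses gathered by `DataParallel`) shows the failure and the `.mean()` fix:

```python
import torch

# Stand-in for a loss vector with one element per GPU, as gathered
# by DataParallel on a 4-GPU machine.
x = torch.ones(4, requires_grad=True)
per_gpu_losses = x * 2.0

try:
    # backward() on a non-scalar tensor with no explicit gradient argument
    per_gpu_losses.backward()
except RuntimeError as e:
    print(e)  # grad can be implicitly created only for scalar outputs

# Reducing to a scalar first makes backward well-defined:
per_gpu_losses.mean().backward()
print(x.grad)  # d(mean(2x))/dx = 0.5 for each element
```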