huixiancheng / CENet

[ICME 2022] CENet: Toward Concise and Efficient LiDAR Semantic Segmentation for Autonomous Driving
MIT License

Multi-GPU training #8

Closed RockyatASU closed 2 years ago

RockyatASU commented 2 years ago

Hi,

Thank you for your nice work. When I train this model with 4 GPUs, the loss backpropagation raises "RuntimeError: grad can be implicitly created only for scalar outputs", but the code works for single-GPU training. I notice that the loss should be a scalar (a tensor with a single element), but with 4 GPUs it has four elements, one per GPU. I think there is a bug in multi-GPU training; please check it out. Thank you!
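For illustration, a minimal reproduction of the mismatch described above: `DataParallel` can return one loss value per replica, and `backward()` on such a non-scalar tensor raises exactly this error. The usual fix is to reduce the per-GPU losses to a scalar first (e.g. with `.mean()`). The tensor values below are made up.

```python
import torch

# Simulate what DataParallel can return with 4 GPUs: one loss per replica.
per_gpu_loss = torch.tensor([0.9, 1.1, 1.0, 0.8], requires_grad=True)

# per_gpu_loss.backward() here would raise:
#   RuntimeError: grad can be implicitly created only for scalar outputs

# Reducing to a scalar first makes backward() valid again.
loss = per_gpu_loss.mean()
loss.backward()
```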

huixiancheng commented 2 years ago

Thanks for pointing that out. I usually train on a single GPU, so I commented out the DataParallel part of the original pipeline and switched to AMP-style training.
https://github.com/huixiancheng/CENet/blob/a4abc7d71da150c1a1b11b34d4f636d6c07121fd/modules/trainer.py#L412-L421

You could modify this part and make sure AMP works with DP. However, I think it is much better to use DDP for multi-GPU training.
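To illustrate the DDP suggestion, here is a minimal sketch of the wrapping pattern, not the repository's trainer. It runs as a single process on CPU with the "gloo" backend and a toy linear model (all hypothetical choices); real multi-GPU training would spawn one process per GPU (e.g. via `torchrun`) with the "nccl" backend. Note that with DDP each process computes its own scalar loss, so the non-scalar-loss problem from DataParallel does not arise.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process process group on CPU, for demonstration only.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")
dist.init_process_group("gloo", rank=0, world_size=1)

# Toy model stands in for the real segmentation network.
model = DDP(torch.nn.Linear(4, 2))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x, y = torch.randn(8, 4), torch.randn(8, 2)
loss = torch.nn.functional.mse_loss(model(x), y)  # already a scalar per process
loss.backward()  # DDP averages gradients across processes here
opt.step()

dist.destroy_process_group()
```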