Closed GuGuLL123 closed this issue 2 years ago

The code works fine when I train with one GPU. The `_warm_up` process also works fine with multi-GPU distributed training, but the `_train_epoch` process gets stuck, while the GPUs and CPUs are still running normally. Have you encountered the same problem?
Thanks for your interest.
No, I haven't run into such issues. Could you please provide more training details, including the batch size and resolution? Also, could you point out which part of the code causes the hang?
Regards, Yuyuan Liu
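To find out where each worker is actually stuck, one generic option (not from this repo) is Python's built-in `faulthandler`, which can dump every thread's traceback on demand; a minimal sketch:

```python
import faulthandler
import signal

# Register early in each training process (e.g., at the top of main()).
# Unix only: sending SIGUSR1 to a worker makes it print a full Python
# traceback for every thread to stderr, showing the exact call it is
# blocked in (typically a collective op when DDP has deadlocked).
faulthandler.register(signal.SIGUSR1, all_threads=True)
```

Then `kill -USR1 <pid>` against each stuck worker reveals which rank is waiting where.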
I've found a potential issue here: if one GPU happens to have no confident pseudo-labels at all, the other GPUs will hang indefinitely in the backward pass. This is more likely to happen when the input resolution is very low.
Please update your copy based on the newest code, and I apologize for the inconvenience.
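For context, here is a minimal sketch of the failure mode and a common workaround, assuming a typical DDP setup with a confidence-thresholded pseudo-label loss; the function and tensor names (`unsupervised_loss`, `logits`, `pseudo_labels`, `conf`, `threshold`) are illustrative, not the repo's actual API:

```python
import torch.nn.functional as F

def unsupervised_loss(logits, pseudo_labels, conf, threshold=0.95):
    # logits: [N, C, H, W], pseudo_labels: [N, H, W], conf: [N, H, W].
    mask = (conf >= threshold).float()  # keep only confident pixels
    per_pixel = F.cross_entropy(logits, pseudo_labels, reduction="none")
    # Danger zone: returning early when mask.sum() == 0 (or dividing by it)
    # would make this rank skip backward(), while the other ranks wait
    # forever in DDP's gradient all-reduce, which is the reported hang.
    # A zero-valued loss that is still connected to the graph lets every
    # rank run backward(), so the collective completes on all GPUs.
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)
```

The key point is that the loss must stay connected to the model outputs on every rank even when its value is zero; an early return on the one rank with an empty mask is exactly what desynchronizes the collective.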
Thank you very much! I will try it now.
I'm closing the issue.
If you have any questions about implementing the code or reproducing the results, please feel free to reopen the issue or send me an email.
Regards, Yuyuan