Closed GuGuLL123 closed this issue 2 years ago

The code works fine when I train with one GPU. The `_warm_up` process also works fine with multi-GPU distributed training, but the `_train_epoch` process gets stuck, while the GPUs and CPUs are still running normally. Have you encountered the same problem?
Thanks for your interest.
No, I haven't run into such issues. Could you please provide more training details, including the batch size and resolution? Also, could you point out which part of the code causes the hang?
Regards, Yuyuan Liu
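To find out where each worker is actually stuck, one generic option (not from this repo) is Python's built-in `faulthandler`, which can dump every thread's traceback on demand; a minimal sketch:

```python
import faulthandler
import signal

# Register early in each training process (e.g., at the top of main()).
# Unix only: sending SIGUSR1 to a worker makes it print a full Python
# traceback for every thread to stderr, showing the exact call it is
# blocked in (typically a collective op when DDP has deadlocked).
faulthandler.register(signal.SIGUSR1, all_threads=True)
```

Then `kill -USR1 <pid>` against each stuck worker reveals which rank is waiting where.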
I've found a potential issue here: if one GPU happens to have no confident pseudo-labels at all, the other GPUs will hang indefinitely in the backward pass. This is more likely to happen when the input resolution is very low.
Please update your copy based on the newest code, and I apologize for the inconvenience.
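For context, here is a minimal sketch of the failure mode and a common workaround, assuming a typical DDP setup with a confidence-thresholded pseudo-label loss; the function and tensor names (`unsupervised_loss`, `logits`, `pseudo_labels`, `conf`, `threshold`) are illustrative, not the repo's actual API:

```python
import torch.nn.functional as F

def unsupervised_loss(logits, pseudo_labels, conf, threshold=0.95):
    # logits: [N, C, H, W], pseudo_labels: [N, H, W], conf: [N, H, W].
    mask = (conf >= threshold).float()  # keep only confident pixels
    per_pixel = F.cross_entropy(logits, pseudo_labels, reduction="none")
    # Danger zone: returning early when mask.sum() == 0 (or dividing by it)
    # would make this rank skip backward(), while the other ranks wait
    # forever in DDP's gradient all-reduce, which is the reported hang.
    # A zero-valued loss that is still connected to the graph lets every
    # rank run backward(), so the collective completes on all GPUs.
    return (per_pixel * mask).sum() / mask.sum().clamp(min=1.0)
```

The key point is that the loss must stay connected to the model outputs on every rank even when its value is zero; an early return on the one rank with an empty mask is exactly what desynchronizes the collective.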
Thank you very much! I will try it now.
I'm closing the issue.
If you have any questions about implementing the code or reproducing the results, please feel free to reopen the issue or send me an email.
Regards, Yuyuan