huawei-noah / Efficient-Computing

Efficient computing methods developed by Huawei Noah's Ark Lab

Can't train with the DDP command, about Gold-YOLO #87

Closed: jianganghuang closed this issue 10 months ago

jianganghuang commented 1 year ago

I used your command "python -m torch.distributed.launch --nproc_per_node 8 tools/train.py --device 1,2 --batch 32", but it reports that I should use "torch.distributed.run" instead. When I switch to "torch.distributed.run", training still fails; the error output is shown in the screenshot below. How should the DDP training command be used?

[screenshot: error output]
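For reference, a minimal sketch of what the "torch.distributed.run" form of that launch could look like under PyTorch 1.11, where "torch.distributed.launch" is deprecated. The config path, data path, and device list below are illustrative placeholders, not taken from this thread, and it is assumed (as in the YOLOv6-style tools/train.py interface that Gold-YOLO builds on) that --nproc_per_node should equal the number of GPUs listed in --device:

    # assumed equivalent invocation with the newer launcher; adjust the config,
    # data, and device list to your setup. nproc_per_node is set to 2 here to
    # match the two GPUs given in --device
    python -m torch.distributed.run --nproc_per_node 2 \
        tools/train.py \
        --batch 32 \
        --conf configs/gold_yolo-s.py \
        --data data/coco.yaml \
        --device 1,2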

lose4578 commented 1 year ago

The DDP command looks fine; can you show more detailed error information?

jianganghuang commented 1 year ago

[screenshot: error output]

As shown in the figure, I trained with the DDP command but it failed. My environment is Python 3.8.16 with torch 1.11.0+cu102.
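Since the screenshot is not reproduced here, one sanity check that can help before relaunching (a sketch, not something suggested in this thread) is to confirm that the installed torch build actually sees the GPUs and has the NCCL backend available:

    # quick environment check: prints torch version, CUDA availability,
    # visible GPU count, and whether the NCCL backend is compiled in
    python -c "import torch, torch.distributed as dist; print(torch.__version__, torch.cuda.is_available(), torch.cuda.device_count(), dist.is_nccl_available())"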

jianganghuang commented 1 year ago

OK, I have found the problem, thanks.

Unicorn123455678 commented 1 year ago

> OK, I have found the problem, thanks.

Hello, may I ask what the problem was? I am running into the same issue; how should I fix it?