ductuantruong closed this issue 1 year ago
Hi, please try the following code:
torchrun --master_addr=localhost --master_port=16888 --nnodes=1 --nproc_per_node=$num_gpus \
Ref: https://yzsxeajuhm.feishu.cn/docx/JNmddhTz0oDA8zxgDN1cJeqnnQb
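In case it helps anyone running several jobs on the same node: a minimal sketch of giving each job its own rendezvous port instead of hard-coding 16888. This assumes the errors come from jobs sharing a master port, which the thread does not confirm. `find_free_port` and `build_torchrun_args` are hypothetical helper names, not part of the toolkit, and the training script and its arguments are left out because the command above is truncated.

```python
import socket
from typing import List


def find_free_port() -> int:
    """Ask the OS for a TCP port that is unused right now.

    Caveat: the port is released before torchrun binds it, so another
    process could still grab it in between. Treat this as best effort.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))  # port 0 lets the OS choose a free port
        return s.getsockname()[1]


def build_torchrun_args(num_gpus: int, port: int) -> List[str]:
    """Build the launcher part of the command shown above (hypothetical helper).

    The script path and its arguments must be appended by the caller.
    """
    return [
        "torchrun",
        "--master_addr=localhost",
        f"--master_port={port}",
        "--nnodes=1",
        f"--nproc_per_node={num_gpus}",
    ]


if __name__ == "__main__":
    # Each job calls this separately, so each one gets its own port.
    print(" ".join(build_torchrun_args(num_gpus=1, port=find_free_port())))
```

Running this once per experiment prints a launcher prefix with a different `--master_port` each time, so concurrent jobs on one node do not share a rendezvous endpoint.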
Thank you for your quick response. I will try it. I am closing this issue.
Hi,
Thank you for developing this amazing toolkit. I am currently running my experiments with it. However, when I run multiple experiments on one compute node, I notice that as soon as one job finishes, the remaining running jobs fail with the following errors:
Have you encountered this issue? If so, could you advise me on how to fix it? Once again, thank you for sharing this toolkit and helping us.