Hi, when I run the command "python -m torch.distributed.run --nproc_per_node=8 pretrain.py --config ./configs/Pretrain.yaml --output_dir output/Pretrain", it fails with:
"ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -9) local_rank: 5 (pid: 1037) of binary: /opt/conda/bin/python3.7"
Could you give me some hints?
Thanks.