Closed lvjiujin closed 2 years ago
Hi,
I am pretty sure it can be run on a single GPU with my scripts. I'm sorry, I don't know why it doesn't work in your case. Can you show me your errors? I can't guarantee I can solve them, since I haven't used these scripts in a long time.
I found out why I got the error. It shows "RuntimeError: Address already in use" because the master_port I passed was my cloud server's master port, which was already taken. Now I know the master_port can be any port as long as it is unused, so I have solved my problem. Thank you very much.
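As a side note, one way to avoid picking a port that is already in use is to ask the OS for a free ephemeral port before launching. A minimal sketch using Python's standard `socket` module (the helper name is my own):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port, suitable for --master_port."""
    # Binding to port 0 makes the OS assign an unused ephemeral port;
    # we read the number back and release the socket immediately.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

if __name__ == "__main__":
    print(find_free_port())
```

There is a small race window between releasing the socket and the launcher binding it, but in practice this is a common trick for one-off training runs.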
First, I want to say that your code runs fine when I don't use DDP, but when I use DDP it doesn't work, and I only have one GPU. I noticed this in your shell scripts:
CUDA_VISIBLE_DEVICES=5 python3 -m torch.distributed.launch --master_port 13517 --nproc_per_node=1
I know this is the DDP way to launch GPU training. So, can I train the model with DDP on a single GPU? Why does it work for you?
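For reference, DDP does work with a single process (one rank, world size 1), which is what `--nproc_per_node=1` sets up. A minimal CPU sketch using the `gloo` backend; the toy model and port number are my own choices for illustration (with a real GPU run you would typically use the `nccl` backend and pass `device_ids=[local_rank]`):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "distributed" run: rank 0 of a world of size 1.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")  # any unused port works
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)   # toy model standing in for the real one
ddp_model = DDP(model)          # on GPU: DDP(model, device_ids=[local_rank])
out = ddp_model(torch.randn(3, 4))
print(out.shape)

dist.destroy_process_group()
```

With `world_size=1` the gradient all-reduce is a no-op, so training behaves the same as a plain single-GPU run, just launched through the distributed machinery.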