Closed lvjiujin closed 2 years ago
Hi,
I am pretty sure it can be run on a single GPU with my scripts. I'm sorry, I don't know why it doesn't work in your case. Can you show me your errors? I can't guarantee I can solve them, since I haven't used these scripts in a long time.
I found out why I got the error. It shows "RuntimeError: Address already in use" because the master_port I passed was my cloud server's master port, which was already taken. Now I know the master_port can be any port as long as it is unused, so I have solved my problem. Thank you very much.
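As a side note, one way to avoid picking a port that is already in use is to ask the OS for a free ephemeral port before launching. A minimal sketch using Python's standard `socket` module (the helper name is my own):

```python
import socket

def find_free_port() -> int:
    """Ask the OS for an unused TCP port, suitable for --master_port."""
    # Binding to port 0 makes the OS assign an unused ephemeral port;
    # we read the number back and release the socket immediately.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("", 0))
        return s.getsockname()[1]

if __name__ == "__main__":
    print(find_free_port())
```

There is a small race window between releasing the socket and the launcher binding it, but in practice this is a common trick for one-off training runs.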
First, I want to say that your code runs fine when I don't use DDP, but when I use DDP it doesn't work, and I only have one GPU. I noticed this in your shell scripts:
CUDA_VISIBLE_DEVICES=5 python3 -m torch.distributed.launch --master_port 13517 --nproc_per_node=1
I know this is the DDP way to launch GPU training. So, can I train the model with DDP on a single GPU? Why does it work for you?
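For reference, DDP does work with a single process (one rank, world size 1), which is what `--nproc_per_node=1` sets up. A minimal CPU sketch using the `gloo` backend; the toy model and port number are my own choices for illustration (with a real GPU run you would typically use the `nccl` backend and pass `device_ids=[local_rank]`):

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

# Single-process "distributed" run: rank 0 of a world of size 1.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")  # any unused port works
dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(4, 2)   # toy model standing in for the real one
ddp_model = DDP(model)          # on GPU: DDP(model, device_ids=[local_rank])
out = ddp_model(torch.randn(3, 4))
print(out.shape)

dist.destroy_process_group()
```

With `world_size=1` the gradient all-reduce is a no-op, so training behaves the same as a plain single-GPU run, just launched through the distributed machinery.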