1999kevin opened this issue 1 year ago
@1999kevin Can you tell me how to use a GPU to generate images with a pretrained model, without the NCCL communication protocol? Thank you.
Just delete the mpiexec part from the sampling command.
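For example (a sketch: the mpiexec form mirrors the command quoted later in this thread, while --model_path and the checkpoint path are placeholders I'm assuming from similar diffusion codebases; substitute your own sampling flags for the ...):

    # Multi-GPU sampling via MPI, as in the repo's scripts:
    mpiexec -n 2 python ./scripts/image_sample.py --model_path checkpoints/model.pt ...
    # Single-GPU sampling without NCCL: drop the mpiexec prefix.
    python ./scripts/image_sample.py --model_path checkpoints/model.pt ...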
@1999kevin But I don't find mpiexec in image_sample.py. Thanks.
Can I have a look at the code after your changes? Thanks, I would appreciate it if you could send it over.
I'm still working on the training phase and am not so sure about the inference phase yet. I guess you can follow Line 48 and Line 51 in scripts/launch.sh to sample the images. If you want to use a single process, just run the command directly: python image_sample.py ...
I add CUDA_VISIBLE_DEVICES=6,7 in front of the inference command to form CUDA_VISIBLE_DEVICES=6,7 mpiexec -n 2 python ./scripts/image_sample.py ..., and change the code at ./cm/dist_util.py#L27 into:
    if 'CUDA_VISIBLE_DEVICES' not in os.environ:
        os.environ["CUDA_VISIBLE_DEVICES"] = f"{MPI.COMM_WORLD.Get_rank() % GPUS_PER_NODE}"
    else:
        # Pin each MPI rank to one entry of the user-provided GPU list.
        gpu_inds_list = os.environ["CUDA_VISIBLE_DEVICES"].split(',')
        # Wrap by the list length (not GPUS_PER_NODE) so a rank never indexes past the list.
        idx = MPI.COMM_WORLD.Get_rank() % len(gpu_inds_list)
        os.environ["CUDA_VISIBLE_DEVICES"] = gpu_inds_list[idx]
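To see what the mapping does, here is a minimal standalone sketch (the MPI rank is passed in by hand so it runs without mpi4py; pin_gpu is an illustrative name, not a function from the repo):

    import os

    def pin_gpu(rank: int) -> str:
        # Same mapping as above: pick the rank-th entry of CUDA_VISIBLE_DEVICES.
        gpu_inds_list = os.environ["CUDA_VISIBLE_DEVICES"].split(',')
        os.environ["CUDA_VISIBLE_DEVICES"] = gpu_inds_list[rank % len(gpu_inds_list)]
        return os.environ["CUDA_VISIBLE_DEVICES"]

    os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"
    print(pin_gpu(0))  # rank 0 -> "6"
    os.environ["CUDA_VISIBLE_DEVICES"] = "6,7"  # reset, since pin_gpu overwrites it
    print(pin_gpu(1))  # rank 1 -> "7"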
Does it work?
I will test it once I finish the current training run.
Btw, I found that training with a batch size of only 4 and image size 64 costs about 18 GB of memory per GPU. Is there something wrong with that?
I also encountered similar problems in my test. I trained the model with batch size 2 and image size 256, which costs me 35 GB of memory.
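If you want to pin down where the memory goes, one quick check (a minimal sketch, assuming the training loop runs on PyTorch, which this repo uses) is to print the allocator's peak statistics after a training step:

    import torch

    # Run one forward/backward step of the training loop first, then:
    peak = torch.cuda.max_memory_allocated(device=0)
    print(f"peak allocated: {peak / 1024**3:.1f} GiB")
    # Note: nvidia-smi additionally counts the CUDA context and the allocator's
    # cached-but-free blocks, so it usually reports a larger number than this.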
Will the pretrained model also use such a large amount of GPU memory?
I haven't tested that case yet.
> I add CUDA_VISIBLE_DEVICES=6,7 in front of the inference command to form CUDA_VISIBLE_DEVICES=6,7 mpiexec -n 2 python ./scripts/image_sample.py ..., and change the code at ./cm/dist_util.py#L27
This change can definitely enable multi-GPU training. However, it may cause the error 'Expected q.stride(-1) == 1 to be true, but got false', as in issue #3. Changing the flash attention back to the default attention resolves the error.
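For context, that error comes from flash-attention kernels requiring the last dimension of q/k/v to be contiguous in memory (stride 1). A small illustration of the failure mode, unrelated to this repo's own code:

    import torch

    q = torch.randn(2, 8, 64, 40)        # (batch, heads, seq, head_dim)
    q_t = q.transpose(-1, -2)            # a view: last dim is no longer contiguous
    print(q_t.stride(-1))                # 40, not 1 -- this is what the check rejects
    print(q_t.contiguous().stride(-1))   # 1 -- .contiguous() is the usual workaround

Switching back to the default attention, as suggested above, avoids the strided path entirely.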
Nice job! I wonder how I can run the code on a single Linux server with multiple GPUs. I can run the code on the server with one GPU by not using mpiexec, but what if I want to use multiple GPUs, as with nn.DataParallel?
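For what it's worth, the repo's scripts are built around MPI with one process per GPU (mpiexec -n <num_gpus>, as above), so that is the intended multi-GPU route. If you specifically want single-process data parallelism, a generic PyTorch sketch (not this repo's own mechanism; the Linear module is a stand-in for the real model) would look like:

    import torch
    import torch.nn as nn

    model = nn.Linear(512, 512)               # stand-in for the actual model
    if torch.cuda.device_count() > 1:
        model = nn.DataParallel(model)        # replicate per GPU, split the batch dim
    model = model.cuda()
    out = model(torch.randn(8, 512).cuda())   # batch of 8 scattered across GPUs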