openai / consistency_models

Official repo for consistency models.
MIT License

Use one GPU to generate images from a pretrained model without the NCCL communication protocol #21

Open stonecropa opened 1 year ago

stonecropa commented 1 year ago

I only have one GPU, and I want to run the pretrained model and generate images. What should I do, and where should the code be changed? Please explain in detail, because I am a newcomer in this area. Thanks.

thorinf commented 1 year ago

Make sure your commands aren't starting with mpiexec -n 8 like some of the scripts suggest. This is a multi-GPU command.

stonecropa commented 1 year ago

@thorinf
I know, but it still needs NCCL even without mpiexec. Thank you.

thorinf commented 1 year ago

Which command are you using? I'll take a look.

stonecropa commented 1 year ago

mpiexec -n 8 python cm_train.py --training_mode consistency_distillation --sigma_max 80 --sigma_min 0.002 --target_ema_mode fixed --start_ema 0.95 --scale_mode fixed --start_scales 40 --total_training_steps 600000 --loss_norm lpips --lr_anneal_steps 0 --teacher_model_path /path/to/edm_bedroom256_ema.pt --attention_resolutions 32,16,8 --class_cond False --use_scale_shift_norm False --dropout 0.0 --teacher_dropout 0.1 --ema_rate 0.9999,0.99994,0.9999432189950708 --global_batch_size 256 --image_size 256 --lr 0.00001 --num_channels 256 --num_head_channels 64 --num_res_blocks 2 --resblock_updown True --schedule_sampler uniform --use_fp16 True --weight_decay 0.0 --weight_schedule uniform --data_dir /path/to/bedroom256

thorinf commented 1 year ago

And you've tried this without mpiexec -n 8? Just python cm_train.py ... etc.?

stonecropa commented 1 year ago

yes

stonecropa commented 1 year ago

So I changed the code, but it has a bug.

thorinf commented 1 year ago

What happens if you call dist.get_world_size()? How many GPUs does the machine you are using have? You can also check this with nvidia-smi in a terminal.
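
For example, a quick check you can run in a Python shell (a minimal sketch; the world size is only meaningful after a process group has been initialised, so device_count() or nvidia-smi is the simplest thing to look at):

```python
import torch
import torch.distributed as dist

print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())  # same information as nvidia-smi

# world size only exists once a process group has been set up
if dist.is_available() and dist.is_initialized():
    print("world size:", dist.get_world_size())
```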

thorinf commented 1 year ago

https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L98

Here in the training loop is where distributed training is selected. It seems to activate assuming CUDA is available, rather than CUDA && Multi-GPU. I am unsure whether DDP is happy to work with just a single GPU; maybe it does, maybe it doesn't. You could try changing this line to force it into the else condition.

I would still check whether the machine has multiple GPUs. I know that if this is a personal machine it may be obvious that it's single-GPU, so it may seem pointless to suggest the check. But if it's a server somewhere, it might be worth taking a look; you could be surprised to find more than one.

https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L71

The global batch size is here, if you have a single GPU machine it should be the same as the batch size. If you make the change I suggested above then you will also need to make sure the global batch size is the same as the batch size.
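
Something like this is what I mean, as a sketch only (attribute names such as self.ddp_model and dist_util.dev() are assumed from the TrainLoop code linked above; this is not a drop-in patch):

```python
# cm/train_util.py, sketch of the suggested change around the linked line:
# only wrap the model in DDP when there is actually more than one process.

# was: if th.cuda.is_available():
if th.cuda.is_available() and dist.get_world_size() > 1:
    # keep the existing DDP(...) wrapping here unchanged
    self.use_ddp = True
    self.ddp_model = DDP(self.model, device_ids=[dist_util.dev()])
else:
    # single GPU (or CPU): use the model directly, so no DDP and no NCCL
    self.use_ddp = False
    self.ddp_model = self.model
```

With a world size of 1 the global batch size and the per-GPU batch size should already be the same value; if you hard-code the else branch, set --global_batch_size to the per-GPU batch size explicitly.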

stonecropa commented 1 year ago

@thorinf I changed it, but it still needs NCCL (screenshots attached).

thorinf commented 1 year ago

You need to change the if th.cuda.is_available(): check; at the moment you have only changed the attribute, and DDP is still used.

You could try changing the distributed backend so it doesn't use NCCL.

Or just install NCCL; even though you haven't got multiple GPUs, it may still work.

stonecropa commented 1 year ago

@thorinf Thanks, I will try your suggestion, but I still have two questions. First, how do I specify the location of the datasets? I just want to generate pictures; I don't want to download the original dataset, it's too big. Second, I get an error here (see screenshot). My workaround was to put the cm folder under the scripts folder. Is there a better solution?

thorinf commented 1 year ago

Image sampling doesn't require a dataset, from what I can see. It would be odd for it to be required.

thorinf commented 1 year ago

If you look at the code you sent above: in the else condition the ddp_model is set to the original model, which I think is what you want. The DDP wrapper is what enables distributed training, and NCCL is Nvidia's protocol for multi-GPU communication in that setting. You need to avoid DDP to avoid NCCL.

However, I am surprised that DDP checks for NCCL even if only one GPU is being used. The code could proceed; it doesn't need to raise a RuntimeError. That is not something you can change, though.

1999kevin commented 1 year ago

> https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L98
>
> Here in the training loop is where distributed training is selected. It seems to activate assuming CUDA is available, rather than CUDA && Multi-GPU. I am unsure whether DDP is happy to work with just a single GPU; maybe it does, maybe it doesn't. You could try changing this line to force it into the else condition.
>
> I would still check if the machine has multi-GPU. I know if this is a personal machine it may be obvious that it's just single-GPU, so it seems stupid for me to suggest the check. But if it's a server somewhere then might be worth just taking a look, could be surprised to see more than one.
>
> https://github.com/openai/consistency_models/blob/6d26080c58244555c031dbc63080c0961af74200/cm/train_util.py#L71
>
> The global batch size is here, if you have a single GPU machine it should be the same as the batch size. If you make the change I suggested above then you will also need to make sure the global batch size is the same as the batch size.

Hi, I'm also curious about how to run the code on a workstation with multiple GPUs. Simply deleting mpiexec -n 8 will run the code on a single GPU, as I mentioned in issue #20. The setting is in cm/dist_util.py, but I'm not familiar with mpi4py. Do you have any ideas or advice?

thorinf commented 1 year ago

> Simply deleting mpiexec -n 8 will run the code on a single GPU, as I mentioned in issue #20.

I think the issue is that, without NCCL installed, DDP throws a RuntimeError.

> but I'm not familiar with mpi4py. Do you have any ideas or advice?

There's also torch.distributed.launch and torchrun which may work.
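
For reference, a torchrun launch looks roughly like this (a sketch; note that this repo sets up its process group through mpi4py in cm/dist_util.py, so it may need small changes before it picks up the environment variables torchrun provides):

```bash
# one node, 4 processes (one per GPU); adjust --nproc_per_node to your GPU count,
# and pass the same training flags as in the mpiexec command above
torchrun --nproc_per_node=4 cm_train.py --training_mode consistency_distillation <remaining training flags>
```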

1999kevin commented 1 year ago

> There's also torch.distributed.launch and torchrun which may work.

Are there any easier methods? For example, any minor adjustments to the mpi4py setup in cm/dist_util.py?

thorinf commented 1 year ago

mpiexec or mpirun are pretty simple; I would definitely recommend learning to use them, or trying the other launchers I mentioned above. You basically need something that runs multiple instances of the code and sets up communication between them.
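
mpi4py itself is quite small in scope here: the launcher just gives each process a rank and a world size. A minimal sketch:

```python
# save as mpi_hello.py, then run: mpiexec -n 2 python mpi_hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
print(f"process {comm.Get_rank()} of {comm.Get_size()}")
```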

1999kevin commented 1 year ago

Thanks for your recommendation. I will try it in a while!

nekoshadow1 commented 1 year ago

> @thorinf Thanks, I will try it according to your suggestion, but I still have two questions: the first one is how do I specify the location of the datasets? I just want to generate pictures, I don't want to download the original version, it's too big. The second problem is that there is an error here; my approach is to put the cm folder under the scripts folder. Is there any solution?

You need to run setup.py first in order to configure everything correctly. Or as the author suggests: pip install -e .

Note that this can take ~30mins.

If you only want to generate images without downloading the large dataset (the LSUN training set contains more than 1 million images, by the way), you can refer to the 'Multistep sampling on class-conditional ImageNet-64, and LSUN 256' part of the following file: https://github.com/openai/consistency_models/blob/main/scripts/launch.sh

After running setup.py, download the pretrained models provided by the author and specify the path to the pretrained models when you run the sampling commands.
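
Roughly, the steps look like this (a sketch; copy the exact sampling flags from scripts/launch.sh and drop the mpiexec prefix for a single GPU):

```bash
# install the repo so the `cm` package resolves (can take a while, see above)
pip install -e .

# then run a sampling command from launch.sh, pointing --model_path at the
# downloaded pretrained checkpoint
python scripts/image_sample.py --model_path /path/to/checkpoint.pt <flags copied from launch.sh>
```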

stonecropa commented 1 year ago

@thorinf
Thank you. I am running the pretrained model on my laptop (Windows 10); can I also install NCCL there?

stonecropa commented 1 year ago

>> @thorinf Thanks, I will try it according to your suggestion, but I still have two questions: the first one is how do I specify the location of the datasets? I just want to generate pictures, I don't want to download the original version, it's too big. The second problem is that there is an error here; my approach is to put the cm folder under the scripts folder. Is there any solution?
>
> You need to run setup.py first in order to configure everything correctly. Or as the author suggests: pip install -e . Note that this can take ~30mins.
>
> If you only want to generate images without downloading the large dataset (the LSUN training set contains more than 1 million images, by the way), you can refer to the 'Multistep sampling on class-conditional ImageNet-64, and LSUN 256' part of the following file: https://github.com/openai/consistency_models/blob/main/scripts/launch.sh
>
> After running setup.py, download the pretrained models provided by the author and specify the path to the pretrained models when you run the sampling commands.

@nekoshadow1 I have already installed it, and there is no problem with cm, but it still needs NCCL. Is there any way around this? Thanks.

ShyFoo commented 1 year ago

> mpiexec or mpirun are pretty simple; I would definitely recommend learning to use them, or trying the other launchers I mentioned above. You basically need something that runs multiple instances of the code and sets up communication between them.

Hello thorinf, did you reproduce the results of the paper? I've seen many people say they got bad results.

ChenSiyi1 commented 1 year ago

You can change line 42 of dist_util.py to: dist.init_process_group(backend="gloo", init_method="env://")
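
For context, a sketch of what that change looks like in cm/dist_util.py (the rest of the setup code stays as it is; Gloo works without an NCCL install, at the cost of slower multi-GPU communication):

```python
# cm/dist_util.py, around the line mentioned above:
# use the Gloo backend instead of NCCL so the process group can initialise
# on a machine that has no NCCL installed
dist.init_process_group(backend="gloo", init_method="env://")
```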