Closed jy12he closed 1 year ago
I think you can use the model to generate pictures, but training the model needs much more memory. I tried to train this model on a single RTX 4090 following the GitHub instructions and got a 'CUDA out of memory' error.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 256.00 MiB (GPU 0; 23.65 GiB total capacity; 21.73 GiB already allocated; 101.81 MiB free; 21.77 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
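The traceback itself suggests one mitigation: when reserved memory far exceeds allocated memory, setting `max_split_size_mb` via `PYTORCH_CUDA_ALLOC_CONF` can reduce allocator fragmentation. A minimal sketch (the value 128 is an arbitrary example of mine, and the variable must be set before the first CUDA allocation):

```python
import os

# Must be set before PyTorch initializes its CUDA allocator,
# i.e. before the first CUDA tensor is created.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:128"

# import torch  # import and allocate only after the variable is set
print(os.environ["PYTORCH_CUDA_ALLOC_CONF"])
```

This only helps with fragmentation; if the model genuinely needs more memory than the GPU has, reducing the batch size is still required.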
@0546trigger Hi, for 512x512 resolution, training dynamic diffusers, I set the batch size to 8 samples per GPU, and it takes 12GB of GPU memory. You can reduce the batch size to avoid the CUDA OOM problem.
OK, I see, thank you! If I also set the batch size to 8 samples per GPU, how much memory will it consume per GPU? @ziqihuangg
Do you mean reducing max_images in the config files? I set max_images to 4 and then got the following error:
RuntimeError: indices should be either on cpu or on the same device as the indexed tensor (cpu)
@jy12he Hi, it takes 12GB GPU memory. The setting is: 512x512 resolution, training dynamic diffusers, batch size = 8 samples per GPU. If your RTX3090 has 24GB memory, there should be no problem training at the 512x512 resolution.
@0546trigger You simply need to modify the batch_size parameter in the config.
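For reference, a batch-size override in a training config typically looks like the fragment below. This is only an illustrative sketch; the actual key names and file layout depend on the repo's own config files, and none of the names here are taken from it.

```yaml
# Hypothetical fragment of a training config; key names are illustrative,
# not taken from the actual repository.
data:
  params:
    batch_size: 4   # samples per GPU; lower this to reduce memory
    num_workers: 8
```

Note that max_images is a different parameter and changing it will not reduce per-step GPU memory.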
Thanks for your patience. I reduced batch_size to 1 at 512x512 resolution and it works. It takes about 18GB of memory on my RTX 4090, which is much larger than the number you mentioned above. Did I do something wrong during installation or when setting parameters?
@0546trigger Which model are you training? Which config file did you use?
The VAE model with the default config, except batch_size = 1.
@0546trigger For 512x512 VAE training, batch_size = 2 works fine on a 32 GB GPU. Each sample takes around 16 GB.
The setting I previously mentioned was "it takes 12GB GPU memory. The setting is: 512x512 resolution, training dynamic diffusers, batch size = 8 samples per GPU." Hope this clarifies, thanks.
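The numbers in this thread are roughly consistent with a simple linear memory model, memory ≈ fixed overhead + per-sample cost × batch size, with very different per-sample costs for the VAE and the diffuser. A small sketch using the figures quoted above (the zero-overhead default is a simplification of mine, not a measurement):

```python
def estimate_memory_gb(per_sample_gb: float, batch_size: int,
                       overhead_gb: float = 0.0) -> float:
    """Rough linear model: memory = overhead + per_sample * batch_size."""
    return overhead_gb + per_sample_gb * batch_size

# VAE training at 512x512: ~16 GB per sample (figure from the thread),
# so batch_size = 2 needs roughly a 32 GB GPU.
print(estimate_memory_gb(16, 2))   # -> 32.0

# Dynamic diffuser at 512x512: 12 GB total for batch_size = 8,
# i.e. ~1.5 GB per sample.
print(12 / 8)                      # -> 1.5
```

This explains the apparent discrepancy: the 12GB figure was for the diffuser, while the 18GB observation above was for VAE training, which is far heavier per sample.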
Hi, how much GPU memory is required? Can I run this on an RTX 3090?