smy20011 / dreambooth-gui


RTX 3060, 12GB, 11.34gb free - out of memory error #78


TutajITeraz commented 1 year ago

Describe the bug

I'm getting an out-of-memory error, no matter which settings I try.

v0.1.10: Pulling from smy20011/dreambooth
Digest: sha256:3dfafdabd665dc1eeef59e3a455cc7792f6f4168628fac80ca85ded4543dd2a8
Status: Image is up to date for smy20011/dreambooth:v0.1.10

Fetching 12 files:   0%|          | 0/12 [00:00<?, ?it/s]
Fetching 12 files: 100%|██████████| 12/12 [00:00<00:00, 18675.94it/s]

Generating class images:   0%|          | 0/25 [00:00<?, ?it/s]
Generating class images:   0%|          | 0/25 [01:18<?, ?it/s]
Traceback (most recent call last):
  File "/train_dreambooth.py", line 822, in <module>
    main(args)
  File "/train_dreambooth.py", line 475, in main
    images = pipeline(example["prompt"]).images
  File "/opt/conda/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 27, in decorate_context
    return func(*args, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 547, in __call__
    image = self.decode_latents(latents)
  File "/opt/conda/lib/python3.7/site-packages/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 355, in decode_latents
    image = self.vae.decode(latents).sample
  File "/opt/conda/lib/python3.7/site-packages/diffusers/models/vae.py", line 581, in decode
    dec = self.decoder(z)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/diffusers/models/vae.py", line 217, in forward
    sample = up_block(sample)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/diffusers/models/unet_2d_blocks.py", line 1383, in forward
    hidden_states = resnet(hidden_states, temb=None)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/diffusers/models/resnet.py", line 450, in forward
    hidden_states = self.norm1(hidden_states)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/module.py", line 1130, in _call_impl
    return forward_call(*input, **kwargs)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/modules/normalization.py", line 273, in forward
    input, self.num_groups, self.weight, self.bias, self.eps)
  File "/opt/conda/lib/python3.7/site-packages/torch/nn/functional.py", line 2516, in group_norm
    return torch.group_norm(input, num_groups, weight, bias, eps, torch.backends.cudnn.enabled)
RuntimeError: CUDA out of memory. Tried to allocate 2.25 GiB (GPU 0; 11.75 GiB total capacity; 8.05 GiB already allocated; 113.25 MiB free; 9.83 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF

I have turned off all the apps and the browser, and disconnected the external screens, to free as much VRAM as I can.

To Reproduce

My settings are:

docker run --rm -t --pull always --gpus=all -v=/home/lukasz/Developer/dreambooth-gui/beks_512:/instance -v=/root/.config/smy20011.dreambooth/surreal oil painting by:/class -v=/run/media/lukasz/DATA ExFAT/Developer/dreambooth-gui/by_Beks:/output -v=/root/.config/smy20011.dreambooth/:/train -e HUGGING_FACE_HUB_TOKEN=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 smy20011/dreambooth:v0.1.10 /start_training /train_dreambooth.py --pretrained_model_name_or_path=stabilityai/stable-diffusion-2 --instance_prompt=by Beks --instance_data_dir=/instance --class_data_dir=/class --with_prior_preservation --prior_loss_weight=1.0 --class_prompt=surreal oil painting by --max_train_steps=16000 --learning_rate=5e-6 --lr_scheduler=constant --lr_warmup_steps=0 --save_interval=2000 --save_min_steps=3000 --resolution=512 --output_dir=/output --mixed_precision=fp16 --train_batch_size=1 --gradient_accumulation_steps=1 --use_8bit_adam --resolution=512 --gradient_checkpointing
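
The error message itself suggests trying max_split_size_mb to reduce allocator fragmentation. One low-effort thing to try, assuming the container's start script passes the environment through to the training process, is adding a second -e flag to the same docker run command:

-e PYTORCH_CUDA_ALLOC_CONF=max_split_size_mb:128

This does not add VRAM; it only caps the size of cached allocator blocks, so a large request like the failing 2.25 GiB one is less likely to be blocked by fragmentation.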


AbyszOne commented 1 year ago

--gradient_checkpointing?

TutajITeraz commented 1 year ago

I had turned it on when I was trying to train the text encoder, but the error is the same without it.

smy20011 commented 1 year ago

I don't think you are able to train the text encoder with 12 GB of memory. I'm happy to test it if you have a working command line.

TutajITeraz commented 1 year ago

I hope someone here could help me by providing settings that should work on 12 GB without an OOM error.

I tried the standard settings from the latest release, without any changes, and they do not work either.

AbyszOne commented 1 year ago

I hope someone here could help me by providing settings that should work on 12 GB without an OOM error.

I tried the standard settings from the latest release, without any changes, and they do not work either.

I always train the encoder with my 3060. You just need to go to your task manager and end any program with GPU usage (if possible). Close all browsers, of course. You need every inch of VRAM, but it works fine, for me at least. Don't forget the gradient checkpointing parameter.

AbyszOne commented 1 year ago

I don't think you are able to train the text encoder with 12 GB of memory. I'm happy to test it if you have a working command line.

It is possible. I always do that, in your GUI and even in Auto's GUI. It's just a matter of clean VRAM.

TutajITeraz commented 1 year ago

So is 11.34 GB not enough? How can I free more VRAM?

AbyszOne commented 1 year ago

So is 11.34 GB not enough? How can I free more VRAM?

As I already said: close everything but the GUI. Also, look at the task manager to see which subprocesses use the GPU.

smy20011 commented 1 year ago

It runs fine for me; I have a similar setup. Do you mind running nvtop to make sure you have 11+ GB of VRAM free before starting the GUI?

The GUI only detects available memory once. If you open something after starting the GUI, it may use the wrong parameters.
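
If nvtop is not installed, a plain nvidia-smi call shows the same thing: the memory summary at the top and, at the bottom, every process currently holding GPU memory, which makes it easy to spot what to close. For just the numbers:

nvidia-smi --query-gpu=memory.total,memory.used,memory.free --format=csv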

TutajITeraz commented 1 year ago

I will post the nvtop output this afternoon. As I recall it was around 11.7 GB free, with a lower screen resolution, gnome-classic, and only the GUI and a console open.

I think the real problem is here:

Tried to allocate 2.25 GiB (GPU 0; 11.75 GiB total capacity; 8.05 GiB already allocated; 113.25 MiB free; 9.83 GiB reserved in total by PyTorch

If I add the 2.25 GiB to the 9.83 GiB already reserved, that makes 12.08 GiB, which would be impossible to satisfy on an RTX 3060. Or should I add the 2.25 GiB only to the 8.05 GiB already allocated? Then the total of 10.30 GiB should not be a problem. It does not add up, or my calculations are wrong.
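
For reference, the two counters measure different things: "allocated" (8.05 GiB) is what live tensors actually occupy, while "reserved" (9.83 GiB) is allocated plus blocks PyTorch has cached for reuse, so the two sums mean different things:

allocated + request = 8.05 + 2.25 = 10.30 GiB (fits in 11.75 GiB)
reserved  + request = 9.83 + 2.25 = 12.08 GiB (does not fit)

The cached 9.83 - 8.05 = 1.78 GiB can normally be reused, but the 2.25 GiB request needs one contiguous block; with only 113.25 MiB free outside the cache and the cache fragmented into smaller pieces, the allocation fails even though the totals suggest it should fit. That is exactly the situation the max_split_size_mb hint in the error message is aimed at.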

futurevessel commented 1 year ago

I'm on 12 GB of VRAM (RTX 3060), and I can train with the text encoder. It's tight, as it uses ~96% of the available VRAM, but it works. I'm on Arch Linux, using i3 as the window manager. My settings are as follows:

--mixed_precision=fp16 --train_batch_size=1 --gradient_accumulation_steps=1 --use_8bit_adam --resolution=512 --gradient_checkpointing --train_text_encoder --seed=96576 --num_class_images=1000
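
Relative to the docker command in the original report, a sketch of the flag-level difference (paths, token, and the step/save settings left as they were):

same:  --mixed_precision=fp16 --train_batch_size=1 --gradient_accumulation_steps=1 --use_8bit_adam --resolution=512 --gradient_checkpointing
added: --train_text_encoder --seed=96576 --num_class_images=1000

The memory-relevant flags match; the main addition is --train_text_encoder, the very flag smy20011 expected not to fit in 12 GB, so this set is a reasonable upper bound for what an RTX 3060 can handle.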