Closed tin2tin closed 7 months ago
Is there a chance that you did not enable "Gradient Checkpointing"? Because in
['C:\Users\User_name\Downloads\simple-lora-dreambooth-trainer-main\simple-lora-dreambooth-trainer-main\venv\Scripts\python.exe', 'C:\Users\User_name\Downloads\simple-lora-dreambooth-trainer-main\simple-lora-dreambooth-trainer-main\train_dreambooth_lora.py', '--pretrained_model_name_or_path', 'C:/Users/User_name/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9', '--instance_data_dir', 'C:/Users/User_name/Documents/LORA/W', '--instance_prompt', 'WH1', '--class_prompt', 'W Herzog', '--output_dir', 'C:/Users/User_name/Documents/LORA/W Output', '--resolution', '512', '--train_batch_size', '1', '--num_train_epochs', '96', '--checkpointing_steps', '32', '--gradient_accumulation_steps', '1', '--learning_rate', '0.0001', '--lr_scheduler', 'constant_with_warmup', '--lr_warmup_steps', '10', '--mixed_precision', 'fp16', '--prior_generation_precision', 'fp16', '--rank', '4', '--use_8bit_adam', '--enable_xformers_memory_efficient_attention', '--pre_compute_text_embeddings']
I don't see that option enabled. It is highly recommended to enable that option, especially for a GPU with 6GB VRAM.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps: 100%|███████████████████████████████████████████████| 1152/1152 [42:22<00:00, 2.21s/it, loss=0.0296, lr=0.0001]
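As the traceback itself suggests, one workaround worth trying before retraining is the allocator hint it mentions. A minimal sketch for a POSIX shell (on Windows cmd, use `set` instead of `export`; the value 128 is just an illustrative starting point, not something from the log):

```shell
# Ask PyTorch's CUDA caching allocator to cap the split block size, which can
# reduce fragmentation-related OOMs; must be set before Python starts.
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128"
```

This only helps when reserved memory is much larger than allocated memory, as the error message notes; it does not reduce the model's actual memory footprint.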
['C:\\Users\\user_name\\Downloads\\simple-lora-dreambooth-trainer-main\\simple-lora-dreambooth-trainer-main\\venv\\Scripts\\python.exe', 'C:\\Users\\user_name\\Downloads\\simple-lora-dreambooth-trainer-main\\simple-lora-dreambooth-trainer-main\\train_dreambooth_lora.py', '--pretrained_model_name_or_path', 'C:/Users/user_name/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9', '--instance_data_dir', 'C:/Users/user_name/Documents/LORA/cat', '--instance_prompt', 'WH1', '--class_prompt', 'cat', '--output_dir', 'C:/Users/user_name/Documents/LORA/Werner Out', '--resolution', '512', '--train_batch_size', '1', '--num_train_epochs', '64', '--checkpointing_steps', '32', '--gradient_accumulation_steps', '1', '--learning_rate', '0.0001', '--lr_scheduler', 'constant', '--lr_warmup_steps', '10', '--mixed_precision', 'fp16', '--prior_generation_precision', 'fp16', '--rank', '4', '--gradient_checkpointing', '--use_8bit_adam', '--enable_xformers_memory_efficient_attention', '--pre_compute_text_embeddings']
What step am I missing? I seem to be able to use the safetensors from the checkpoints with Diffusers?
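For anyone trying the same thing, a minimal sketch of loading the trained LoRA into a Diffusers pipeline. This assumes a recent diffusers version and that the script wrote its final weights (pytorch_lora_weights.safetensors) into the --output_dir from the command above; requires a CUDA GPU:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model the LoRA was trained against.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Point at the training --output_dir; load_lora_weights picks up the
# pytorch_lora_weights.safetensors file written there.
pipe.load_lora_weights("C:/Users/user_name/Documents/LORA/Werner Out")
pipe.to("cuda")

# Prompt uses the instance token from training (--instance_prompt "WH1").
image = pipe("a portrait of WH1", num_inference_steps=30).images[0]
image.save("portrait.png")
```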
I was able to reproduce the issue: there is a massive VRAM spike at the end of training, shooting up from about 4-5 GB to 7-8 GB. I am going to see if I can fix this somehow, but VRAM spikes are a common problem with Stable Diffusion; even generating images in e.g. A1111/ComfyUI causes spikes at the end of generation. So there is a chance that this is an "internal" issue, e.g. in PyTorch/CUDA, which is something I won't be able to fix.
EDIT: I released an update and I think it is fixed now. Ran it myself and there are no more spikes at the end of training. Thank you for bringing this up.
Great. That fixed the out-of-memory error at the end of processing. Mind-blowing to be able to train LoRAs on 6 GB of VRAM. Thank you!
Is the safetensors file outside the checkpoints supposed to be the "resulting" file? When I test it, I typically get an NSFW warning and something completely abstract rendered (trained on portraits, prompt: a portrait of "the word"):
However, testing the checkpoints, the sweet spot for 10 images seemed to be around checkpoint 640, and for 20 images it was around 256.
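For what it's worth, those checkpoint numbers follow directly from the flags: with --train_batch_size 1 and --gradient_accumulation_steps 1, each epoch performs one optimizer step per instance image, and a checkpoint is written every --checkpointing_steps steps. A quick sketch (the 12-image count is an inference from the 1152-step log above, not something stated in the thread):

```python
# Sketch: with --train_batch_size 1 and --gradient_accumulation_steps 1,
# each epoch runs one optimizer step per instance image.
def total_steps(num_images: int, num_epochs: int) -> int:
    return num_images * num_epochs

# The completed run above logged 1152 steps with --num_train_epochs 96,
# which implies 12 instance images:
assert total_steps(12, 96) == 1152

# Checkpoints are saved every --checkpointing_steps 32 steps, so
# "checkpoint 640" and "checkpoint 256" are simply the weights after
# 640 and 256 optimizer steps:
assert 640 % 32 == 0 and 256 % 32 == 0
```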
I am not familiar with the interface you are using; maybe the LoRA is not compatible with it. Looking at your earlier command, you trained with the instance + class prompt "WH1 W Herzog". The class token should be a single keyword/token like "house"/"cat"/"style"/"illustration", so I would leave out the "W". Maybe that is what is causing those weird images.
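Concretely, following that suggestion, the prompt flags from the first command would become something like (just a config fragment for illustration):

```shell
# Class prompt reduced to a single token, per the suggestion above:
--instance_prompt "WH1" --class_prompt "Herzog"
```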