Closed tin2tin closed 7 months ago
Is there a chance that you did not enable "Gradient Checkpointing"? Because in
['C:\Users\User_name\Downloads\simple-lora-dreambooth-trainer-main\simple-lora-dreambooth-trainer-main\venv\Scripts\python.exe', 'C:\Users\User_name\Downloads\simple-lora-dreambooth-trainer-main\simple-lora-dreambooth-trainer-main\train_dreambooth_lora.py', '--pretrained_model_name_or_path', 'C:/Users/User_name/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9', '--instance_data_dir', 'C:/Users/User_name/Documents/LORA/W', '--instance_prompt', 'WH1', '--class_prompt', 'W Herzog', '--output_dir', 'C:/Users/User_name/Documents/LORA/W Output', '--resolution', '512', '--train_batch_size', '1', '--num_train_epochs', '96', '--checkpointing_steps', '32', '--gradient_accumulation_steps', '1', '--learning_rate', '0.0001', '--lr_scheduler', 'constant_with_warmup', '--lr_warmup_steps', '10', '--mixed_precision', 'fp16', '--prior_generation_precision', 'fp16', '--rank', '4', '--use_8bit_adam', '--enable_xformers_memory_efficient_attention', '--pre_compute_text_embeddings']
I don't see that option enabled. It is highly recommended to enable that option, especially for a GPU with 6GB VRAM.
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 6.00 GiB total capacity; 5.22 GiB already allocated; 0 bytes free; 5.28 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
Steps: 100%|███████████████████████████████████████████████| 1152/1152 [42:22<00:00, 2.21s/it, loss=0.0296, lr=0.0001]
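As the traceback itself suggests, one workaround worth trying before retraining is the allocator hint it mentions. A minimal sketch for a POSIX shell (on Windows cmd, use `set` instead of `export`; the value 128 is just an illustrative starting point, not something from the log):

```shell
# Ask PyTorch's CUDA caching allocator to cap the split block size, which can
# reduce fragmentation-related OOMs; must be set before Python starts.
export PYTORCH_CUDA_ALLOC_CONF="max_split_size_mb:128"
```

This only helps when reserved memory is much larger than allocated memory, as the error message notes; it does not reduce the model's actual memory footprint.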
['C:\\Users\\user_name\\Downloads\\simple-lora-dreambooth-trainer-main\\simple-lora-dreambooth-trainer-main\\venv\\Scripts\\python.exe', 'C:\\Users\\user_name\\Downloads\\simple-lora-dreambooth-trainer-main\\simple-lora-dreambooth-trainer-main\\train_dreambooth_lora.py', '--pretrained_model_name_or_path', 'C:/Users/user_name/.cache/huggingface/hub/models--runwayml--stable-diffusion-v1-5/snapshots/1d0c4ebf6ff58a5caecab40fa1406526bca4b5b9', '--instance_data_dir', 'C:/Users/user_name/Documents/LORA/cat', '--instance_prompt', 'WH1', '--class_prompt', 'cat', '--output_dir', 'C:/Users/user_name/Documents/LORA/Werner Out', '--resolution', '512', '--train_batch_size', '1', '--num_train_epochs', '64', '--checkpointing_steps', '32', '--gradient_accumulation_steps', '1', '--learning_rate', '0.0001', '--lr_scheduler', 'constant', '--lr_warmup_steps', '10', '--mixed_precision', 'fp16', '--prior_generation_precision', 'fp16', '--rank', '4', '--gradient_checkpointing', '--use_8bit_adam', '--enable_xformers_memory_efficient_attention', '--pre_compute_text_embeddings']
What step am I missing? I seem to be able to use the safetensors from the checkpoints with Diffusers?
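For anyone trying the same thing, a minimal sketch of loading the trained LoRA into a Diffusers pipeline. This assumes a recent diffusers version and that the script wrote its final weights (pytorch_lora_weights.safetensors) into the --output_dir from the command above; requires a CUDA GPU:

```python
import torch
from diffusers import StableDiffusionPipeline

# Load the base model the LoRA was trained against.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)
# Point at the training --output_dir; load_lora_weights picks up the
# pytorch_lora_weights.safetensors file written there.
pipe.load_lora_weights("C:/Users/user_name/Documents/LORA/Werner Out")
pipe.to("cuda")

# Prompt uses the instance token from training (--instance_prompt "WH1").
image = pipe("a portrait of WH1", num_inference_steps=30).images[0]
image.save("portrait.png")
```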
I was able to reproduce the issue: there is a massive VRAM spike at the end of training, shooting up from about 4-5 GB to 7-8 GB. I am going to see if I can fix this somehow, but VRAM spikes are a common problem with Stable Diffusion; even generating images in e.g. A1111/ComfyUI causes spikes at the end of generation. So there is a chance that this is an "internal" issue, e.g. in PyTorch/CUDA, which is something I won't be able to fix.
EDIT: I released an update and I think it is fixed now. Ran it myself and there are no more spikes at the end of training. Thank you for bringing this up.
Great. That fixed the out-of-memory error at the end of processing. Mind-blowing to be able to train LoRAs on 6 GB of VRAM. Thank you!
Is the safetensors file outside the checkpoints supposed to be the "resulting" file? When I test it, I typically get an NSFW warning and something completely abstract rendered (trained on portraits, prompt: a portrait of "the word"):
However, testing the checkpoints, the sweet spot for 10 images seemed to be around checkpoint 640, and for 20 images it was around 256.
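For what it's worth, those checkpoint numbers follow directly from the flags: with --train_batch_size 1 and --gradient_accumulation_steps 1, each epoch performs one optimizer step per instance image, and a checkpoint is written every --checkpointing_steps steps. A quick sketch (the 12-image count is an inference from the 1152-step log above, not something stated in the thread):

```python
# Sketch: with --train_batch_size 1 and --gradient_accumulation_steps 1,
# each epoch runs one optimizer step per instance image.
def total_steps(num_images: int, num_epochs: int) -> int:
    return num_images * num_epochs

# The completed run above logged 1152 steps with --num_train_epochs 96,
# which implies 12 instance images:
assert total_steps(12, 96) == 1152

# Checkpoints are saved every --checkpointing_steps 32 steps, so
# "checkpoint 640" and "checkpoint 256" are simply the weights after
# 640 and 256 optimizer steps:
assert 640 % 32 == 0 and 256 % 32 == 0
```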
I am not familiar with the interface you are using; maybe the LoRA is not compatible with it. Looking at your earlier command, you trained with the instance + class prompt "WH1 W Herzog". The class token should be a single keyword/token like "house"/"cat"/"style"/"illustration", so I would leave out the "W". Maybe that is what is causing those weird images.
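Concretely, following that suggestion, the prompt flags from the first command would become something like (just a config fragment for illustration):

```shell
# Class prompt reduced to a single token, per the suggestion above:
--instance_prompt "WH1" --class_prompt "Herzog"
```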