huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

[Dreambooth example run error] When I use the downloaded pretrained_model I still get network errors #7276

Closed · xdobetter closed this 3 months ago

xdobetter commented 7 months ago

Hello, great work!

I downloaded the pretrained model in advance, so my question is: why do I still get network errors during training?

My Windows environment is as follows:

transformers version: 4.38.2
accelerate version: 0.27.2
peft version: 0.9.1.dev0
pytorch version: 2.2.1

The example I chose is DreamBooth; the script I executed is as follows:

accelerate launch train_dreambooth.py --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"  --instance_data_dir="./data/dog" --output_dir="model-lora_fine-tune-dog-2" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --checkpointing_steps=100 --learning_rate=1e-4 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=500 --validation_prompt="A photo of sks dog in the snow" --seed="0" --report_to="tensorboard" --use_lora --num_dataloader_workers=0 --no_tracemalloc --validation_steps=100

The pretrained model "runwayml/stable-diffusion-v1-5" was already downloaded before training.
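For context, the Hugging Face libraries do support a fully offline mode: when these environment variables are set, `huggingface_hub` resolves everything from the local cache and skips Hub HTTP calls entirely. A minimal sketch (bash syntax; the PowerShell equivalent is noted in the comment):

```shell
# Force Hugging Face libraries to load models from the local cache only,
# so no network requests are attempted during training.
export HF_HUB_OFFLINE=1        # huggingface_hub / diffusers: skip all Hub HTTP calls
export TRANSFORMERS_OFFLINE=1  # transformers: same, for the tokenizer/text encoder
# PowerShell equivalent: $env:HF_HUB_OFFLINE = "1"; $env:TRANSFORMERS_OFFLINE = "1"
```

With these set, a missing file raises a clear "cannot find in cache" error instead of a network error, which helps tell a connectivity problem apart from an incomplete local download.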

The error output during the run is as follows:

(zero123-hf-new) PS J:\Research_Program\peft\examples\lora_dreambooth> accelerate launch train_dreambooth.py --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"  --instance_data_dir="./data/dog" --output_dir="model-lora_fine-tune-dog-2" --instance_prompt="a photo of sks dog" --resolution=512 --train_batch_size=1 --gradient_accumulation_steps=1 --checkpointing_steps=100 --learning_rate=1e-4 --lr_scheduler="constant" --lr_warmup_steps=0 --max_train_steps=500 --validation_prompt="A photo of sks dog in the snow" --seed="0" --report_to="tensorboard" --use_lora --num_dataloader_workers=0 --no_tracemalloc --validation_steps=20
D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\scipy\__init__.py:155: UserWarning: A NumPy version >=1.18.5 and <1.25.0 is required for this version of SciPy (detected version 1.26.4
  warnings.warn(f"A NumPy version >={np_minversion} and <{np_maxversion}"
03/10/2024 19:46:58 - INFO - __main__ - Distributed environment: NO
Num processes: 1
Process index: 0
Local process index: 0
Device: cuda

Mixed precision type: no

You are using a model of type clip_text_model to instantiate a model of type . This is not supported for all configurations of models and can yield errors.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
{'class_embed_type', 'resnet_time_scale_shift', 'transformer_layers_per_block', 'conv_out_kernel', 'addition_embed_type', 'class_embeddings_concat', 'num_class_embeds', 'upcast_attention', 'encoder_hid_dim', 'addition_time_embed_dim', 'mid_block_type', 'resnet_out_scale_factor', 'resnet_skip_time_act', 'use_linear_projection', 'dual_cross_attention', 'only_cross_attention', 'time_embedding_dim', 'time_embedding_type', 'addition_embed_type_num_heads', 'cross_attention_norm', 'encoder_hid_dim_type', 'conv_in_kernel', 'projection_class_embeddings_input_dim', 'time_embedding_act_fn', 'mid_block_only_cross_attention', 'timestep_post_act', 'num_attention_heads', 'time_cond_proj_dim'} was not found in config. Values will be initialized to default values.
trainable params: 797,184 || all params: 860,318,148 || trainable%: 0.09266153478840713
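As an aside, that trainable-parameter count is exactly what rank-8 LoRA on the `to_q`/`to_v` projections predicts: each adapted `Linear` contributes `r * (d_in + d_out)` parameters (matrices A and B). A quick sanity check, with block counts and hidden sizes read off the module dump below (the full SD v1.5 UNet has 16 `BasicTransformerBlock`s: 6 at width 1280 including the mid block, 5 at 640, 5 at 320):

```python
# Parameters added by one rank-r LoRA adapter on a d_in x d_out Linear:
# lora_A is (d_in x r), lora_B is (r x d_out).
def lora_params(d_in, d_out, r=8):
    return r * (d_in + d_out)

blocks = {320: 5, 640: 5, 1280: 6}  # BasicTransformerBlocks per hidden size
cross_dim = 768                     # CLIP text-embedding size (cross-attn k/v input)

total = 0
for d, n in blocks.items():
    per_block = (
        lora_params(d, d)            # attn1.to_q (self-attention)
        + lora_params(d, d)          # attn1.to_v
        + lora_params(d, d)          # attn2.to_q (cross-attention query)
        + lora_params(cross_dim, d)  # attn2.to_v (value projected from text)
    )
    total += n * per_block

print(total)  # 797184, matching "trainable params: 797,184" in the log
```

So the adapter really is only ~0.093% of the 860M UNet parameters, as reported.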
PeftModel(
  (base_model): LoraModel(
    (model): UNet2DConditionModel(
      (conv_in): Conv2d(4, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
      (time_proj): Timesteps()
      (time_embedding): TimestepEmbedding(
        (linear_1): Linear(in_features=320, out_features=1280, bias=True)
        (act): SiLU()
        (linear_2): Linear(in_features=1280, out_features=1280, bias=True)
      )
      (down_blocks): ModuleList(
        (0): CrossAttnDownBlock2D(
          (attentions): ModuleList(
            (0-1): 2 x Transformer2DModel(
              (norm): GroupNorm(32, 320, eps=1e-06, affine=True)
              (proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
              (transformer_blocks): ModuleList(
                (0): BasicTransformerBlock(
                  (norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
                  (attn1): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=320, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=320, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=320, out_features=320, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=320, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=320, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=320, out_features=320, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
                  (attn2): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=320, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=320, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=768, out_features=320, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=768, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=768, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=320, out_features=320, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
                  (ff): FeedForward(
                    (net): ModuleList(
                      (0): GEGLU(
                        (proj): Linear(in_features=320, out_features=2560, bias=True)
                      )
                      (1): Dropout(p=0.0, inplace=False)
                      (2): Linear(in_features=1280, out_features=320, bias=True)
                    )
                  )
                )
              )
              (proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (resnets): ModuleList(
            (0-1): 2 x ResnetBlock2D(
              (norm1): GroupNorm(32, 320, eps=1e-05, affine=True)
              (conv1): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=320, bias=True)
              (norm2): GroupNorm(32, 320, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
            )
          )
          (downsamplers): ModuleList(
            (0): Downsample2D(
              (conv): Conv2d(320, 320, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            )
          )
        )
        (1): CrossAttnDownBlock2D(
          (attentions): ModuleList(
            (0-1): 2 x Transformer2DModel(
              (norm): GroupNorm(32, 640, eps=1e-06, affine=True)
              (proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
              (transformer_blocks): ModuleList(
                (0): BasicTransformerBlock(
                  (norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
                  (attn1): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=640, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=640, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=640, out_features=640, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=640, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=640, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=640, out_features=640, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
                  (attn2): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=640, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=640, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=768, out_features=640, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=768, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=768, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=640, out_features=640, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
                  (ff): FeedForward(
                    (net): ModuleList(
                      (0): GEGLU(
                        (proj): Linear(in_features=640, out_features=5120, bias=True)
                      )
                      (1): Dropout(p=0.0, inplace=False)
                      (2): Linear(in_features=2560, out_features=640, bias=True)
                    )
                  )
                )
              )
              (proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (resnets): ModuleList(
            (0): ResnetBlock2D(
              (norm1): GroupNorm(32, 320, eps=1e-05, affine=True)
              (conv1): Conv2d(320, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
              (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(320, 640, kernel_size=(1, 1), stride=(1, 1))
            )
            (1): ResnetBlock2D(
              (norm1): GroupNorm(32, 640, eps=1e-05, affine=True)
              (conv1): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
              (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
            )
          )
          (downsamplers): ModuleList(
            (0): Downsample2D(
              (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            )
          )
        )
        (2): CrossAttnDownBlock2D(
          (attentions): ModuleList(
            (0-1): 2 x Transformer2DModel(
              (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
              (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
              (transformer_blocks): ModuleList(
                (0): BasicTransformerBlock(
                  (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                  (attn1): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1280, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=1280, out_features=1280, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1280, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=1280, out_features=1280, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                  (attn2): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1280, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=768, out_features=1280, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=768, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=768, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=1280, out_features=1280, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                  (ff): FeedForward(
                    (net): ModuleList(
                      (0): GEGLU(
                        (proj): Linear(in_features=1280, out_features=10240, bias=True)
                      )
                      (1): Dropout(p=0.0, inplace=False)
                      (2): Linear(in_features=5120, out_features=1280, bias=True)
                    )
                  )
                )
              )
              (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (resnets): ModuleList(
            (0): ResnetBlock2D(
              (norm1): GroupNorm(32, 640, eps=1e-05, affine=True)
              (conv1): Conv2d(640, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
              (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(640, 1280, kernel_size=(1, 1), stride=(1, 1))
            )
            (1): ResnetBlock2D(
              (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (conv1): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
              (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
            )
          )
          (downsamplers): ModuleList(
            (0): Downsample2D(
              (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(2, 2), padding=(1, 1))
            )
          )
        )
        (3): DownBlock2D(
          (resnets): ModuleList(
            (0-1): 2 x ResnetBlock2D(
              (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (conv1): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
              (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
            )
          )
        )
      )
      (up_blocks): ModuleList(
        (0): UpBlock2D(
          (resnets): ModuleList(
            (0-2): 3 x ResnetBlock2D(
              (norm1): GroupNorm(32, 2560, eps=1e-05, affine=True)
              (conv1): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
              (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (upsamplers): ModuleList(
            (0): Upsample2D(
              (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
        )
        (1): CrossAttnUpBlock2D(
          (attentions): ModuleList(
            (0-2): 3 x Transformer2DModel(
              (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
              (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
              (transformer_blocks): ModuleList(
                (0): BasicTransformerBlock(
                  (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                  (attn1): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1280, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=1280, out_features=1280, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1280, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=1280, out_features=1280, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                  (attn2): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=1280, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=768, out_features=1280, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=768, out_features=1280, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=768, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=1280, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=1280, out_features=1280, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                  (ff): FeedForward(
                    (net): ModuleList(
                      (0): GEGLU(
                        (proj): Linear(in_features=1280, out_features=10240, bias=True)
                      )
                      (1): Dropout(p=0.0, inplace=False)
                      (2): Linear(in_features=5120, out_features=1280, bias=True)
                    )
                  )
                )
              )
              (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (resnets): ModuleList(
            (0-1): 2 x ResnetBlock2D(
              (norm1): GroupNorm(32, 2560, eps=1e-05, affine=True)
              (conv1): Conv2d(2560, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
              (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(2560, 1280, kernel_size=(1, 1), stride=(1, 1))
            )
            (2): ResnetBlock2D(
              (norm1): GroupNorm(32, 1920, eps=1e-05, affine=True)
              (conv1): Conv2d(1920, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
              (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(1920, 1280, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (upsamplers): ModuleList(
            (0): Upsample2D(
              (conv): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
        )
        (2): CrossAttnUpBlock2D(
          (attentions): ModuleList(
            (0-2): 3 x Transformer2DModel(
              (norm): GroupNorm(32, 640, eps=1e-06, affine=True)
              (proj_in): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
              (transformer_blocks): ModuleList(
                (0): BasicTransformerBlock(
                  (norm1): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
                  (attn1): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=640, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=640, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=640, out_features=640, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=640, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=640, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=640, out_features=640, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm2): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
                  (attn2): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=640, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=640, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=768, out_features=640, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=768, out_features=640, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=768, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=640, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=640, out_features=640, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm3): LayerNorm((640,), eps=1e-05, elementwise_affine=True)
                  (ff): FeedForward(
                    (net): ModuleList(
                      (0): GEGLU(
                        (proj): Linear(in_features=640, out_features=5120, bias=True)
                      )
                      (1): Dropout(p=0.0, inplace=False)
                      (2): Linear(in_features=2560, out_features=640, bias=True)
                    )
                  )
                )
              )
              (proj_out): Conv2d(640, 640, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (resnets): ModuleList(
            (0): ResnetBlock2D(
              (norm1): GroupNorm(32, 1920, eps=1e-05, affine=True)
              (conv1): Conv2d(1920, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
              (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(1920, 640, kernel_size=(1, 1), stride=(1, 1))
            )
            (1): ResnetBlock2D(
              (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
              (conv1): Conv2d(1280, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
              (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(1280, 640, kernel_size=(1, 1), stride=(1, 1))
            )
            (2): ResnetBlock2D(
              (norm1): GroupNorm(32, 960, eps=1e-05, affine=True)
              (conv1): Conv2d(960, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=640, bias=True)
              (norm2): GroupNorm(32, 640, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(960, 640, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (upsamplers): ModuleList(
            (0): Upsample2D(
              (conv): Conv2d(640, 640, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            )
          )
        )
        (3): CrossAttnUpBlock2D(
          (attentions): ModuleList(
            (0-2): 3 x Transformer2DModel(
              (norm): GroupNorm(32, 320, eps=1e-06, affine=True)
              (proj_in): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
              (transformer_blocks): ModuleList(
                (0): BasicTransformerBlock(
                  (norm1): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
                  (attn1): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=320, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=320, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=320, out_features=320, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=320, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=320, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=320, out_features=320, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm2): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
                  (attn2): Attention(
                    (to_q): lora.Linear(
                      (base_layer): Linear(in_features=320, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=320, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_k): Linear(in_features=768, out_features=320, bias=False)
                    (to_v): lora.Linear(
                      (base_layer): Linear(in_features=768, out_features=320, bias=False)
                      (lora_dropout): ModuleDict(
                        (default): Identity()
                      )
                      (lora_A): ModuleDict(
                        (default): Linear(in_features=768, out_features=8, bias=False)
                      )
                      (lora_B): ModuleDict(
                        (default): Linear(in_features=8, out_features=320, bias=False)
                      )
                      (lora_embedding_A): ParameterDict()
                      (lora_embedding_B): ParameterDict()
                    )
                    (to_out): ModuleList(
                      (0): Linear(in_features=320, out_features=320, bias=True)
                      (1): Dropout(p=0.0, inplace=False)
                    )
                  )
                  (norm3): LayerNorm((320,), eps=1e-05, elementwise_affine=True)
                  (ff): FeedForward(
                    (net): ModuleList(
                      (0): GEGLU(
                        (proj): Linear(in_features=320, out_features=2560, bias=True)
                      )
                      (1): Dropout(p=0.0, inplace=False)
                      (2): Linear(in_features=1280, out_features=320, bias=True)
                    )
                  )
                )
              )
              (proj_out): Conv2d(320, 320, kernel_size=(1, 1), stride=(1, 1))
            )
          )
          (resnets): ModuleList(
            (0): ResnetBlock2D(
              (norm1): GroupNorm(32, 960, eps=1e-05, affine=True)
              (conv1): Conv2d(960, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=320, bias=True)
              (norm2): GroupNorm(32, 320, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(960, 320, kernel_size=(1, 1), stride=(1, 1))
            )
            (1-2): 2 x ResnetBlock2D(
              (norm1): GroupNorm(32, 640, eps=1e-05, affine=True)
              (conv1): Conv2d(640, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (time_emb_proj): Linear(in_features=1280, out_features=320, bias=True)
              (norm2): GroupNorm(32, 320, eps=1e-05, affine=True)
              (dropout): Dropout(p=0.0, inplace=False)
              (conv2): Conv2d(320, 320, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
              (nonlinearity): SiLU()
              (conv_shortcut): Conv2d(640, 320, kernel_size=(1, 1), stride=(1, 1))
            )
          )
        )
      )
      (mid_block): UNetMidBlock2DCrossAttn(
        (attentions): ModuleList(
          (0): Transformer2DModel(
            (norm): GroupNorm(32, 1280, eps=1e-06, affine=True)
            (proj_in): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
            (transformer_blocks): ModuleList(
              (0): BasicTransformerBlock(
                (norm1): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                (attn1): Attention(
                  (to_q): lora.Linear(
                    (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=1280, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=1280, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (to_k): Linear(in_features=1280, out_features=1280, bias=False)
                  (to_v): lora.Linear(
                    (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=1280, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=1280, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (to_out): ModuleList(
                    (0): Linear(in_features=1280, out_features=1280, bias=True)
                    (1): Dropout(p=0.0, inplace=False)
                  )
                )
                (norm2): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                (attn2): Attention(
                  (to_q): lora.Linear(
                    (base_layer): Linear(in_features=1280, out_features=1280, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=1280, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=1280, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (to_k): Linear(in_features=768, out_features=1280, bias=False)
                  (to_v): lora.Linear(
                    (base_layer): Linear(in_features=768, out_features=1280, bias=False)
                    (lora_dropout): ModuleDict(
                      (default): Identity()
                    )
                    (lora_A): ModuleDict(
                      (default): Linear(in_features=768, out_features=8, bias=False)
                    )
                    (lora_B): ModuleDict(
                      (default): Linear(in_features=8, out_features=1280, bias=False)
                    )
                    (lora_embedding_A): ParameterDict()
                    (lora_embedding_B): ParameterDict()
                  )
                  (to_out): ModuleList(
                    (0): Linear(in_features=1280, out_features=1280, bias=True)
                    (1): Dropout(p=0.0, inplace=False)
                  )
                )
                (norm3): LayerNorm((1280,), eps=1e-05, elementwise_affine=True)
                (ff): FeedForward(
                  (net): ModuleList(
                    (0): GEGLU(
                      (proj): Linear(in_features=1280, out_features=10240, bias=True)
                    )
                    (1): Dropout(p=0.0, inplace=False)
                    (2): Linear(in_features=5120, out_features=1280, bias=True)
                  )
                )
              )
            )
            (proj_out): Conv2d(1280, 1280, kernel_size=(1, 1), stride=(1, 1))
          )
        )
        (resnets): ModuleList(
          (0-1): 2 x ResnetBlock2D(
            (norm1): GroupNorm(32, 1280, eps=1e-05, affine=True)
            (conv1): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (time_emb_proj): Linear(in_features=1280, out_features=1280, bias=True)
            (norm2): GroupNorm(32, 1280, eps=1e-05, affine=True)
            (dropout): Dropout(p=0.0, inplace=False)
            (conv2): Conv2d(1280, 1280, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
            (nonlinearity): SiLU()
          )
        )
      )
      (conv_norm_out): GroupNorm(32, 320, eps=1e-05, affine=True)
      (conv_act): SiLU()
      (conv_out): Conv2d(320, 4, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    )
  )
)
03/10/2024 19:47:07 - INFO - __main__ - ***** Running training *****
03/10/2024 19:47:07 - INFO - __main__ -   Num examples = 5
03/10/2024 19:47:07 - INFO - __main__ -   Num batches each epoch = 5
03/10/2024 19:47:07 - INFO - __main__ -   Num Epochs = 100
03/10/2024 19:47:07 - INFO - __main__ -   Instantaneous batch size per device = 1
03/10/2024 19:47:07 - INFO - __main__ -   Total train batch size (w. parallel, distributed & accumulation) = 1
03/10/2024 19:47:07 - INFO - __main__ -   Gradient Accumulation steps = 1
03/10/2024 19:47:07 - INFO - __main__ -   Total optimization steps = 500
Steps:   0%|                                                                                                                                                                            | 0/500 [00:00<?, ?it/s]D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\diffusers\models\attention_processor.py:1129: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
  hidden_states = F.scaled_dot_product_attention(
Steps:   0%|▎                                                                                                                                            | 1/500 [00:04<39:25,  4.74s/it, loss=0.021, lr=0.0001]03/10/2024 19:47:12 - INFO - __main__ - Running validation...
 Generating 4 images with prompt: A photo of sks dog in the snow.
text_encoder\model.safetensors not found
{'requires_safety_checker'} was not found in config. Values will be initialized to default values.
{'class_embed_type', 'resnet_time_scale_shift', 'transformer_layers_per_block', 'conv_out_kernel', 'addition_embed_type', 'class_embeddings_concat', 'num_class_embeds', 'upcast_attention', 'encoder_hid_dim', 'addition_time_embed_dim', 'mid_block_type', 'resnet_out_scale_factor', 'resnet_skip_time_act', 'use_linear_projection', 'dual_cross_attention', 'only_cross_attention', 'time_embedding_dim', 'time_embedding_type', 'addition_embed_type_num_heads', 'cross_attention_norm', 'encoder_hid_dim_type', 'conv_in_kernel', 'projection_class_embeddings_input_dim', 'time_embedding_act_fn', 'mid_block_only_cross_attention', 'timestep_post_act', 'num_attention_heads', 'time_cond_proj_dim'} was not found in config. Values will be initialized to default values.
{'timestep_spacing', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
{'variance_type', 'solver_order', 'solver_type', 'dynamic_thresholding_ratio', 'timestep_spacing', 'algorithm_type', 'lambda_min_clipped', 'thresholding', 'sample_max_value', 'prediction_type', 'use_karras_sigmas', 'lower_order_final'} was not found in config. Values will be initialized to default values.
Steps:   4%|█████▉                                                                                                                                       | 21/500 [00:22<02:53,  2.75it/s, loss=0.22, lr=0.0001]03/10/2024 19:47:30 - INFO - __main__ - Running validation...
 Generating 4 images with prompt: A photo of sks dog in the snow.
text_encoder\model.safetensors not found
{'requires_safety_checker'} was not found in config. Values will be initialized to default values.
{'class_embed_type', 'resnet_time_scale_shift', 'transformer_layers_per_block', 'conv_out_kernel', 'addition_embed_type', 'class_embeddings_concat', 'num_class_embeds', 'upcast_attention', 'encoder_hid_dim', 'addition_time_embed_dim', 'mid_block_type', 'resnet_out_scale_factor', 'resnet_skip_time_act', 'use_linear_projection', 'dual_cross_attention', 'only_cross_attention', 'time_embedding_dim', 'time_embedding_type', 'addition_embed_type_num_heads', 'cross_attention_norm', 'encoder_hid_dim_type', 'conv_in_kernel', 'projection_class_embeddings_input_dim', 'time_embedding_act_fn', 'mid_block_only_cross_attention', 'timestep_post_act', 'num_attention_heads', 'time_cond_proj_dim'} was not found in config. Values will be initialized to default values.
{'timestep_spacing', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
{'variance_type', 'solver_order', 'solver_type', 'dynamic_thresholding_ratio', 'timestep_spacing', 'algorithm_type', 'lambda_min_clipped', 'thresholding', 'sample_max_value', 'prediction_type', 'use_karras_sigmas', 'lower_order_final'} was not found in config. Values will be initialized to default values.
Steps:   8%|███████████▍                                                                                                                               | 41/500 [00:40<02:33,  2.99it/s, loss=0.0685, lr=0.0001]03/10/2024 19:47:48 - INFO - __main__ - Running validation...
 Generating 4 images with prompt: A photo of sks dog in the snow.
text_encoder\model.safetensors not found
{'requires_safety_checker'} was not found in config. Values will be initialized to default values.
{'class_embed_type', 'resnet_time_scale_shift', 'transformer_layers_per_block', 'conv_out_kernel', 'addition_embed_type', 'class_embeddings_concat', 'num_class_embeds', 'upcast_attention', 'encoder_hid_dim', 'addition_time_embed_dim', 'mid_block_type', 'resnet_out_scale_factor', 'resnet_skip_time_act', 'use_linear_projection', 'dual_cross_attention', 'only_cross_attention', 'time_embedding_dim', 'time_embedding_type', 'addition_embed_type_num_heads', 'cross_attention_norm', 'encoder_hid_dim_type', 'conv_in_kernel', 'projection_class_embeddings_input_dim', 'time_embedding_act_fn', 'mid_block_only_cross_attention', 'timestep_post_act', 'num_attention_heads', 'time_cond_proj_dim'} was not found in config. Values will be initialized to default values.
{'timestep_spacing', 'prediction_type'} was not found in config. Values will be initialized to default values.
{'scaling_factor'} was not found in config. Values will be initialized to default values.
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
{'variance_type', 'solver_order', 'solver_type', 'dynamic_thresholding_ratio', 'timestep_spacing', 'algorithm_type', 'lambda_min_clipped', 'thresholding', 'sample_max_value', 'prediction_type', 'use_karras_sigmas', 'lower_order_final'} was not found in config. Values will be initialized to default values.
Steps:  12%|████████████████▊                                                                                                                         | 61/500 [00:57<02:20,  3.11it/s, loss=0.00361, lr=0.0001]03/10/2024 19:48:05 - INFO - __main__ - Running validation...
 Generating 4 images with prompt: A photo of sks dog in the snow.
Traceback (most recent call last):
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\urllib3\connectionpool.py", line 790, in urlopen
    response = self._make_request(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\urllib3\connectionpool.py", line 536, in _make_request
    response = conn.getresponse()
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\urllib3\connection.py", line 461, in getresponse
    httplib_response = super().getresponse()
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\http\client.py", line 1377, in getresponse
    response.begin()
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\http\client.py", line 320, in begin
    version, status, reason = self._read_status()
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\http\client.py", line 289, in _read_status
    raise RemoteDisconnected("Remote end closed connection without"
http.client.RemoteDisconnected: Remote end closed connection without response

The above exception was the direct cause of the following exception:

urllib3.exceptions.ProxyError: ('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response'))

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\requests\adapters.py", line 486, in send
    resp = conn.urlopen(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\urllib3\connectionpool.py", line 844, in urlopen
    retries = retries.increment(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\urllib3\util\retry.py", line 515, in increment
urllib3.exceptions.MaxRetryError: HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/runwayml/stable-diffusion-v1-5 (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "J:\Research_Program\peft\examples\lora_dreambooth\train_dreambooth.py", line 1103, in <module>
    main(args)
  File "J:\Research_Program\peft\examples\lora_dreambooth\train_dreambooth.py", line 1007, in main
    pipeline = DiffusionPipeline.from_pretrained(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 882, in from_pretrained
    cached_folder = cls.download(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1185, in download
    info = model_info(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\huggingface_hub\utils\_validators.py", line 118, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\huggingface_hub\hf_api.py", line 2219, in model_info
    r = get_session().get(path, headers=headers, timeout=timeout, params=params)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\requests\sessions.py", line 602, in get
    return self.request("GET", url, **kwargs)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\requests\sessions.py", line 589, in request
    resp = self.send(prep, **send_kwargs)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\requests\sessions.py", line 703, in send
    r = adapter.send(request, **kwargs)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\huggingface_hub\utils\_http.py", line 67, in send
    return super().send(request, *args, **kwargs)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\requests\adapters.py", line 513, in send
    raise ProxyError(e, request=request)
requests.exceptions.ProxyError: (MaxRetryError("HTTPSConnectionPool(host='huggingface.co', port=443): Max retries exceeded with url: /api/models/runwayml/stable-diffusion-v1-5 (Caused by ProxyError('Unable to connect to proxy', RemoteDisconnected('Remote end closed connection without response')))"), '(Request ID: 4e127296-546b-434c-ac27-f7cae4b9bd6c)')
Steps:  12%|████████████████▊                                                                                                                         | 61/500 [00:58<06:59,  1.05it/s, loss=0.00361, lr=0.0001]
Traceback (most recent call last):
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\accelerate\commands\launch.py", line 1023, in launch_command
    simple_launcher(args)
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\accelerate\commands\launch.py", line 643, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\ProgramData\\Miniconda3\\envs\\zero123-hf-new\\python.exe', 'train_dreambooth.py', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--instance_data_dir=./data/dog', '--output_dir=model-lora_fine-tune-dog-2', '--instance_prompt=a photo of sks dog', '--resolution=512', '--train_batch_size=1', '--gradient_accumulation_steps=1', '--checkpointing_steps=100', '--learning_rate=1e-4', '--lr_scheduler=constant', '--lr_warmup_steps=0', '--max_train_steps=500', '--validation_prompt=A photo of sks dog in the snow', '--seed=0', '--report_to=tensorboard', '--use_lora', '--num_dataloader_workers=0', '--no_tracemalloc', '--validation_steps=20']' returned non-zero exit status 1.
younesbelkada commented 7 months ago

Hi @xdobetter Thanks very much for the issue! Looking at the traceback, it looks like the issue is on the diffusers side:

  File "J:\Research_Program\peft\examples\lora_dreambooth\train_dreambooth.py", line 1007, in main
    pipeline = DiffusionPipeline.from_pretrained(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 882, in from_pretrained
    cached_folder = cls.download(
  File "D:\ProgramData\Miniconda3\envs\zero123-hf-new\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 1185, in download
    info = model_info(

That tries to load the model info from the web: https://github.com/huggingface/diffusers/blob/a1cb106459da4c595d22c04e026d7169d8dcfd2b/src/diffusers/pipelines/pipeline_utils.py#L1203 - I think this might be fixed in their latest update. I am going to ping the diffusers maintainers and transfer this issue there. cc @sayakpaul @yiyixuxu @DN6
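The call chain above ends in `model_info()`, which performs a Hub request unless loading is forced local. As a hedged sketch (the kwargs mirror the standard Hugging Face loading API; the actual call is left commented out because it needs diffusers and a populated cache), passing `local_files_only=True` should make `from_pretrained` resolve from the cache without that request:

```python
# Sketch, not from the thread: `local_files_only` is a documented loading
# kwarg that tells from_pretrained to resolve from the local cache and
# raise instead of contacting huggingface.co.
from_pretrained_kwargs = {
    "pretrained_model_name_or_path": "runwayml/stable-diffusion-v1-5",
    "local_files_only": True,  # skip the model_info() network call
}

# With diffusers installed and the model cached, this would be:
# from diffusers import DiffusionPipeline
# pipeline = DiffusionPipeline.from_pretrained(**from_pretrained_kwargs)
```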

sayakpaul commented 7 months ago

Younes is right. It could very well be a network problem too. You should pass a local directory and recheck.
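One way to act on that advice, as a minimal sketch (the folder name and helper below are hypothetical, not from the thread): verify that a local folder actually contains a complete pipeline download before passing it to `--pretrained_model_name_or_path`, falling back to the Hub id otherwise.

```python
from pathlib import Path

# Hypothetical helper: prefer a fully downloaded local copy so that
# DiffusionPipeline.from_pretrained never needs to reach huggingface.co.
def resolve_model_path(local_dir: str, hub_id: str = "runwayml/stable-diffusion-v1-5") -> str:
    p = Path(local_dir)
    # Every diffusers pipeline folder contains a model_index.json manifest;
    # its presence is a cheap sanity check that the download is complete.
    if (p / "model_index.json").exists():
        return str(p)
    return hub_id  # falling back to the Hub id requires network access

print(resolve_model_path("./stable-diffusion-v1-5"))
```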

xdobetter commented 7 months ago


Thanks for everyone's responses. Even though I have fully downloaded the pre-trained model, does the script download anything extra when it executes? Or does it still need to stay connected to huggingface.co?

sayakpaul commented 7 months ago

It shouldn't need to stay connected because it's already cached. But it also depends on the cache location.

So, if you have already downloaded the model yourself, it's best to directly pass that path.
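If the model really is in the default cache, another hedged option (this relies on huggingface_hub's documented `HF_HUB_OFFLINE` environment variable; it must be set before any Hub call is made, ideally in the shell before `accelerate launch`) is to force offline resolution for the whole run:

```python
import os

# Force huggingface_hub to skip network calls and serve everything from
# the local cache. Equivalently, set it in the shell before launching:
#   HF_HUB_OFFLINE=1 accelerate launch train_dreambooth.py ...
os.environ["HF_HUB_OFFLINE"] = "1"
print(os.environ["HF_HUB_OFFLINE"])
```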

xdobetter commented 7 months ago

But the pre-trained model is saved to the default cache location; should I pass that path instead?

sayakpaul commented 7 months ago

Honestly, it's hard to tell from your configuration, and not being able to reproduce the error makes it more complicated.

So, I would recommend either of the two:

  1. Download the pre-trained checkpoint locally and pass its path when launching training if you don't want to depend on a live internet connection.
  2. If an internet connection is available, directly pass "runwayml/stable-diffusion-v1-5".

Does this make sense?

xdobetter commented 7 months ago

I see, thank you for your patient and sincere reply!

github-actions[bot] commented 6 months ago

This issue has been automatically marked as stale because it has not had recent activity. If you think this still needs to be addressed please comment on this thread.

Please note that issues that do not follow the contributing guidelines are likely to be ignored.