kohya-ss / sd-scripts


returned non-zero exit status 1? Anyone know how and why this happens? #1041

Open Joedog6503 opened 11 months ago

Joedog6503 commented 11 months ago

loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5, variant=None
Fetching 11 files: 100%|███████████████████████████████████████████████████████████████| 11/11 [00:00<00:00, 72.44it/s]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:00<00:00, 6.93it/s]
Traceback (most recent call last):
  File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 189, in <module>
    trainer.train(args)
  File "E:\GitHub\kohya_ss\train_network.py", line 234, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 47, in load_target_model
    ) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype)
  File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 34, in load_target_model
    ) = _load_target_model(
  File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 103, in _load_target_model
    if text_encoder2.dtype != torch.float32:
AttributeError: 'NoneType' object has no attribute 'dtype'
Traceback (most recent call last):
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=1024,1024', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=32', '--training_comment=3 repeats. More info: https://civitai.com/articles/1771', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-05', '--unet_lr=3e-05', '--network_dim=32', '--output_name=Merial10', '--lr_scheduler_num_cycles=50', '--no_half_vae', '--learning_rate=3e-05', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=1667', '--save_every_n_epochs=1', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_train_epochs=50', '--max_data_loader_n_workers=0', '--caption_dropout_rate=0.05', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--gradient_checkpointing', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.

DKnight54 commented 10 months ago

Er.... Because you are loading a Stable Diffusion 1.5 model instead of a Stable Diffusion XL model for training?

If you want to train Stable Diffusion 1.5, try train_network.py instead.
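
For reference, one way to check which family a single-file checkpoint belongs to, so the script and model can be matched up front, is to inspect its tensor keys. This is an illustrative sketch, not sd-scripts code; the key prefixes are assumptions about the usual single-file checkpoint layouts, and it will not help for Hub IDs like runwayml/stable-diffusion-v1-5:

```python
# Hedged sketch: guess the model family of a .safetensors checkpoint from its keys.
# guess_family is a hypothetical helper; the prefixes below are assumptions about
# how SD1.x and SDXL single-file checkpoints are commonly laid out.
from safetensors import safe_open

def guess_family(ckpt_path: str) -> str:
    with safe_open(ckpt_path, framework="pt", device="cpu") as f:
        keys = list(f.keys())
    if any(k.startswith("conditioner.embedders.1.") for k in keys):
        return "sdxl"        # has a second text encoder -> sdxl_train_network.py
    if any(k.startswith("cond_stage_model.") for k in keys):
        return "sd1.x/2.x"   # single text encoder -> train_network.py
    return "unknown"
```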

Joedog6503 commented 10 months ago

Noted, tried, failed

[Dataset 0] loading image sizes.
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 87.70it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (640, 1600), count: 20
bucket 1: resolution (704, 1408), count: 10
bucket 2: resolution (768, 1280), count: 20
bucket 3: resolution (832, 1216), count: 30
bucket 4: resolution (896, 1152), count: 10
bucket 5: resolution (1024, 1024), count: 10
mean ar error (without repeats): 0.02071550377769773
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: stabilityai/stable-diffusion-xl-refiner-1.0, variant=fp16
model_index.json: 100%|███████████████████████████████████████████████████████████████████████| 612/612 [00:00<?, ?B/s]
scheduler/scheduler_config.json: 100%|█████████████████████████████████████████████████| 479/479 [00:00<00:00, 958kB/s]
tokenizer_2/special_tokens_map.json: 100%|████████████████████████████████████████████████████| 460/460 [00:00<?, ?B/s]
text_encoder_2/config.json: 100%|█████████████████████████████████████████████████████████████| 575/575 [00:00<?, ?B/s]
unet/config.json: 100%|███████████████████████████████████████████████████████████| 1.71k/1.71k [00:00<00:00, 1.71MB/s]
tokenizer_2/tokenizer_config.json: 100%|██████████████████████████████████████████████| 725/725 [00:00<00:00, 1.45MB/s]
vae/config.json: 100%|████████████████████████████████████████████████████████████████████████| 642/642 [00:00<?, ?B/s]
tokenizer_2/vocab.json: 100%|█████████████████████████████████████████████████████| 1.06M/1.06M [00:00<00:00, 6.33MB/s]
tokenizer_2/merges.txt: 100%|███████████████████████████████████████████████████████| 525k/525k [00:00<00:00, 2.58MB/s]
diffusion_pytorch_model.fp16.safetensors: 100%|█████████████████████████████████████| 167M/167M [00:25<00:00, 6.54MB/s]
diffusion_pytorch_model.fp16.safetensors: 100%|█████████████████████████████████████| 167M/167M [00:38<00:00, 4.34MB/s]
model.fp16.safetensors: 100%|█████████████████████████████████████████████████████| 1.39G/1.39G [03:00<00:00, 7.69MB/s]
diffusion_pytorch_model.fp16.safetensors: 100%|███████████████████████████████████| 4.52G/4.52G [07:17<00:00, 10.3MB/s]
Fetching 13 files: 100%|███████████████████████████████████████████████████████████████| 13/13 [07:17<00:00, 33.67s/it]
Loading pipeline components...: 100%|████████████████████████████████████████████████████| 5/5 [00:06<00:00, 1.24s/it]
Traceback (most recent call last):
  File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 189, in <module>
    trainer.train(args)
  File "E:\GitHub\kohya_ss\train_network.py", line 234, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 47, in load_target_model
    ) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype)
  File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 34, in load_target_model
    ) = _load_target_model(
  File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 101, in _load_target_model
    if text_encoder1.dtype != torch.float32:
AttributeError: 'NoneType' object has no attribute 'dtype'
Traceback (most recent call last):
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=stabilityai/stable-diffusion-xl-refiner-1.0', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=1024,1024', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=32', '--training_comment=3 repeats. More info: https://civitai.com/articles/1771', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-05', '--unet_lr=3e-05', '--network_dim=32', '--output_name=Merial12', '--lr_scheduler_num_cycles=50', '--no_half_vae', '--learning_rate=3e-05', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=1667', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_train_epochs=50', '--max_data_loader_n_workers=0', '--caption_dropout_rate=0.05', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--gradient_checkpointing', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.

DKnight54 commented 10 months ago

Hrm. I'm not sure it's possible to train the refiner model. Most training examples I've seen train on the SDXL base model.

DKnight54 commented 10 months ago

What's happening appears to be that the script expects the model to have two text encoders, which, as far as I know, only the SDXL base model and models trained on top of it have.

DKnight54 commented 10 months ago

And it's throwing the error because it cannot find one of the two text encoders: the SD1.5 run above failed on text_encoder2, and the refiner run failed on text_encoder1.
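
That matches the tracebacks: `_load_target_model` dereferences each encoder's `dtype` without a None check. A paraphrased sketch of the failing pattern (not the exact sd-scripts source; `cast_encoders` is a made-up name) shows why both runs die the same way:

```python
import torch

# Paraphrase of the check in library/sdxl_train_util.py, not the actual source.
# The SDXL loader assumes text_encoder1 AND text_encoder2 exist. SD1.5 ships with
# only the first encoder and the refiner with only the second, so one of them is
# None and the attribute access raises:
# AttributeError: 'NoneType' object has no attribute 'dtype'
def cast_encoders(text_encoder1, text_encoder2, weight_dtype):
    for te in (text_encoder1, text_encoder2):
        if te is not None and te.dtype != torch.float32:  # the None guard avoids the crash...
            te.to(weight_dtype)
    # ...but SDXL training still needs both encoders, so the real fix is to load
    # an SDXL base-family checkpoint rather than patch the check.
```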

Joedog6503 commented 10 months ago

thanks for the help but

[Dataset 0] loading image sizes.
100%|█████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 143.86it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (640, 1600), count: 20
bucket 1: resolution (704, 1408), count: 10
bucket 2: resolution (768, 1280), count: 20
bucket 3: resolution (832, 1216), count: 30
bucket 4: resolution (896, 1152), count: 10
bucket 5: resolution (1024, 1024), count: 10
mean ar error (without repeats): 0.02071550377769773
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
building U-Net
loading U-Net from checkpoint
U-Net: <All keys matched successfully>
building text encoders
loading text encoders from checkpoint
text encoder 1: <All keys matched successfully>
text encoder 2: <All keys matched successfully>
building VAE
loading VAE from checkpoint
VAE: <All keys matched successfully>
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0] caching latents.
checking cache validity...
100%|█████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 111.71it/s]
caching latents...
100%|██████████████████████████████████████████████████████████████████████████████████| 10/10 [00:04<00:00, 2.48it/s]
create LoRA network. base dim (rank): 32, alpha: 32.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder 1:
create LoRA for Text Encoder 2:
create LoRA for Text Encoder: 264 modules.
create LoRA for U-Net: 722 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
override steps. steps for 50 epochs is / 指定エポックまでのステップ数: 1800
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 100
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 36
  num epochs / epoch数: 50
  batch size per device / バッチサイズ: 3
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1800
steps: 0%| | 0/1800 [00:00<?, ?it/s]
epoch 1/50
steps: 0%| | 1/1800 [02:56<88:15:43, 176.62s/it, avr_loss=0.0524]
Traceback (most recent call last):
  File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 189, in <module>
    trainer.train(args)
  File "E:\GitHub\kohya_ss\train_network.py", line 738, in train
    for step, batch in enumerate(train_dataloader):
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\data_loader.py", line 394, in __iter__
    next_batch = next(dataloader_iter)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 633, in __next__
    data = self._next_data()
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\utils\data\dataloader.py", line 677, in _next_data
    data = self._dataset_fetcher.fetch(index)  # may raise StopIteration
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in fetch
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\utils\data\_utils\fetch.py", line 51, in <listcomp>
    data = [self.dataset[idx] for idx in possibly_batched_index]
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\utils\data\dataset.py", line 243, in __getitem__
    return self.datasets[dataset_idx][sample_idx]
  File "E:\GitHub\kohya_ss\library\train_util.py", line 1267, in __getitem__
    example["latents"] = torch.stack(latents_list) if latents_list[0] is not None else None
RuntimeError: stack expects each tensor to be equal size, but got [4, 152, 104] at entry 0 and [4, 128, 128] at entry 2
steps: 0%| | 1/1800 [02:57<88:33:30, 177.22s/it, avr_loss=0.0524]
Traceback (most recent call last):
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=1024,1024', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=32', '--training_comment=3 repeats. More info: https://civitai.com/articles/1771', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-05', '--unet_lr=3e-05', '--network_dim=32', '--output_name=Merial12', '--lr_scheduler_num_cycles=50', '--no_half_vae', '--learning_rate=3e-05', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=1667', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_train_epochs=50', '--max_data_loader_n_workers=0', '--caption_dropout_rate=0.05', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--gradient_checkpointing', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.

Joedog6503 commented 10 months ago

.......... If it were just the model, I would have closed the post.

DKnight54 commented 10 months ago

That's interesting: the model is loading fine, but it's throwing another error about a latent size mismatch. Can you try adding --full_fp16 to your arguments?

If that doesn't work, we'll need someone with more knowledge of the code to come take a look.
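
For what it's worth, the RuntimeError itself is just torch.stack refusing tensors of different spatial sizes; with bucketing, every latent in a batch must come from the same bucket. A minimal reproduction using the exact shapes from the log:

```python
import torch

# The two cached latents from the traceback: [4, 152, 104] vs [4, 128, 128].
# torch.stack requires identical shapes, so mixing latents cached at different
# resolutions in one batch fails exactly like train_util.py line 1267 did.
a = torch.zeros(4, 152, 104)
b = torch.zeros(4, 128, 128)
try:
    torch.stack([a, b])
except RuntimeError as e:
    print(e)  # stack expects each tensor to be equal size, ...
```

One plausible cause (an assumption, not confirmed in this thread) is stale --cache_latents_to_disk files left over from an earlier run with different bucket resolutions, or the 'Merial (1).jpg' / 'Merial (1).webp' duplicate-filename pair warned about later; deleting the on-disk latent cache files before retrying may help.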

DKnight54 commented 10 months ago

To clarify, "returned non-zero exit status 1." just means that the script hit an error and did not complete successfully. To figure out what went wrong, we have to read the stack trace (the part that says "Traceback" followed by a whole bunch of code).
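
The mechanics are visible in a few lines: accelerate's simple_launcher runs the training script as a subprocess and re-raises its exit code, which is why every failure above ends in the same CalledProcessError regardless of the root cause:

```python
import subprocess, sys

# Any Python script that dies with an uncaught exception exits with status 1.
# accelerate's simple_launcher checks that code and raises CalledProcessError,
# producing the generic "returned non-zero exit status 1." line.
try:
    subprocess.run([sys.executable, "-c", "raise RuntimeError('boom')"], check=True)
except subprocess.CalledProcessError as e:
    print(e.returncode)  # 1 -- the real error is higher up, in the first traceback
```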

Joedog6503 commented 10 months ago

I closed some programs and the like, and found out the prior error was likely due to low VRAM... at 12 GB...

Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg
Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp
05:08:50-718874 INFO Valid image folder names found in: E:\GitHub\training\Merial
05:08:50-720375 INFO Folder 10_Merial: 10 images found
05:08:50-721376 INFO Folder 10_Merial: 100 steps
05:08:50-722375 INFO Total steps: 100
05:08:50-723376 INFO Train batch size: 3
05:08:50-723876 INFO Gradient accumulation steps: 1
05:08:50-725377 INFO Epoch: 50
05:08:50-726876 INFO Regulatization factor: 1
05:08:50-727876 INFO max_train_steps (100 / 3 / 1 * 50 * 1) = 1667
05:08:50-730376 INFO stop_text_encoder_training = 0
05:08:50-731377 INFO lr_warmup_steps = 0
05:08:50-732377 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial12_20240108-050850.json...
05:08:50-734377 INFO accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors" --train_data_dir="E:\GitHub\training\Merial" --resolution="1024,1024" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="32" --training_comment="3 repeats. More info: https://civitai.com/articles/1771" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=3e-05 --unet_lr=3e-05 --network_dim=32 --output_name="Merial12" --lr_scheduler_num_cycles="50" --no_half_vae --learning_rate="3e-05" --lr_scheduler="constant" --train_batch_size="3" --max_train_steps="1667" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --cache_latents_to_disk --optimizer_type="AdamW" --max_grad_norm="1" --max_train_epochs=50 --max_data_loader_n_workers="0" --caption_dropout_rate="0.05" --bucket_reso_steps=64 --min_snr_gamma=5 --gradient_checkpointing --xformers --noise_offset=0.0
prepare tokenizers
Using DreamBooth method.
prepare images.
found directory E:\GitHub\training\Merial\10_Merial contains 10 image files
100 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 3
  resolution: (1024, 1024)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: False

[Subset 0 of Dataset 0]
  image_dir: "E:\GitHub\training\Merial\10_Merial"
  image_count: 10
  num_repeats: 10
  shuffle_caption: False
  keep_tokens: 0
  keep_tokens_separator:
  caption_dropout_rate: 0.05
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1, token_warmup_step: 0
  is_reg: False
  class_tokens: Merial
  caption_extension: .txt

[Dataset 0] loading image sizes.
100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 88.09it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (640, 1600), count: 20
bucket 1: resolution (704, 1408), count: 10
bucket 2: resolution (768, 1280), count: 20
bucket 3: resolution (832, 1216), count: 30
bucket 4: resolution (896, 1152), count: 10
bucket 5: resolution (1024, 1024), count: 10
mean ar error (without repeats): 0.02071550377769773
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
building U-Net
loading U-Net from checkpoint
U-Net: <All keys matched successfully>
building text encoders
loading text encoders from checkpoint
text encoder 1: <All keys matched successfully>
text encoder 2: <All keys matched successfully>
building VAE
loading VAE from checkpoint
VAE: <All keys matched successfully>
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0] caching latents.
checking cache validity...
100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 170.91it/s]
caching latents...
100%|██████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.72it/s]
create LoRA network. base dim (rank): 32, alpha: 32.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder 1:
create LoRA for Text Encoder 2:
create LoRA for Text Encoder: 264 modules.
create LoRA for U-Net: 722 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
use AdamW optimizer | {}
override steps. steps for 50 epochs is / 指定エポックまでのステップ数: 1800
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 100
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 36
  num epochs / epoch数: 50
  batch size per device / バッチサイズ: 3
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1800
steps: 0%| | 0/1800 [00:00<?, ?it/s]
epoch 1/50
steps: 0%| | 3/1800 [01:36<16:02:53, 32.15s/it, avr_loss=0.0673]

and it will now take close to a day

Joedog6503 commented 10 months ago

05:13:40-812875 INFO Valid image folder names found in: E:\GitHub\training\Merial
05:13:40-813874 INFO Folder 10_Merial: 10 images found
05:13:40-816875 INFO Folder 10_Merial: 100 steps
05:13:40-817875 INFO Total steps: 100
05:13:40-818875 INFO Train batch size: 2
05:13:40-819876 INFO Gradient accumulation steps: 1
05:13:40-820877 INFO Epoch: 15
05:13:40-822376 INFO Regulatization factor: 1
05:13:40-823376 INFO max_train_steps (100 / 2 / 1 * 15 * 1) = 750
05:13:40-824878 INFO stop_text_encoder_training = 0
05:13:40-826377 INFO lr_warmup_steps = 0
05:13:40-827377 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-051340.json...
05:13:40-828878 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory E:\GitHub\training\Merial\10_Merial contains 10 image files
100 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0]
  image_dir: "E:\GitHub\training\Merial\10_Merial"
  image_count: 10
  num_repeats: 10
  shuffle_caption: False
  keep_tokens: 0
  keep_tokens_separator:
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1, token_warmup_step: 0
  is_reg: False
  class_tokens: Merial
  caption_extension: .txt

[Dataset 0] loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 139.83it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (128, 256), count: 20
bucket 1: resolution (192, 384), count: 10
bucket 2: resolution (256, 384), count: 10
bucket 3: resolution (256, 576), count: 20
bucket 4: resolution (320, 512), count: 10
bucket 5: resolution (384, 512), count: 10
bucket 6: resolution (384, 576), count: 10
bucket 7: resolution (512, 512), count: 10
mean ar error (without repeats): 0.06391376824093384
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5
Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 7.01it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
UNet2DConditionModel: 64, 8, 768, False, False
U-Net converted to original U-Net
Enable xformers for U-Net
A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton'
import network module: networks.lora
[Dataset 0] caching latents.
checking cache validity...
100%|████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s]
caching latents...
100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 13.69it/s]
create LoRA network. base dim (rank): 128, alpha: 128.0
neuron dropout: p=None, rank dropout: p=None, module dropout: p=None
create LoRA for Text Encoder:
create LoRA for Text Encoder: 72 modules.
create LoRA for U-Net: 192 modules.
enable LoRA for text encoder
enable LoRA for U-Net
prepare optimizer, data loader etc.
Traceback (most recent call last):
  File "E:\GitHub\kohya_ss\library\train_util.py", line 3480, in get_optimizer
    import bitsandbytes as bnb
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\__init__.py", line 6, in <module>
    from . import cuda_setup, utils, research
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\__init__.py", line 1, in <module>
    from . import nn
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in <module>
    from .modules import LinearFP8Mixed, LinearFP8Global
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in <module>
    from bitsandbytes.optim import GlobalOptimManager
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in <module>
    from bitsandbytes.cextension import COMPILED_WITH_CUDA
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in <module>
    from .cuda_setup.main import evaluate_cuda_setup
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in <module>
    from .paths import determine_cuda_runtime_lib_path
ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\GitHub\kohya_ss\train_network.py", line 996, in <module>
    trainer.train(args)
  File "E:\GitHub\kohya_ss\train_network.py", line 348, in train
    optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params)
  File "E:\GitHub\kohya_ss\library\train_util.py", line 3482, in get_optimizer
    raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです")
ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです
Traceback (most recent call last):
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
    simple_launcher(args)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=512,512', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.00015', '--network_dim=128', '--output_name=Merial13', '--lr_scheduler_num_cycles=15', '--no_half_vae', '--learning_rate=0.00015', '--lr_scheduler=cosine', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=5', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

Another error when I use AdamW8bit.
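
"ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'" usually points to a broken or mismatched bitsandbytes install in the venv (on Windows this was commonly a stale patched build) rather than anything in the training config; the usual suggestion is to reinstall bitsandbytes inside the venv, though the right version depends on the setup and isn't confirmed in this thread. A quick way to check the install independently of training:

```python
# Sanity check for the bitsandbytes install, run inside the same venv.
# If this import fails, AdamW8bit can never work; fix the install first.
try:
    import bitsandbytes as bnb
    print("bitsandbytes OK:", getattr(bnb, "__version__", "unknown version"))
except Exception as exc:
    print("bitsandbytes is broken:", exc)
```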

Joedog6503 commented 10 months ago

steps: 0%|▏ | 5/1800 [02:53<17:19:23, 34.74s/it, avr_loss=0.071]
05:12:47-824560 INFO Loading config...
05:12:47-995589 INFO runwayml/stable-diffusion-v1-5 model selected.
05:12:51-632229 INFO The running process has been terminated.
05:12:53-517059 INFO There is no running process to kill.
05:12:53-673588 INFO There is no running process to kill.
05:13:40-808375 INFO Start training LoRA Standard ...
05:13:40-809875 INFO Checking for duplicate image filenames in training data directory...
Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file.
05:15:35-218988 INFO Start training LoRA Standard ...
05:15:35-219988 INFO Checking for duplicate image filenames in training data directory...
Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file.
Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg
Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp
05:15:35-224490 INFO Valid image folder names found in: E:\GitHub\training\Merial
05:15:35-225989 INFO Folder 10_Merial: 10 images found
05:15:35-226990 INFO Folder 10_Merial: 100 steps
05:15:35-227990 INFO Total steps: 100
05:15:35-228491 INFO Train batch size: 2
05:15:35-229990 INFO Gradient accumulation steps: 1
05:15:35-230989 INFO Epoch: 15
05:15:35-231990 INFO Regulatization factor: 1
05:15:35-234490 INFO max_train_steps (100 / 2 / 1 * 15 * 1) = 750
05:15:35-235491 INFO stop_text_encoder_training = 0
05:15:35-236491 INFO lr_warmup_steps = 0
05:15:35-237991 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-051535.json...
05:15:35-239492 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0
prepare tokenizer
update token length: 225
Using DreamBooth method.
prepare images.
found directory E:\GitHub\training\Merial\10_Merial contains 10 image files
100 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
  batch_size: 2
  resolution: (512, 512)
  enable_bucket: True
  min_bucket_reso: 256
  max_bucket_reso: 2048
  bucket_reso_steps: 64
  bucket_no_upscale: True

[Subset 0 of Dataset 0]
  image_dir: "E:\GitHub\training\Merial\10_Merial"
  image_count: 10
  num_repeats: 10
  shuffle_caption: False
  keep_tokens: 0
  keep_tokens_separator:
  caption_dropout_rate: 0.0
  caption_dropout_every_n_epoches: 0
  caption_tag_dropout_rate: 0.0
  caption_prefix: None
  caption_suffix: None
  color_aug: False
  flip_aug: False
  face_crop_aug_range: None
  random_crop: False
  token_warmup_min: 1, token_warmup_step: 0
  is_reg: False
  class_tokens: Merial
  caption_extension: .txt

[Dataset 0] loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 150.35it/s]
make buckets
min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (128, 256), count: 20
bucket 1: resolution (192, 384), count: 10
bucket 2: resolution (256, 384), count: 10
bucket 3: resolution (256, 576), count: 20
bucket 4: resolution (320, 512), count: 10
bucket 5: resolution (384, 512), count: 10
bucket 6: resolution (384, 576), count: 10
bucket 7: resolution (512, 512), count: 10
mean ar error (without repeats): 0.06391376824093384
preparing accelerator
loading model for process 0/1
load StableDiffusion checkpoint: E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors
UNet2DConditionModel: 64, 8, 768, False, False
Traceback (most recent call last):
  File "E:\GitHub\kohya_ss\train_network.py", line 996, in <module>
    trainer.train(args)
  File "E:\GitHub\kohya_ss\train_network.py", line 234, in train
    model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
  File "E:\GitHub\kohya_ss\train_network.py", line 103, in load_target_model
    text_encoder, vae, unet, _ = train_util.load_target_model(args, weight_dtype, accelerator)
  File "E:\GitHub\kohya_ss\library\train_util.py", line 3960, in load_target_model
    text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model(
  File "E:\GitHub\kohya_ss\library\train_util.py", line 3914, in _load_target_model
    text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint(
  File "E:\GitHub\kohya_ss\library\model_util.py", line 1007, in load_models_from_stable_diffusion_checkpoint
    info = unet.load_state_dict(converted_unet_checkpoint)
  File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel:
Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", "down_blocks.0.attentions.0.proj_in.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.bias",
"down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.0.proj_out.weight", "down_blocks.0.attentions.0.proj_out.bias", "down_blocks.0.attentions.1.norm.weight", "down_blocks.0.attentions.1.norm.bias", "down_blocks.0.attentions.1.proj_in.weight", "down_blocks.0.attentions.1.proj_in.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.1.proj_out.weight", "down_blocks.0.attentions.1.proj_out.bias", "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias", "down_blocks.3.resnets.0.norm1.weight", "down_blocks.3.resnets.0.norm1.bias", "down_blocks.3.resnets.0.conv1.weight", "down_blocks.3.resnets.0.conv1.bias", "down_blocks.3.resnets.0.time_emb_proj.weight", "down_blocks.3.resnets.0.time_emb_proj.bias", "down_blocks.3.resnets.0.norm2.weight", "down_blocks.3.resnets.0.norm2.bias", "down_blocks.3.resnets.0.conv2.weight", "down_blocks.3.resnets.0.conv2.bias", "down_blocks.3.resnets.1.norm1.weight", "down_blocks.3.resnets.1.norm1.bias", "down_blocks.3.resnets.1.conv1.weight", "down_blocks.3.resnets.1.conv1.bias", "down_blocks.3.resnets.1.time_emb_proj.weight", "down_blocks.3.resnets.1.time_emb_proj.bias", "down_blocks.3.resnets.1.norm2.weight", "down_blocks.3.resnets.1.norm2.bias", "down_blocks.3.resnets.1.conv2.weight", "down_blocks.3.resnets.1.conv2.bias", "up_blocks.2.attentions.0.norm.weight", "up_blocks.2.attentions.0.norm.bias", "up_blocks.2.attentions.0.proj_in.weight", "up_blocks.2.attentions.0.proj_in.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", 
"up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.0.proj_out.weight", "up_blocks.2.attentions.0.proj_out.bias", "up_blocks.2.attentions.1.norm.weight", "up_blocks.2.attentions.1.norm.bias", "up_blocks.2.attentions.1.proj_in.weight", "up_blocks.2.attentions.1.proj_in.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.1.proj_out.weight", "up_blocks.2.attentions.1.proj_out.bias", "up_blocks.2.attentions.2.norm.weight", "up_blocks.2.attentions.2.norm.bias", "up_blocks.2.attentions.2.proj_in.weight", "up_blocks.2.attentions.2.proj_in.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", 
"up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.2.proj_out.weight", "up_blocks.2.attentions.2.proj_out.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias", "up_blocks.3.attentions.0.norm.weight", "up_blocks.3.attentions.0.norm.bias", "up_blocks.3.attentions.0.proj_in.weight", "up_blocks.3.attentions.0.proj_in.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.0.proj_out.weight", "up_blocks.3.attentions.0.proj_out.bias", "up_blocks.3.attentions.1.norm.weight", "up_blocks.3.attentions.1.norm.bias", "up_blocks.3.attentions.1.proj_in.weight", "up_blocks.3.attentions.1.proj_in.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias", 
"up_blocks.3.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.1.proj_out.weight", "up_blocks.3.attentions.1.proj_out.bias", "up_blocks.3.attentions.2.norm.weight", "up_blocks.3.attentions.2.norm.bias", "up_blocks.3.attentions.2.proj_in.weight", "up_blocks.3.attentions.2.proj_in.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.2.proj_out.weight", "up_blocks.3.attentions.2.proj_out.bias", "up_blocks.3.resnets.0.norm1.weight", "up_blocks.3.resnets.0.norm1.bias", "up_blocks.3.resnets.0.conv1.weight", "up_blocks.3.resnets.0.conv1.bias", "up_blocks.3.resnets.0.time_emb_proj.weight", "up_blocks.3.resnets.0.time_emb_proj.bias", "up_blocks.3.resnets.0.norm2.weight", "up_blocks.3.resnets.0.norm2.bias", "up_blocks.3.resnets.0.conv2.weight", "up_blocks.3.resnets.0.conv2.bias", "up_blocks.3.resnets.0.conv_shortcut.weight", "up_blocks.3.resnets.0.conv_shortcut.bias", "up_blocks.3.resnets.1.norm1.weight", "up_blocks.3.resnets.1.norm1.bias", "up_blocks.3.resnets.1.conv1.weight", "up_blocks.3.resnets.1.conv1.bias", "up_blocks.3.resnets.1.time_emb_proj.weight", "up_blocks.3.resnets.1.time_emb_proj.bias", "up_blocks.3.resnets.1.norm2.weight", "up_blocks.3.resnets.1.norm2.bias", "up_blocks.3.resnets.1.conv2.weight", "up_blocks.3.resnets.1.conv2.bias", "up_blocks.3.resnets.1.conv_shortcut.weight", "up_blocks.3.resnets.1.conv_shortcut.bias", "up_blocks.3.resnets.2.norm1.weight", "up_blocks.3.resnets.2.norm1.bias", "up_blocks.3.resnets.2.conv1.weight", "up_blocks.3.resnets.2.conv1.bias", "up_blocks.3.resnets.2.time_emb_proj.weight", "up_blocks.3.resnets.2.time_emb_proj.bias", "up_blocks.3.resnets.2.norm2.weight", "up_blocks.3.resnets.2.norm2.bias", "up_blocks.3.resnets.2.conv2.weight", "up_blocks.3.resnets.2.conv2.bias", "up_blocks.3.resnets.2.conv_shortcut.weight", "up_blocks.3.resnets.2.conv_shortcut.bias". 
Unexpected key(s) in state_dict: "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", [... several hundred additional SDXL-pattern UNet keys omitted for brevity: the extra transformer_blocks.1 entries under down_blocks.1 and the transformer_blocks.1 through transformer_blocks.9 entries under down_blocks.2 and up_blocks.0 ...] "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.bias", 
"up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_k.weight", 
"up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.weight", 
"up_blocks.0.attentions.2.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", 
"up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.1.norm1.bias", "mid_block.attentions.0.transformer_blocks.1.norm1.weight", "mid_block.attentions.0.transformer_blocks.1.norm2.bias", "mid_block.attentions.0.transformer_blocks.1.norm2.weight", "mid_block.attentions.0.transformer_blocks.1.norm3.bias", "mid_block.attentions.0.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.2.norm1.bias", "mid_block.attentions.0.transformer_blocks.2.norm1.weight", "mid_block.attentions.0.transformer_blocks.2.norm2.bias", "mid_block.attentions.0.transformer_blocks.2.norm2.weight", "mid_block.attentions.0.transformer_blocks.2.norm3.bias", "mid_block.attentions.0.transformer_blocks.2.norm3.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", 
"mid_block.attentions.0.transformer_blocks.3.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.3.norm1.bias", "mid_block.attentions.0.transformer_blocks.3.norm1.weight", "mid_block.attentions.0.transformer_blocks.3.norm2.bias", "mid_block.attentions.0.transformer_blocks.3.norm2.weight", "mid_block.attentions.0.transformer_blocks.3.norm3.bias", "mid_block.attentions.0.transformer_blocks.3.norm3.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.4.norm1.bias", "mid_block.attentions.0.transformer_blocks.4.norm1.weight", "mid_block.attentions.0.transformer_blocks.4.norm2.bias", "mid_block.attentions.0.transformer_blocks.4.norm2.weight", "mid_block.attentions.0.transformer_blocks.4.norm3.bias", "mid_block.attentions.0.transformer_blocks.4.norm3.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.5.norm1.bias", "mid_block.attentions.0.transformer_blocks.5.norm1.weight", "mid_block.attentions.0.transformer_blocks.5.norm2.bias", "mid_block.attentions.0.transformer_blocks.5.norm2.weight", 
"mid_block.attentions.0.transformer_blocks.5.norm3.bias", "mid_block.attentions.0.transformer_blocks.5.norm3.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.6.norm1.bias", "mid_block.attentions.0.transformer_blocks.6.norm1.weight", "mid_block.attentions.0.transformer_blocks.6.norm2.bias", "mid_block.attentions.0.transformer_blocks.6.norm2.weight", "mid_block.attentions.0.transformer_blocks.6.norm3.bias", "mid_block.attentions.0.transformer_blocks.6.norm3.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.7.norm1.bias", "mid_block.attentions.0.transformer_blocks.7.norm1.weight", "mid_block.attentions.0.transformer_blocks.7.norm2.bias", "mid_block.attentions.0.transformer_blocks.7.norm2.weight", "mid_block.attentions.0.transformer_blocks.7.norm3.bias", "mid_block.attentions.0.transformer_blocks.7.norm3.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.2.bias", 
"mid_block.attentions.0.transformer_blocks.8.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.8.norm1.bias", "mid_block.attentions.0.transformer_blocks.8.norm1.weight", "mid_block.attentions.0.transformer_blocks.8.norm2.bias", "mid_block.attentions.0.transformer_blocks.8.norm2.weight", "mid_block.attentions.0.transformer_blocks.8.norm3.bias", "mid_block.attentions.0.transformer_blocks.8.norm3.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.norm1.bias", "mid_block.attentions.0.transformer_blocks.9.norm1.weight", "mid_block.attentions.0.transformer_blocks.9.norm2.bias", "mid_block.attentions.0.transformer_blocks.9.norm2.weight", "mid_block.attentions.0.transformer_blocks.9.norm3.bias", "mid_block.attentions.0.transformer_blocks.9.norm3.weight". size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). 
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.0.resnets.2.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.0.resnets.2.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.0.resnets.2.conv1.weight: copying a param with shape torch.Size([1280, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for up_blocks.0.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([1280, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for up_blocks.1.attentions.0.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.0.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). 
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). 
size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.0.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.1.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). 
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.1.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.2.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). 
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.2.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). 
size mismatch for up_blocks.1.resnets.0.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.0.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.0.conv1.weight: copying a param with shape torch.Size([640, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for up_blocks.1.resnets.0.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.resnets.0.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([640, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for up_blocks.1.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.norm1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.1.norm1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.1.conv1.weight: copying a param with shape torch.Size([640, 1280, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for up_blocks.1.resnets.1.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). 
size mismatch for up_blocks.1.resnets.1.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.resnets.1.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([640, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for up_blocks.1.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for up_blocks.1.resnets.2.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for up_blocks.1.resnets.2.conv1.weight: copying a param with shape torch.Size([640, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]). size mismatch for up_blocks.1.resnets.2.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.resnets.2.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([640, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]). size mismatch for up_blocks.1.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.upsamplers.0.conv.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.upsamplers.0.conv.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.2.resnets.0.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for up_blocks.2.resnets.0.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). 
size mismatch for up_blocks.2.resnets.0.conv1.weight: copying a param with shape torch.Size([320, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]). size mismatch for up_blocks.2.resnets.0.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for up_blocks.2.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for up_blocks.2.resnets.0.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([320, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]). size mismatch for up_blocks.2.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.2.resnets.1.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.2.resnets.1.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]). size mismatch for up_blocks.2.resnets.1.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for up_blocks.2.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for up_blocks.2.resnets.1.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). 
size mismatch for up_blocks.2.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]). size mismatch for up_blocks.2.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for up_blocks.2.resnets.2.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for up_blocks.2.resnets.2.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]). size mismatch for up_blocks.2.resnets.2.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for up_blocks.2.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for up_blocks.2.resnets.2.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]). size mismatch for up_blocks.2.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). 
Traceback (most recent call last): File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe__main__.py", line 7, in File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command simple_launcher(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=512,512', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.00015', '--network_dim=128', '--output_name=Merial13', '--lr_scheduler_num_cycles=15', '--no_half_vae', '--learning_rate=0.00015', '--lr_scheduler=cosine', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=5', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

An error when training SDXL with AdamW8bit.
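The wall of size-mismatch errors above means the checkpoint and the training script disagree about the U-Net architecture: the SDXL weights are being loaded into the SD1.5 U-Net that train_network.py builds. A minimal sketch for checking which architecture a local .safetensors file actually is before picking a script (the key prefixes are an assumption based on the usual single-file layouts):

```python
# Sketch: guess SDXL vs SD1.x from the checkpoint's key prefixes.
# The prefixes below are assumptions about the usual single-file layouts.
from safetensors import safe_open

path = "E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors"

with safe_open(path, framework="pt") as f:
    keys = list(f.keys())

if any(k.startswith("conditioner.embedders.1.") for k in keys):
    print("Looks like SDXL -> use sdxl_train_network.py")
elif any(k.startswith("cond_stage_model.") for k in keys):
    print("Looks like SD1.x/2.x -> use train_network.py")
else:
    print("Unrecognized layout; check the model card")
```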

Joedog6503 commented 10 months ago

From here on out, I'll just post the last error output and how I got the error.

Joedog6503 commented 10 months ago

Full log, exp 16:

05:18:44-610784 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0 prepare tokenizer update token length: 225 Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 2 resolution: (512, 512) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: True

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 152.65it/s] make buckets min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (128, 256), count: 20 bucket 1: resolution (192, 384), count: 10 bucket 2: resolution (256, 384), count: 10 bucket 3: resolution (256, 576), count: 20 bucket 4: resolution (320, 512), count: 10 bucket 5: resolution (384, 512), count: 10 bucket 6: resolution (384, 576), count: 10 bucket 7: resolution (512, 512), count: 10 mean ar error (without repeats): 0.06391376824093384 preparing accelerator loading model for process 0/1 load Diffusers pretrained models: runwayml/stable-diffusion-v1-5 Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 10.15it/s] You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . UNet2DConditionModel: 64, 8, 768, False, False U-Net converted to original U-Net Enable xformers for U-Net A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' import network module: networks.lora [Dataset 0] caching latents. checking cache validity... 100%|████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s] caching latents... 100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 13.83it/s] create LoRA network. base dim (rank): 128, alpha: 128.0 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder: create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. enable LoRA for text encoder enable LoRA for U-Net prepare optimizer, data loader etc. use AdamW optimizer | {} running training / 学習開始 num train images repeats / 学習画像の数×繰り返し回数: 100 num reg images / 正則化画像の数: 0 num batches per epoch / 1epochのバッチ数: 50 num epochs / epoch数: 15 batch size per device / バッチサイズ: 2 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 750 steps: 0%| | 0/750 [00:00<?, ?it/s] epoch 1/15 steps: 1%|▋ | 10/750 [00:03<03:53, 3.17it/s, avr_loss=0.18]05:19:07-836867 INFO The running process has been terminated. 05:19:26-698684 INFO Start training LoRA Standard ... 05:19:26-699684 INFO Checking for duplicate image filenames in training data directory... Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file. 
Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp 05:19:26-703684 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:19:26-705184 INFO Folder 10_Merial: 10 images found 05:19:26-706184 INFO Folder 10_Merial: 100 steps 05:19:26-707184 INFO Total steps: 100 05:19:26-707684 INFO Train batch size: 2 05:19:26-708685 INFO Gradient accumulation steps: 1 05:19:26-709684 INFO Epoch: 15 05:19:26-710684 INFO Regulatization factor: 1 05:19:26-711686 INFO max_train_steps (100 / 2 / 1 15 * 1) = 750 05:19:26-716186 INFO stop_text_encoder_training = 0 05:19:26-717186 INFO lr_warmup_steps = 0 05:19:26-718186 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-051926.json... 05:19:26-719688 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0 prepare tokenizer update token length: 225 Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 2 resolution: (512, 512) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: True

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 150.35it/s] make buckets min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (128, 256), count: 20 bucket 1: resolution (192, 384), count: 10 bucket 2: resolution (256, 384), count: 10 bucket 3: resolution (256, 576), count: 20 bucket 4: resolution (320, 512), count: 10 bucket 5: resolution (384, 512), count: 10 bucket 6: resolution (384, 576), count: 10 bucket 7: resolution (512, 512), count: 10 mean ar error (without repeats): 0.06391376824093384 preparing accelerator loading model for process 0/1 load Diffusers pretrained models: runwayml/stable-diffusion-v1-5 Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 8.72it/s] You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 . UNet2DConditionModel: 64, 8, 768, False, False U-Net converted to original U-Net Enable xformers for U-Net A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' import network module: networks.lora [Dataset 0] caching latents. checking cache validity... 100%|████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s] caching latents... 100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 13.77it/s] create LoRA network. base dim (rank): 128, alpha: 128.0 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder: create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. enable LoRA for text encoder enable LoRA for U-Net prepare optimizer, data loader etc. use AdamW optimizer | {} running training / 学習開始 num train images repeats / 学習画像の数×繰り返し回数: 100 num reg images / 正則化画像の数: 0 num batches per epoch / 1epochのバッチ数: 50 num epochs / epoch数: 15 batch size per device / バッチサイズ: 2 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 750 steps: 0%| | 0/750 [00:00<?, ?it/s] epoch 1/15 steps: 1%|▋ | 9/750 [00:03<04:40, 2.64it/s, avr_loss=0.139]05:19:48-828073 INFO The running process has been terminated. 05:20:59-691531 INFO Start training LoRA Standard ... 05:20:59-692531 INFO Checking for duplicate image filenames in training data directory... Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file. 
Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp 05:20:59-695533 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:20:59-697032 INFO Folder 10_Merial: 10 images found 05:20:59-698032 INFO Folder 10_Merial: 100 steps 05:20:59-699033 INFO Total steps: 100 05:20:59-700033 INFO Train batch size: 2 05:20:59-701033 INFO Gradient accumulation steps: 1 05:20:59-702034 INFO Epoch: 15 05:20:59-703033 INFO Regulatization factor: 1 05:20:59-704534 INFO max_train_steps (100 / 2 / 1 15 1) = 750 05:20:59-706034 INFO stop_text_encoder_training = 0 05:20:59-707035 INFO lr_warmup_steps = 0 05:20:59-708034 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-052059.json... 05:20:59-709534 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --full_fp16 --xformers --bucket_no_upscale --noise_offset=0.0 05:21:03-144139 INFO The running process has been terminated. 05:21:11-193553 INFO Applying preset SDXL - LoRA AI_characters standard v1.1... 05:21:11-195054 INFO Loading config... 05:21:20-433178 INFO Start training LoRA Standard ... 05:21:20-434678 INFO Checking for duplicate image filenames in training data directory... Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file. Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp 05:21:20-438179 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:21:20-439679 INFO Folder 10_Merial: 10 images found 05:21:20-440679 INFO Folder 10_Merial: 100 steps 05:21:20-441679 INFO Total steps: 100 05:21:20-442680 INFO Train batch size: 3 05:21:20-443680 INFO Gradient accumulation steps: 1 05:21:20-444680 INFO Epoch: 50 05:21:20-445181 INFO Regulatization factor: 1 05:21:20-446680 INFO max_train_steps (100 / 3 / 1 50 * 1) = 1667 05:21:20-448180 INFO stop_text_encoder_training = 0 05:21:20-449682 INFO lr_warmup_steps = 0 05:21:20-450682 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-052120.json... 05:21:20-452682 INFO accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="1024,1024" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="32" --training_comment="3 repeats. 
More info: https://civitai.com/articles/1771" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=3e-05 --unet_lr=3e-05 --network_dim=32 --output_name="Merial13" --lr_scheduler_num_cycles="50" --no_half_vae --learning_rate="3e-05" --lr_scheduler="constant" --train_batch_size="3" --max_train_steps="1667" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --cache_latents_to_disk --optimizer_type="AdamW" --max_grad_norm="1" --max_train_epochs=50 --max_data_loader_n_workers="0" --caption_dropout_rate="0.05" --bucket_reso_steps=64 --min_snr_gamma=5 --gradient_checkpointing --full_fp16 --xformers --noise_offset=0.0 prepare tokenizers Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 3 resolution: (1024, 1024) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: False

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.05 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 145.96it/s] make buckets number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (640, 1600), count: 20 bucket 1: resolution (704, 1408), count: 10 bucket 2: resolution (768, 1280), count: 20 bucket 3: resolution (832, 1216), count: 30 bucket 4: resolution (896, 1152), count: 10 bucket 5: resolution (1024, 1024), count: 10 mean ar error (without repeats): 0.02071550377769773 preparing accelerator loading model for process 0/1 load Diffusers pretrained models: runwayml/stable-diffusion-v1-5, variant=fp16 Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 7.13it/s] Traceback (most recent call last): File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 189, in trainer.train(args) File "E:\GitHub\kohya_ss\train_network.py", line 234, in train model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator) File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 47, in load_target_model ) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype) File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 34, in load_target_model ) = _load_target_model( File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 103, in _load_target_model if text_encoder2.dtype != torch.float32: AttributeError: 'NoneType' object has no attribute 'dtype' Traceback (most recent call last): File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe__main__.py", line 7, in File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command simple_launcher(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=1024,1024', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=32', '--training_comment=3 repeats. 
More info: https://civitai.com/articles/1771', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-05', '--unet_lr=3e-05', '--network_dim=32', '--output_name=Merial13', '--lr_scheduler_num_cycles=50', '--no_half_vae', '--learning_rate=3e-05', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=1667', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_train_epochs=50', '--max_data_loader_n_workers=0', '--caption_dropout_rate=0.05', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--gradient_checkpointing', '--full_fp16', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.
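The AttributeError at the end of that log is the mirror image of the size-mismatch failure above: sdxl_train_network.py expects two text encoders, but runwayml/stable-diffusion-v1-5 only provides one, so text_encoder2 comes back as None and the dtype check blows up. A hypothetical guard that would surface the problem more clearly (this is not the actual sd-scripts code, just a sketch):

```python
def check_sdxl_text_encoders(text_encoder1, text_encoder2):
    # Hypothetical guard; the real sdxl_train_util.py fails later with a
    # confusing AttributeError when text_encoder2 is None (an SD1.x model).
    if text_encoder2 is None:
        raise ValueError(
            "text_encoder2 is None: this checkpoint has only one text "
            "encoder, so it is not SDXL. Point --pretrained_model_name_or_path "
            "at an SDXL base model, or switch to train_network.py."
        )
    return text_encoder1.dtype, text_encoder2.dtype
```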

DKnight54 commented 10 months ago

05:13:40-812875 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:13:40-813874 INFO Folder 10_Merial: 10 images found 05:13:40-816875 INFO Folder 10_Merial: 100 steps 05:13:40-817875 INFO Total steps: 100 05:13:40-818875 INFO Train batch size: 2 05:13:40-819876 INFO Gradient accumulation steps: 1 05:13:40-820877 INFO Epoch: 15 05:13:40-822376 INFO Regulatization factor: 1 05:13:40-823376 INFO max_train_steps (100 / 2 / 1 15 1) = 750 05:13:40-824878 INFO stop_text_encoder_training = 0 05:13:40-826377 INFO lr_warmup_steps = 0 05:13:40-827377 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-051340.json... 05:13:40-828878 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0 prepare tokenizer update token length: 225 Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 2 resolution: (512, 512) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: True

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 139.83it/s] make buckets min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (128, 256), count: 20 bucket 1: resolution (192, 384), count: 10 bucket 2: resolution (256, 384), count: 10 bucket 3: resolution (256, 576), count: 20 bucket 4: resolution (320, 512), count: 10 bucket 5: resolution (384, 512), count: 10 bucket 6: resolution (384, 576), count: 10 bucket 7: resolution (512, 512), count: 10 mean ar error (without repeats): 0.06391376824093384 preparing accelerator loading model for process 0/1 load Diffusers pretrained models: runwayml/stable-diffusion-v1-5 Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 7.01it/s] You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 . UNet2DConditionModel: 64, 8, 768, False, False U-Net converted to original U-Net Enable xformers for U-Net A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' import network module: networks.lora [Dataset 0] caching latents. checking cache validity... 100%|████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s] caching latents... 100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 13.69it/s] create LoRA network. base dim (rank): 128, alpha: 128.0 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder: create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. enable LoRA for text encoder enable LoRA for U-Net prepare optimizer, data loader etc. Traceback (most recent call last): File "E:\GitHub\kohya_ss\library\train_util.py", line 3480, in get_optimizer import bitsandbytes as bnb File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytesinit.py", line 6, in from . import cuda_setup, utils, research File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\researchinit.py", line 1, in from . 
import nn File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in from .modules import LinearFP8Mixed, LinearFP8Global File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in from bitsandbytes.optim import GlobalOptimManager File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in from bitsandbytes.cextension import COMPILED_WITH_CUDA File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in from .cuda_setup.main import evaluate_cuda_setup File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in from .paths import determine_cuda_runtime_lib_path ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "E:\GitHub\kohya_ss\train_network.py", line 996, in trainer.train(args) File "E:\GitHub\kohya_ss\train_network.py", line 348, in train optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params) File "E:\GitHub\kohya_ss\library\train_util.py", line 3482, in get_optimizer raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです") ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです Traceback (most recent call last): File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exemain.py", line 7, in File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command simple_launcher(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=512,512', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.00015', '--network_dim=128', '--output_name=Merial13', '--lr_scheduler_num_cycles=15', '--no_half_vae', '--learning_rate=0.00015', '--lr_scheduler=cosine', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=5', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

Another error when I use AdamW8bit.

In order to use 8-bit optimizers like AdamW8bit, you need to install bitsandbytes.

Since it seems like you also want to try bf16 training, please install it using the steps from the README:

python -m pip install bitsandbytes==0.41.1 --prefer-binary --extra-index-url=https://jllllll.github.io/bitsandbytes-windows-webui

Please note that I'm not sure whether that will install it properly, since from what I can see you are using a virtual environment (venv); make sure you run the command with the venv activated.
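A quick way to confirm whether bitsandbytes is importable from inside the venv afterwards (a trivial check; run it with the venv's python.exe, e.g. E:\GitHub\kohya_ss\venv\Scripts\python.exe):

```python
# Run with the venv's python.exe so you test the venv's copy, not the system one.
try:
    import bitsandbytes as bnb
    print("bitsandbytes OK, version:", getattr(bnb, "__version__", "unknown"))
except Exception as e:  # an ImportError or a broken CUDA setup both land here
    print("bitsandbytes is missing or broken:", e)
```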

For another low-VRAM optimizer, you can try Adafactor (SDXL TOML config file arguments below):

optimizer_type = "adafactor"
optimizer_args = [ "scale_parameter=False", "relative_step=False", "warmup_init=False" ]
lr_scheduler = "constant_with_warmup"
lr_warmup_steps = 100
learning_rate = 4e-7 # SDXL original learning rate
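For intuition, here is roughly what those settings amount to (a minimal sketch assuming sd-scripts delegates to the transformers implementation of Adafactor; the dummy Linear stands in for the LoRA parameters):

```python
import torch
from transformers.optimization import Adafactor, get_constant_schedule_with_warmup

params = torch.nn.Linear(8, 8).parameters()  # placeholder for the network params

optimizer = Adafactor(
    params,
    lr=4e-7,                # explicit LR, so relative_step must be off
    scale_parameter=False,  # don't rescale the LR by the parameter norm
    relative_step=False,    # use the fixed lr above, not Adafactor's own schedule
    warmup_init=False,      # warmup is handled by the external scheduler
)
scheduler = get_constant_schedule_with_warmup(optimizer, num_warmup_steps=100)
```

Disabling scale_parameter, relative_step, and warmup_init turns off Adafactor's internal learning-rate schedule so it doesn't fight the external constant_with_warmup scheduler.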

I also suggest reading this guide for a better understanding of optimizers (and for tips and tricks on training LoRAs).

DKnight54 commented 10 months ago

Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp 05:08:50-718874 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:08:50-720375 INFO Folder 10_Merial: 10 images found 05:08:50-721376 INFO Folder 10_Merial: 100 steps 05:08:50-722375 INFO Total steps: 100 05:08:50-723376 INFO Train batch size: 3 05:08:50-723876 INFO Gradient accumulation steps: 1 05:08:50-725377 INFO Epoch: 50 05:08:50-726876 INFO Regulatization factor: 1 05:08:50-727876 INFO max_train_steps (100 / 3 / 1 50 1) = 1667 05:08:50-730376 INFO stop_text_encoder_training = 0 05:08:50-731377 INFO lr_warmup_steps = 0 05:08:50-732377 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial12_20240108-050850.json... 05:08:50-734377 INFO accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:/GitHub/stable-diffusion-webui/models/Stable-diffusion /sd_xl_base_1.0.safetensors" --train_data_dir="E:\GitHub\training\Merial" --resolution="1024,1024" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="32" --training_comment="3 repeats. More info: https://civitai.com/articles/1771" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=3e-05 --unet_lr=3e-05 --network_dim=32 --output_name="Merial12" --lr_scheduler_num_cycles="50" --no_half_vae --learning_rate="3e-05" --lr_scheduler="constant" --train_batch_size="3" --max_train_steps="1667" --save_every_n_epochs="1" --mixed_precision="fp16" --save_precision="fp16" --caption_extension=".txt" --cache_latents --cache_latents_to_disk --optimizer_type="AdamW" --max_grad_norm="1" --max_train_epochs=50 --max_data_loader_n_workers="0" --caption_dropout_rate="0.05" --bucket_reso_steps=64 --min_snr_gamma=5 --gradient_checkpointing --xformers --noise_offset=0.0 prepare tokenizers Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 3 resolution: (1024, 1024) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: False

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.05 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 88.09it/s] make buckets number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (640, 1600), count: 20 bucket 1: resolution (704, 1408), count: 10 bucket 2: resolution (768, 1280), count: 20 bucket 3: resolution (832, 1216), count: 30 bucket 4: resolution (896, 1152), count: 10 bucket 5: resolution (1024, 1024), count: 10 mean ar error (without repeats): 0.02071550377769773 preparing accelerator loading model for process 0/1 load StableDiffusion checkpoint: E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors building U-Net loading U-Net from checkpoint U-Net: building text encoders loading text encoders from checkpoint text encoder 1: text encoder 2: building VAE loading VAE from checkpoint VAE: Enable xformers for U-Net A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' import network module: networks.lora [Dataset 0] caching latents. checking cache validity... 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 170.91it/s] caching latents... 100%|██████████████████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 1.72it/s] create LoRA network. base dim (rank): 32, alpha: 32.0 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder 1: create LoRA for Text Encoder 2: create LoRA for Text Encoder: 264 modules. create LoRA for U-Net: 722 modules. enable LoRA for text encoder enable LoRA for U-Net prepare optimizer, data loader etc. use AdamW optimizer | {} override steps. steps for 50 epochs is / 指定エポックまでのステップ数: 1800 running training / 学習開始 num train images repeats / 学習画像の数×繰り返し回数: 100 num reg images / 正則化画像の数: 0 num batches per epoch / 1epochのバッチ数: 36 num epochs / epoch数: 50 batch size per device / バッチサイズ: 3 gradient accumulation steps / 勾配を合計するステップ数 = 1 total optimization steps / 学習ステップ数: 1800 steps: 0%| | 0/1800 [00:00<?, ?it/s] epoch 1/50 steps: 0%|▏ | 5/1800 [02:53<17:19:23, 34.74s/it, avr_loss=0.071]05:12:47-824560 INFO Loading config... 05:12:47-995589 INFO runwayml/stable-diffusion-v1-5 model selected. 05:12:51-632229 INFO The running process has been terminated. 05:12:53-517059 INFO There is no running process to kill. 05:12:53-673588 INFO There is no running process to kill. 05:13:40-808375 INFO Start training LoRA Standard ... 05:13:40-809875 INFO Checking for duplicate image filenames in training data directory... Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file. 
Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp 05:13:40-812875 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:13:40-813874 INFO Folder 10_Merial: 10 images found 05:13:40-816875 INFO Folder 10_Merial: 100 steps 05:13:40-817875 INFO Total steps: 100 05:13:40-818875 INFO Train batch size: 2 05:13:40-819876 INFO Gradient accumulation steps: 1 05:13:40-820877 INFO Epoch: 15 05:13:40-822376 INFO Regulatization factor: 1 05:13:40-823376 INFO max_train_steps (100 / 2 / 1 15 * 1) = 750 05:13:40-824878 INFO stop_text_encoder_training = 0 05:13:40-826377 INFO lr_warmup_steps = 0 05:13:40-827377 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-051340.json... 05:13:40-828878 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0 prepare tokenizer update token length: 225 Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 2 resolution: (512, 512) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: True

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 139.83it/s] make buckets min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (128, 256), count: 20 bucket 1: resolution (192, 384), count: 10 bucket 2: resolution (256, 384), count: 10 bucket 3: resolution (256, 576), count: 20 bucket 4: resolution (320, 512), count: 10 bucket 5: resolution (384, 512), count: 10 bucket 6: resolution (384, 576), count: 10 bucket 7: resolution (512, 512), count: 10 mean ar error (without repeats): 0.06391376824093384 preparing accelerator loading model for process 0/1 load Diffusers pretrained models: runwayml/stable-diffusion-v1-5 Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 7.01it/s] You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing safety_checker=None. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at huggingface/diffusers#254 . UNet2DConditionModel: 64, 8, 768, False, False U-Net converted to original U-Net Enable xformers for U-Net A matching Triton is not available, some optimizations will not be enabled. Error caught was: No module named 'triton' import network module: networks.lora [Dataset 0] caching latents. checking cache validity... 100%|████████████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<?, ?it/s] caching latents... 100%|████████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 13.69it/s] create LoRA network. base dim (rank): 128, alpha: 128.0 neuron dropout: p=None, rank dropout: p=None, module dropout: p=None create LoRA for Text Encoder: create LoRA for Text Encoder: 72 modules. create LoRA for U-Net: 192 modules. enable LoRA for text encoder enable LoRA for U-Net prepare optimizer, data loader etc. Traceback (most recent call last): File "E:\GitHub\kohya_ss\library\train_util.py", line 3480, in get_optimizer import bitsandbytes as bnb File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytesinit.py", line 6, in from . import cuda_setup, utils, research File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\researchinit.py", line 1, in from . 
import nn File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\__init__.py", line 1, in from .modules import LinearFP8Mixed, LinearFP8Global File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\research\nn\modules.py", line 8, in from bitsandbytes.optim import GlobalOptimManager File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\optim\__init__.py", line 6, in from bitsandbytes.cextension import COMPILED_WITH_CUDA File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\cextension.py", line 5, in from .cuda_setup.main import evaluate_cuda_setup File "E:\GitHub\kohya_ss\venv\lib\site-packages\bitsandbytes\cuda_setup\main.py", line 21, in from .paths import determine_cuda_runtime_lib_path ModuleNotFoundError: No module named 'bitsandbytes.cuda_setup.paths'

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "E:\GitHub\kohya_ss\train_network.py", line 996, in trainer.train(args) File "E:\GitHub\kohya_ss\train_network.py", line 348, in train optimizer_name, optimizer_args, optimizer = train_util.get_optimizer(args, trainable_params) File "E:\GitHub\kohya_ss\library\train_util.py", line 3482, in get_optimizer raise ImportError("No bitsandbytes / bitsandbytesがインストールされていないようです") ImportError: No bitsandbytes / bitsandbytesがインストールされていないようです Traceback (most recent call last): File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main return _run_code(code, main_globals, None, File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code exec(code, run_globals) File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exemain.py", line 7, in File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main args.func(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command simple_launcher(args) File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd) subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=512,512', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.00015', '--network_dim=128', '--output_name=Merial13', '--lr_scheduler_num_cycles=15', '--no_half_vae', '--learning_rate=0.00015', '--lr_scheduler=cosine', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=5', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1. 05:15:35-218988 INFO Start training LoRA Standard ... 05:15:35-219988 INFO Checking for duplicate image filenames in training data directory... Warning: Same filename 'Merial (1)' with different image extension found. This will cause training issues. Rename one of the file. Existing file: E:\GitHub\training\Merial\10_Merial\Merial (1).jpg Current file: E:\GitHub\training\Merial\10_Merial\Merial (1).webp 05:15:35-224490 INFO Valid image folder names found in: E:\GitHub\training\Merial 05:15:35-225989 INFO Folder 10_Merial: 10 images found 05:15:35-226990 INFO Folder 10_Merial: 100 steps 05:15:35-227990 INFO Total steps: 100 05:15:35-228491 INFO Train batch size: 2 05:15:35-229990 INFO Gradient accumulation steps: 1 05:15:35-230989 INFO Epoch: 15 05:15:35-231990 INFO Regulatization factor: 1 05:15:35-234490 INFO max_train_steps (100 / 2 / 1 15 1) = 750 05:15:35-235491 INFO stop_text_encoder_training = 0 05:15:35-236491 INFO lr_warmup_steps = 0 05:15:35-237991 INFO Saving training config to C:\Users\josep\Desktop\Lora\Merial13_20240108-051535.json... 
05:15:35-239492 INFO accelerate launch --num_cpu_threads_per_process=2 "./train_network.py" --enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048 --pretrained_model_name_or_path="E:/GitHub/stable-diffusion-webui/models/Stable-diffusion /sd_xl_base_1.0.safetensors" --train_data_dir="E:\GitHub\training\Merial" --resolution="512,512" --output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora" --network_alpha="128" --save_model_as=safetensors --network_module=networks.lora --text_encoder_lr=5e-05 --unet_lr=0.00015 --network_dim=128 --output_name="Merial13" --lr_scheduler_num_cycles="15" --no_half_vae --learning_rate="0.00015" --lr_scheduler="cosine" --train_batch_size="2" --max_train_steps="750" --save_every_n_epochs="5" --mixed_precision="bf16" --save_precision="bf16" --caption_extension=".txt" --cache_latents --optimizer_type="AdamW8bit" --max_grad_norm="1" --max_data_loader_n_workers="0" --max_token_length=225 --clip_skip=2 --bucket_reso_steps=64 --xformers --bucket_no_upscale --noise_offset=0.0 prepare tokenizer update token length: 225 Using DreamBooth method. prepare images. found directory E:\GitHub\training\Merial\10_Merial contains 10 image files 100 train images with repeating. 0 reg images. no regularization images / 正則化画像が見つかりませんでした [Dataset 0] batch_size: 2 resolution: (512, 512) enable_bucket: True min_bucket_reso: 256 max_bucket_reso: 2048 bucket_reso_steps: 64 bucket_no_upscale: True

[Subset 0 of Dataset 0] image_dir: "E:\GitHub\training\Merial\10_Merial" image_count: 10 num_repeats: 10 shuffle_caption: False keep_tokens: 0 keep_tokens_separator: caption_dropout_rate: 0.0 caption_dropout_every_n_epoches: 0 caption_tag_dropout_rate: 0.0 caption_prefix: None caption_suffix: None color_aug: False flip_aug: False face_crop_aug_range: None random_crop: False token_warmup_min: 1, token_warmup_step: 0, is_reg: False class_tokens: Merial caption_extension: .txt

[Dataset 0] loading image sizes. 100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 150.35it/s] make buckets min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む) bucket 0: resolution (128, 256), count: 20 bucket 1: resolution (192, 384), count: 10 bucket 2: resolution (256, 384), count: 10 bucket 3: resolution (256, 576), count: 20 bucket 4: resolution (320, 512), count: 10 bucket 5: resolution (384, 512), count: 10 bucket 6: resolution (384, 576), count: 10 bucket 7: resolution (512, 512), count: 10 mean ar error (without repeats): 0.06391376824093384 preparing accelerator loading model for process 0/1 load StableDiffusion checkpoint: E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors UNet2DConditionModel: 64, 8, 768, False, False Traceback (most recent call last): File "E:\GitHub\kohya_ss\train_network.py", line 996, in trainer.train(args) File "E:\GitHub\kohya_ss\train_network.py", line 234, in train model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator) File "E:\GitHub\kohya_ss\train_network.py", line 103, in load_target_model textencoder, vae, unet, = train_util.load_target_model(args, weight_dtype, accelerator) File "E:\GitHub\kohya_ss\library\train_util.py", line 3960, in load_target_model text_encoder, vae, unet, load_stable_diffusion_format = _load_target_model( File "E:\GitHub\kohya_ss\library\train_util.py", line 3914, in _load_target_model text_encoder, vae, unet = model_util.load_models_from_stable_diffusion_checkpoint( File "E:\GitHub\kohya_ss\library\model_util.py", line 1007, in load_models_from_stable_diffusion_checkpoint info = unet.load_state_dict(converted_unet_checkpoint) File "E:\GitHub\kohya_ss\venv\lib\site-packages\torch\nn\modules\module.py", line 2041, in load_state_dict raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format( RuntimeError: Error(s) in loading state_dict for UNet2DConditionModel: Missing key(s) in state_dict: "down_blocks.0.attentions.0.norm.weight", "down_blocks.0.attentions.0.norm.bias", "down_blocks.0.attentions.0.proj_in.weight", "down_blocks.0.attentions.0.proj_in.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", 
"down_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.0.proj_out.weight", "down_blocks.0.attentions.0.proj_out.bias", "down_blocks.0.attentions.1.norm.weight", "down_blocks.0.attentions.1.norm.bias", "down_blocks.0.attentions.1.proj_in.weight", "down_blocks.0.attentions.1.proj_in.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "down_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "down_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "down_blocks.0.attentions.1.proj_out.weight", "down_blocks.0.attentions.1.proj_out.bias", "down_blocks.2.downsamplers.0.conv.weight", "down_blocks.2.downsamplers.0.conv.bias", "down_blocks.3.resnets.0.norm1.weight", "down_blocks.3.resnets.0.norm1.bias", "down_blocks.3.resnets.0.conv1.weight", "down_blocks.3.resnets.0.conv1.bias", "down_blocks.3.resnets.0.time_emb_proj.weight", "down_blocks.3.resnets.0.time_emb_proj.bias", "down_blocks.3.resnets.0.norm2.weight", "down_blocks.3.resnets.0.norm2.bias", "down_blocks.3.resnets.0.conv2.weight", "down_blocks.3.resnets.0.conv2.bias", "down_blocks.3.resnets.1.norm1.weight", "down_blocks.3.resnets.1.norm1.bias", "down_blocks.3.resnets.1.conv1.weight", "down_blocks.3.resnets.1.conv1.bias", "down_blocks.3.resnets.1.time_emb_proj.weight", "down_blocks.3.resnets.1.time_emb_proj.bias", "down_blocks.3.resnets.1.norm2.weight", "down_blocks.3.resnets.1.norm2.bias", "down_blocks.3.resnets.1.conv2.weight", "down_blocks.3.resnets.1.conv2.bias", "up_blocks.2.attentions.0.norm.weight", "up_blocks.2.attentions.0.norm.bias", "up_blocks.2.attentions.0.proj_in.weight", "up_blocks.2.attentions.0.proj_in.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", 
"up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.0.proj_out.weight", "up_blocks.2.attentions.0.proj_out.bias", "up_blocks.2.attentions.1.norm.weight", "up_blocks.2.attentions.1.norm.bias", "up_blocks.2.attentions.1.proj_in.weight", "up_blocks.2.attentions.1.proj_in.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.2.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.1.proj_out.weight", "up_blocks.2.attentions.1.proj_out.bias", "up_blocks.2.attentions.2.norm.weight", "up_blocks.2.attentions.2.norm.bias", "up_blocks.2.attentions.2.proj_in.weight", "up_blocks.2.attentions.2.proj_in.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", 
"up_blocks.2.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.2.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.2.attentions.2.proj_out.weight", "up_blocks.2.attentions.2.proj_out.bias", "up_blocks.2.upsamplers.0.conv.weight", "up_blocks.2.upsamplers.0.conv.bias", "up_blocks.3.attentions.0.norm.weight", "up_blocks.3.attentions.0.norm.bias", "up_blocks.3.attentions.0.proj_in.weight", "up_blocks.3.attentions.0.proj_in.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.0.proj_out.weight", "up_blocks.3.attentions.0.proj_out.bias", "up_blocks.3.attentions.1.norm.weight", "up_blocks.3.attentions.1.norm.bias", "up_blocks.3.attentions.1.proj_in.weight", "up_blocks.3.attentions.1.proj_in.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm2.bias", 
"up_blocks.3.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.1.proj_out.weight", "up_blocks.3.attentions.1.proj_out.bias", "up_blocks.3.attentions.2.norm.weight", "up_blocks.3.attentions.2.norm.bias", "up_blocks.3.attentions.2.proj_in.weight", "up_blocks.3.attentions.2.proj_in.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.3.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.3.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.3.attentions.2.proj_out.weight", "up_blocks.3.attentions.2.proj_out.bias", "up_blocks.3.resnets.0.norm1.weight", "up_blocks.3.resnets.0.norm1.bias", "up_blocks.3.resnets.0.conv1.weight", "up_blocks.3.resnets.0.conv1.bias", "up_blocks.3.resnets.0.time_emb_proj.weight", "up_blocks.3.resnets.0.time_emb_proj.bias", "up_blocks.3.resnets.0.norm2.weight", "up_blocks.3.resnets.0.norm2.bias", "up_blocks.3.resnets.0.conv2.weight", "up_blocks.3.resnets.0.conv2.bias", "up_blocks.3.resnets.0.conv_shortcut.weight", "up_blocks.3.resnets.0.conv_shortcut.bias", "up_blocks.3.resnets.1.norm1.weight", "up_blocks.3.resnets.1.norm1.bias", "up_blocks.3.resnets.1.conv1.weight", "up_blocks.3.resnets.1.conv1.bias", "up_blocks.3.resnets.1.time_emb_proj.weight", "up_blocks.3.resnets.1.time_emb_proj.bias", "up_blocks.3.resnets.1.norm2.weight", "up_blocks.3.resnets.1.norm2.bias", "up_blocks.3.resnets.1.conv2.weight", "up_blocks.3.resnets.1.conv2.bias", "up_blocks.3.resnets.1.conv_shortcut.weight", "up_blocks.3.resnets.1.conv_shortcut.bias", "up_blocks.3.resnets.2.norm1.weight", "up_blocks.3.resnets.2.norm1.bias", "up_blocks.3.resnets.2.conv1.weight", "up_blocks.3.resnets.2.conv1.bias", "up_blocks.3.resnets.2.time_emb_proj.weight", "up_blocks.3.resnets.2.time_emb_proj.bias", "up_blocks.3.resnets.2.norm2.weight", "up_blocks.3.resnets.2.norm2.bias", "up_blocks.3.resnets.2.conv2.weight", "up_blocks.3.resnets.2.conv2.bias", "up_blocks.3.resnets.2.conv_shortcut.weight", "up_blocks.3.resnets.2.conv_shortcut.bias". 
Unexpected key(s) in state_dict: "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", 
"down_blocks.2.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", 
"down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm1.bias", 
"down_blocks.2.attentions.0.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", 
"down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.0.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.0.transformer_blocks.9.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.1.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.1.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.2.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.2.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.2.norm3.bias", 
"down_blocks.2.attentions.1.transformer_blocks.2.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.3.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.3.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.4.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.4.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.5.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", 
"down_blocks.2.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.5.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.6.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.6.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.7.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.7.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", 
"down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.8.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.8.norm3.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn1.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_k.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_q.weight", "down_blocks.2.attentions.1.transformer_blocks.9.attn2.to_v.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.ff.net.2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm1.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm2.weight", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.bias", "down_blocks.2.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.0.norm.bias", "up_blocks.0.attentions.0.norm.weight", "up_blocks.0.attentions.0.proj_in.bias", "up_blocks.0.attentions.0.proj_in.weight", "up_blocks.0.attentions.0.proj_out.bias", "up_blocks.0.attentions.0.proj_out.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.bias", 
"up_blocks.0.attentions.0.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn1.to_v.weight", 
"up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.5.norm3.bias", 
"up_blocks.0.attentions.0.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", 
"up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.0.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.0.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.1.norm.bias", "up_blocks.0.attentions.1.norm.weight", "up_blocks.0.attentions.1.proj_in.bias", "up_blocks.0.attentions.1.proj_in.weight", "up_blocks.0.attentions.1.proj_out.bias", "up_blocks.0.attentions.1.proj_out.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", 
"up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm1.bias", 
"up_blocks.0.attentions.1.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_q.weight", 
"up_blocks.0.attentions.1.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_out.0.weight", 
"up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.1.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.ff.net.2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.1.transformer_blocks.9.norm3.weight", "up_blocks.0.attentions.2.norm.bias", "up_blocks.0.attentions.2.norm.weight", "up_blocks.0.attentions.2.proj_in.bias", "up_blocks.0.attentions.2.proj_in.weight", "up_blocks.0.attentions.2.proj_out.bias", "up_blocks.0.attentions.2.proj_out.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.0.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.0.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.bias", 
"up_blocks.0.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.1.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.2.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.2.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.3.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.3.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.bias", 
"up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.4.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.4.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.5.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.5.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.6.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.6.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_k.weight", 
"up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.7.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.7.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.8.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.ff.net.2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.8.norm3.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn1.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_k.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.bias", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_out.0.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_q.weight", "up_blocks.0.attentions.2.transformer_blocks.9.attn2.to_v.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.0.proj.weight", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.ff.net.2.weight", 
"up_blocks.0.attentions.2.transformer_blocks.9.norm1.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm1.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm2.weight", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.bias", "up_blocks.0.attentions.2.transformer_blocks.9.norm3.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.0.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.0.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_out.0.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.1.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.1.transformer_blocks.1.norm3.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_out.0.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn1.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_k.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.bias", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_out.0.weight", 
"up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_q.weight", "up_blocks.1.attentions.2.transformer_blocks.1.attn2.to_v.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.0.proj.weight", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.ff.net.2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm1.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm2.weight", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.bias", "up_blocks.1.attentions.2.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.1.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.1.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.1.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.1.norm1.bias", "mid_block.attentions.0.transformer_blocks.1.norm1.weight", "mid_block.attentions.0.transformer_blocks.1.norm2.bias", "mid_block.attentions.0.transformer_blocks.1.norm2.weight", "mid_block.attentions.0.transformer_blocks.1.norm3.bias", "mid_block.attentions.0.transformer_blocks.1.norm3.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.2.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.2.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.2.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.2.norm1.bias", "mid_block.attentions.0.transformer_blocks.2.norm1.weight", "mid_block.attentions.0.transformer_blocks.2.norm2.bias", "mid_block.attentions.0.transformer_blocks.2.norm2.weight", "mid_block.attentions.0.transformer_blocks.2.norm3.bias", "mid_block.attentions.0.transformer_blocks.2.norm3.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn1.to_out.0.weight", 
"mid_block.attentions.0.transformer_blocks.3.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.3.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.3.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.3.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.3.norm1.bias", "mid_block.attentions.0.transformer_blocks.3.norm1.weight", "mid_block.attentions.0.transformer_blocks.3.norm2.bias", "mid_block.attentions.0.transformer_blocks.3.norm2.weight", "mid_block.attentions.0.transformer_blocks.3.norm3.bias", "mid_block.attentions.0.transformer_blocks.3.norm3.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.4.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.4.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.4.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.4.norm1.bias", "mid_block.attentions.0.transformer_blocks.4.norm1.weight", "mid_block.attentions.0.transformer_blocks.4.norm2.bias", "mid_block.attentions.0.transformer_blocks.4.norm2.weight", "mid_block.attentions.0.transformer_blocks.4.norm3.bias", "mid_block.attentions.0.transformer_blocks.4.norm3.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.5.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.5.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.5.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.5.norm1.bias", "mid_block.attentions.0.transformer_blocks.5.norm1.weight", "mid_block.attentions.0.transformer_blocks.5.norm2.bias", "mid_block.attentions.0.transformer_blocks.5.norm2.weight", 
"mid_block.attentions.0.transformer_blocks.5.norm3.bias", "mid_block.attentions.0.transformer_blocks.5.norm3.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.6.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.6.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.6.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.6.norm1.bias", "mid_block.attentions.0.transformer_blocks.6.norm1.weight", "mid_block.attentions.0.transformer_blocks.6.norm2.bias", "mid_block.attentions.0.transformer_blocks.6.norm2.weight", "mid_block.attentions.0.transformer_blocks.6.norm3.bias", "mid_block.attentions.0.transformer_blocks.6.norm3.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.7.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.7.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.7.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.7.norm1.bias", "mid_block.attentions.0.transformer_blocks.7.norm1.weight", "mid_block.attentions.0.transformer_blocks.7.norm2.bias", "mid_block.attentions.0.transformer_blocks.7.norm2.weight", "mid_block.attentions.0.transformer_blocks.7.norm3.bias", "mid_block.attentions.0.transformer_blocks.7.norm3.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.8.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.8.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.8.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.8.ff.net.2.bias", 
"mid_block.attentions.0.transformer_blocks.8.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.8.norm1.bias", "mid_block.attentions.0.transformer_blocks.8.norm1.weight", "mid_block.attentions.0.transformer_blocks.8.norm2.bias", "mid_block.attentions.0.transformer_blocks.8.norm2.weight", "mid_block.attentions.0.transformer_blocks.8.norm3.bias", "mid_block.attentions.0.transformer_blocks.8.norm3.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn1.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn1.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_k.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.bias", "mid_block.attentions.0.transformer_blocks.9.attn2.to_out.0.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_q.weight", "mid_block.attentions.0.transformer_blocks.9.attn2.to_v.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.0.proj.weight", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.bias", "mid_block.attentions.0.transformer_blocks.9.ff.net.2.weight", "mid_block.attentions.0.transformer_blocks.9.norm1.bias", "mid_block.attentions.0.transformer_blocks.9.norm1.weight", "mid_block.attentions.0.transformer_blocks.9.norm2.bias", "mid_block.attentions.0.transformer_blocks.9.norm2.weight", "mid_block.attentions.0.transformer_blocks.9.norm3.bias", "mid_block.attentions.0.transformer_blocks.9.norm3.weight". size mismatch for down_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([640, 768]). size mismatch for down_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([640, 640, 1, 1]). size mismatch for down_blocks.2.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). 
size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for down_blocks.2.attentions.1.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for down_blocks.2.attentions.1.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.0.resnets.2.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.0.resnets.2.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.0.resnets.2.conv1.weight: copying a param with shape torch.Size([1280, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for up_blocks.0.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([1280, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for up_blocks.1.attentions.0.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.0.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). 
size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.0.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). 
size mismatch for up_blocks.1.attentions.0.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.0.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.1.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). 
size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.1.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.1.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.norm.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.norm.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.proj_in.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.2.proj_in.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_k.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_v.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). 
size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn1.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.weight: copying a param with shape torch.Size([5120, 640]) from checkpoint, the shape in current model is torch.Size([10240, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.0.proj.bias: copying a param with shape torch.Size([5120]) from checkpoint, the shape in current model is torch.Size([10240]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.weight: copying a param with shape torch.Size([640, 2560]) from checkpoint, the shape in current model is torch.Size([1280, 5120]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.ff.net.2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_q.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([640, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.attn2.to_out.0.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.transformer_blocks.0.norm3.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.attentions.2.proj_out.weight: copying a param with shape torch.Size([640, 640]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for up_blocks.1.attentions.2.proj_out.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). 
size mismatch for up_blocks.1.resnets.0.norm1.weight: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.0.norm1.bias: copying a param with shape torch.Size([1920]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.0.conv1.weight: copying a param with shape torch.Size([640, 1920, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for up_blocks.1.resnets.0.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.resnets.0.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([640, 1920, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for up_blocks.1.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.norm1.weight: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.1.norm1.bias: copying a param with shape torch.Size([1280]) from checkpoint, the shape in current model is torch.Size([2560]). size mismatch for up_blocks.1.resnets.1.conv1.weight: copying a param with shape torch.Size([640, 1280, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 3, 3]). size mismatch for up_blocks.1.resnets.1.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). 
size mismatch for up_blocks.1.resnets.1.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.resnets.1.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([640, 1280, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 2560, 1, 1]). size mismatch for up_blocks.1.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for up_blocks.1.resnets.2.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for up_blocks.1.resnets.2.conv1.weight: copying a param with shape torch.Size([640, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 3, 3]). size mismatch for up_blocks.1.resnets.2.conv1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([640, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280]). size mismatch for up_blocks.1.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.norm2.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.norm2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.conv2.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.resnets.2.conv2.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([640, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([1280, 1920, 1, 1]). size mismatch for up_blocks.1.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.1.upsamplers.0.conv.weight: copying a param with shape torch.Size([640, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 3, 3]). size mismatch for up_blocks.1.upsamplers.0.conv.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.2.resnets.0.norm1.weight: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). size mismatch for up_blocks.2.resnets.0.norm1.bias: copying a param with shape torch.Size([960]) from checkpoint, the shape in current model is torch.Size([1920]). 
size mismatch for up_blocks.2.resnets.0.conv1.weight: copying a param with shape torch.Size([320, 960, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1920, 3, 3]). size mismatch for up_blocks.2.resnets.0.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for up_blocks.2.resnets.0.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for up_blocks.2.resnets.0.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.0.conv_shortcut.weight: copying a param with shape torch.Size([320, 960, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1920, 1, 1]). size mismatch for up_blocks.2.resnets.0.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.2.resnets.1.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([1280]). size mismatch for up_blocks.2.resnets.1.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 1280, 3, 3]). size mismatch for up_blocks.2.resnets.1.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for up_blocks.2.resnets.1.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.1.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for up_blocks.2.resnets.1.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). 
size mismatch for up_blocks.2.resnets.1.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 1280, 1, 1]). size mismatch for up_blocks.2.resnets.1.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.norm1.weight: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for up_blocks.2.resnets.2.norm1.bias: copying a param with shape torch.Size([640]) from checkpoint, the shape in current model is torch.Size([960]). size mismatch for up_blocks.2.resnets.2.conv1.weight: copying a param with shape torch.Size([320, 640, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 960, 3, 3]). size mismatch for up_blocks.2.resnets.2.conv1.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.time_emb_proj.weight: copying a param with shape torch.Size([320, 1280]) from checkpoint, the shape in current model is torch.Size([640, 1280]). size mismatch for up_blocks.2.resnets.2.time_emb_proj.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.norm2.weight: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.norm2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.conv2.weight: copying a param with shape torch.Size([320, 320, 3, 3]) from checkpoint, the shape in current model is torch.Size([640, 640, 3, 3]). size mismatch for up_blocks.2.resnets.2.conv2.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for up_blocks.2.resnets.2.conv_shortcut.weight: copying a param with shape torch.Size([320, 640, 1, 1]) from checkpoint, the shape in current model is torch.Size([640, 960, 1, 1]). size mismatch for up_blocks.2.resnets.2.conv_shortcut.bias: copying a param with shape torch.Size([320]) from checkpoint, the shape in current model is torch.Size([640]). size mismatch for mid_block.attentions.0.proj_in.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_k.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for mid_block.attentions.0.transformer_blocks.0.attn2.to_v.weight: copying a param with shape torch.Size([1280, 2048]) from checkpoint, the shape in current model is torch.Size([1280, 768]). size mismatch for mid_block.attentions.0.proj_out.weight: copying a param with shape torch.Size([1280, 1280]) from checkpoint, the shape in current model is torch.Size([1280, 1280, 1, 1]). 
Traceback (most recent call last):
File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in _run_code
exec(code, run_globals)
File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe\__main__.py", line 7, in <module>
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=E:/GitHub/stable-diffusion-webui/models/Stable-diffusion/sd_xl_base_1.0.safetensors', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=512,512', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=128', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=5e-05', '--unet_lr=0.00015', '--network_dim=128', '--output_name=Merial13', '--lr_scheduler_num_cycles=15', '--no_half_vae', '--learning_rate=0.00015', '--lr_scheduler=cosine', '--train_batch_size=2', '--max_train_steps=750', '--save_every_n_epochs=5', '--mixed_precision=bf16', '--save_precision=bf16', '--caption_extension=.txt', '--cache_latents', '--optimizer_type=AdamW8bit', '--max_grad_norm=1', '--max_data_loader_n_workers=0', '--max_token_length=225', '--clip_skip=2', '--bucket_reso_steps=64', '--xformers', '--bucket_no_upscale', '--noise_offset=0.0']' returned non-zero exit status 1.

An error occurs when training SDXL with AdamW8bit.

The main issue I see with this one is trying to train SDXL using train_network.py. SDXL and earlier Stable Diffusion models have different architectures (SDXL has two text encoders; earlier versions have one), which makes the scripts incompatible. And again, you need to install bitsandbytes to use 8-bit optimizers. I'm not going to troubleshoot much beyond that, but a quick way to sanity-check which family a checkpoint belongs to is sketched below.
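Since both failure modes come down to a model/script mismatch, a check before launching can save a run. Here is a minimal sketch (not part of sd-scripts; the path is a placeholder) that guesses the family of a single-file .safetensors checkpoint from its state-dict keys: SDXL carries a second text encoder under conditioner.embedders.1, which SD1.x/2.x checkpoints lack. For the AdamW8bit error specifically, installing bitsandbytes into the venv (pip install bitsandbytes) is the usual prerequisite.

```python
# Minimal sketch: guess whether a single-file LDM-format checkpoint is SDXL
# by inspecting its state-dict keys, without loading any tensors.
from safetensors import safe_open

def guess_model_family(path: str) -> str:
    with safe_open(path, framework="pt", device="cpu") as f:
        keys = f.keys()
    # SDXL ships a second text encoder under conditioner.embedders.1;
    # SD1.x/2.x checkpoints have only cond_stage_model.* text-encoder keys.
    if any(k.startswith("conditioner.embedders.1.") for k in keys):
        return "SDXL -> train with sdxl_train_network.py"
    return "SD1.x/2.x -> train with train_network.py"

# Hypothetical path, for illustration only:
print(guess_model_family("sd_xl_base_1.0.safetensors"))
```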

DKnight54 commented 10 months ago

Stacking multiple runs together makes it harder for me to figure out where each one starts and ends.

Most of the time, the only section needed to diagnose the problem is the part beginning with the line "Traceback":

Traceback (most recent call last):
File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 189, in
trainer.train(args)
File "E:\GitHub\kohya_ss\train_network.py", line 234, in train
model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 47, in load_target_model
) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype)
File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 34, in load_target_model
) = _load_target_model(
File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 103, in _load_target_model
if text_encoder2.dtype != torch.float32:
AttributeError: 'NoneType' object has no attribute 'dtype'
Traceback (most recent call last):
File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=1024,1024', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=32', '--training_comment=3 repeats. More info: https://civitai.com/articles/1771', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-05', '--unet_lr=3e-05', '--network_dim=32', '--output_name=Merial13', '--lr_scheduler_num_cycles=50', '--no_half_vae', '--learning_rate=3e-05', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=1667', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_train_epochs=50', '--max_data_loader_n_workers=0', '--caption_dropout_rate=0.05', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--gradient_checkpointing', '--full_fp16', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.

The issue with this one is, again, running the Stable Diffusion 1.5 model through the SDXL training script (a corrected launch sketch follows the log below).
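To see why it fails exactly at text_encoder2.dtype, here is a minimal repro sketch using diffusers directly. This is not kohya's loading code, just the underlying mismatch: an SD1.5 pipeline simply has no second text encoder, so the SDXL loader ends up holding None.

```python
from diffusers import StableDiffusionPipeline

# SD1.5 repo id taken from the failing command above.
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5")

# SDXL pipelines expose text_encoder_2; SD1.5 pipelines do not.
text_encoder2 = getattr(pipe, "text_encoder_2", None)
print(text_encoder2)  # None

# Accessing text_encoder2.dtype on None then raises:
# AttributeError: 'NoneType' object has no attribute 'dtype'
```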

05:21:20-452682 INFO accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py"
--enable_bucket --min_bucket_reso=256 --max_bucket_reso=2048
--pretrained_model_name_or_path="runwayml/stable-diffusion-v1-5"
--train_data_dir="E:\GitHub\training\Merial" --resolution="1024,1024"
--output_dir="C:\Users\josep\Desktop\Lora" --logging_dir="C:\Users\josep\Desktop\Lora"
--network_alpha="32" --training_comment="3 repeats. More info:
https://civitai.com/articles/1771" --save_model_as=safetensors
--network_module=networks.lora --text_encoder_lr=3e-05 --unet_lr=3e-05 --network_dim=32
--output_name="Merial13" --lr_scheduler_num_cycles="50" --no_half_vae
--learning_rate="3e-05" --lr_scheduler="constant" --train_batch_size="3"
--max_train_steps="1667" --save_every_n_epochs="1" --mixed_precision="fp16"
--save_precision="fp16" --caption_extension=".txt" --cache_latents
--cache_latents_to_disk --optimizer_type="AdamW" --max_grad_norm="1"
--max_train_epochs=50 --max_data_loader_n_workers="0" --caption_dropout_rate="0.05"
--bucket_reso_steps=64 --min_snr_gamma=5 --gradient_checkpointing --full_fp16 --xformers
--noise_offset=0.0
prepare tokenizers
Using DreamBooth method.
prepare images.
found directory E:\GitHub\training\Merial\10_Merial contains 10 image files
100 train images with repeating.
0 reg images.
no regularization images / 正則化画像が見つかりませんでした
[Dataset 0]
batch_size: 3
resolution: (1024, 1024)
enable_bucket: True
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: False

[Subset 0 of Dataset 0]
image_dir: "E:\GitHub\training\Merial\10_Merial"
image_count: 10
num_repeats: 10
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_dropout_rate: 0.05
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
is_reg: False
class_tokens: Merial
caption_extension: .txt

[Dataset 0]
loading image sizes.
100%|███████████████████████████████████████████████████████████████████████████| 10/10 [00:00<00:00, 145.96it/s]
make buckets
number of images (including repeats) / 各bucketの画像枚数(繰り返し回数を含む)
bucket 0: resolution (640, 1600), count: 20
bucket 1: resolution (704, 1408), count: 10
bucket 2: resolution (768, 1280), count: 20
bucket 3: resolution (832, 1216), count: 30
bucket 4: resolution (896, 1152), count: 10
bucket 5: resolution (1024, 1024), count: 10
mean ar error (without repeats): 0.02071550377769773
preparing accelerator
loading model for process 0/1
load Diffusers pretrained models: runwayml/stable-diffusion-v1-5, variant=fp16
Loading pipeline components...: 100%|██████████████████████████████████████████████| 5/5 [00:00<00:00, 7.13it/s]
Traceback (most recent call last):
File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 189, in
trainer.train(args)
File "E:\GitHub\kohya_ss\train_network.py", line 234, in train
model_version, text_encoder, vae, unet = self.load_target_model(args, weight_dtype, accelerator)
File "E:\GitHub\kohya_ss\sdxl_train_network.py", line 47, in load_target_model
) = sdxl_train_util.load_target_model(args, accelerator, sdxl_model_util.MODEL_VERSION_SDXL_BASE_V1_0, weight_dtype)
File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 34, in load_target_model
) = _load_target_model(
File "E:\GitHub\kohya_ss\library\sdxl_train_util.py", line 103, in _load_target_model
if text_encoder2.dtype != torch.float32:
AttributeError: 'NoneType' object has no attribute 'dtype'
Traceback (most recent call last):
File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 196, in _run_module_as_main
return _run_code(code, main_globals, None,
File "C:\Users\josep\AppData\Local\Programs\Python\Python310\lib\runpy.py", line 86, in run_code
exec(code, run_globals)
File "E:\GitHub\kohya_ss\venv\Scripts\accelerate.exe_main.py", line 7, in
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
args.func(args)
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 986, in launch_command
simple_launcher(args)
File "E:\GitHub\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 628, in simple_launcher
raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['E:\GitHub\kohya_ss\venv\Scripts\python.exe', './sdxl_train_network.py', '--enable_bucket', '--min_bucket_reso=256', '--max_bucket_reso=2048', '--pretrained_model_name_or_path=runwayml/stable-diffusion-v1-5', '--train_data_dir=E:\GitHub\training\Merial', '--resolution=1024,1024', '--output_dir=C:\Users\josep\Desktop\Lora', '--logging_dir=C:\Users\josep\Desktop\Lora', '--network_alpha=32', '--training_comment=3 repeats. More info: https://civitai.com/articles/1771', '--save_model_as=safetensors', '--network_module=networks.lora', '--text_encoder_lr=3e-05', '--unet_lr=3e-05', '--network_dim=32', '--output_name=Merial13', '--lr_scheduler_num_cycles=50', '--no_half_vae', '--learning_rate=3e-05', '--lr_scheduler=constant', '--train_batch_size=3', '--max_train_steps=1667', '--save_every_n_epochs=1', '--mixed_precision=fp16', '--save_precision=fp16', '--caption_extension=.txt', '--cache_latents', '--cache_latents_to_disk', '--optimizer_type=AdamW', '--max_grad_norm=1', '--max_train_epochs=50', '--max_data_loader_n_workers=0', '--caption_dropout_rate=0.05', '--bucket_reso_steps=64', '--min_snr_gamma=5', '--gradient_checkpointing', '--full_fp16', '--xformers', '--noise_offset=0.0']' returned non-zero exit status 1.
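
For reference, a sketch of a matched pairing for the run above, with only the model argument changed (stabilityai/stable-diffusion-xl-base-1.0 is the standard Hugging Face SDXL repo id, not taken from the poster's config; the remaining flags stay as they were):

accelerate launch --num_cpu_threads_per_process=2 "./sdxl_train_network.py" --pretrained_model_name_or_path="stabilityai/stable-diffusion-xl-base-1.0" --resolution="1024,1024" ...

Alternatively, keep runwayml/stable-diffusion-v1-5 and launch "./train_network.py" with a 512,512 resolution instead; either way, the model and the script have to belong to the same family.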