vladmandic / automatic

SD.Next: Advanced Implementation of Stable Diffusion and other Diffusion-based generative image models
https://github.com/vladmandic/automatic
GNU Affero General Public License v3.0

[Issue]: Diffusers backend unable to correctly detect diffusers pipeline for some checkpoints #3304

Closed Muxropendiy closed 4 months ago

Muxropendiy commented 4 months ago

Issue Description

When the Diffusers backend is selected and the Diffusers pipeline is set to Autodetect, SD.Next cannot load some checkpoints, for example Aardvark 2024 Photography and LEOSAM's HelloWorld XL. With the Original backend, or if the pipeline is manually set to "Stable Diffusion", everything works. Running the UI in safe mode doesn't change anything. This is also mentioned in another issue, AMD GPU (RX 7900 XT) on Ubuntu 22.04 not used #3286, but I think these are two different problems, because in my case ROCm works just fine.
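For reference, the same workaround can be expressed directly against the diffusers API: choose the pipeline class explicitly instead of letting it be autodetected. This is only a minimal sketch; the helper name and the mapping dict are mine, not SD.Next's internals.

```python
# Hypothetical helper mirroring the UI workaround: load a single-file
# checkpoint with an explicitly chosen pipeline class instead of autodetect.
PIPELINE_CLASSES = {
    "Stable Diffusion": "StableDiffusionPipeline",
    "Stable Diffusion XL": "StableDiffusionXLPipeline",
}

def load_manual(path: str, pipeline_name: str):
    # lazy import so the mapping above is inspectable without diffusers installed
    import importlib
    cls = getattr(importlib.import_module("diffusers"), PIPELINE_CLASSES[pipeline_name])
    # from_single_file is the diffusers entry point for .safetensors checkpoints
    return cls.from_single_file(path)
```

With an explicit class, diffusers skips the architecture guess that fails in the log below and maps the checkpoint's weights straight onto the chosen pipeline.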

Version Platform Description

Starting SD.Next
Logger: file="./SD.Next/sdnext.log" level=INFO size=14486 mode=append
Python version=3.11.9 platform=Linux bin="~/SD.Next/venv/bin/python3" venv="./SD.Next/venv"
Version: app=sd.next updated=2024-06-24 hash=94f6f0db branch=master url=https://github.com/vladmandic/automatic/tree/master ui=main
Platform: arch=x86_64 cpu= system=Linux release=6.9.6-200.fc40.x86_64 python=3.11.9
HF cache folder: ./.cache/huggingface/hub
Python version=3.11.9 platform=Linux bin="./SD.Next/venv/bin/python3" venv="./SD.Next/venv"
AMD ROCm toolkit detected
Verifying requirements
Verifying packages
Extensions: disabled=[]
Extensions: enabled=['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg'] extensions-builtin
Extensions: enabled=['sd-dynamic-prompts'] extensions
Startup: standard
Verifying submodules
Extensions enabled: ['Lora', 'sd-extension-chainner', 'sd-extension-system-info', 'sd-webui-agent-scheduler', 'sdnext-modernui', 'stable-diffusion-webui-rembg', 'sd-dynamic-prompts']

Relevant log output

16:57:22-543088 INFO     Select: model="SD15/__BASE__/Leosam/leosamsHelloworldXL_filmGrain20 [fe54b5d04d]"                                                                                        
16:57:22-544852 DEBUG    Load model: existing=False target=~/SD.Next/models/Stable-diffusion/SD15/__BASE__/Leosam/leosamsHelloworldXL_filmGrain20.safetensors info=None                 
16:57:22-546571 DEBUG    Diffusers loading: path="~/SD.Next/models/Stable-diffusion/SD15/__BASE__/Leosam/leosamsHelloworldXL_filmGrain20.safetensors"                                   
16:57:22-547759 INFO     Autodetect: model="Stable Diffusion XL" class=StableDiffusionXLPipeline                                                                                                  
                         file="~/SD.Next/models/Stable-diffusion/SD15/__BASE__/Leosam/leosamsHelloworldXL_filmGrain20.safetensors" size=4924MB                                          
Loading pipeline components...  29% ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╺━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2/7  [ 0:00:00 < -:--:-- , ? C/s ]
16:57:22-716222 ERROR    Diffusers failed loading: model=~/SD.Next/models/Stable-diffusion/SD15/__BASE__/Leosam/leosamsHelloworldXL_filmGrain20.safetensors pipeline=Autodetect/NoneType
                         config={'low_cpu_mem_usage': True, 'torch_dtype': torch.float16, 'load_connected_pipeline': True, 'safety_checker': None, 'requires_safety_checker': False,              
                         'local_files_only': False, 'extract_ema': False, 'config': 'configs/sdxl', 'use_safetensors': True, 'cache_dir': '~/.cache/huggingface/hub'} Cannot load       
                         because text_model.embeddings.position_embedding.weight expected shape tensor(..., device='meta', size=(77, 1280)), but got torch.Size([77, 768]). If you want to instead
                         overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also:      
                         https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.                                                                              
16:57:22-718950 ERROR    loading model=~/SD.Next/models/Stable-diffusion/SD15/__BASE__/Leosam/leosamsHelloworldXL_filmGrain20.safetensors pipeline=Autodetect/NoneType: ValueError      
╭────────────────────────────────────────────────────────────────────────────── Traceback (most recent call last) ───────────────────────────────────────────────────────────────────────────────╮
│ ~/SD.Next/modules/sd_models.py:1085 in load_diffuser                                                                                                                                 │
│                                                                                                                                                                                                │
│   1084 │   │   │   │   │   │   sd_hijack_accelerate.restore_accelerate()                                                                                                                       │
│ ❱ 1085 │   │   │   │   │   sd_model = pipeline.from_single_file(checkpoint_info.path, **diffusers_load_config)                                                                                 │
│   1086 │   │   │   │   │   # sd_model = patch_diffuser_config(sd_model, checkpoint_info.path)                                                                                                  │
│                                                                                                                                                                                                │
│ ~/SD.Next/venv/lib64/python3.11/site-packages/huggingface_hub/utils/_validators.py:114 in _inner_fn                                                                                  │
│                                                                                                                                                                                                │
│   113 │   │                                                                                                                                                                                    │
│ ❱ 114 │   │   return fn(*args, **kwargs)                                                                                                                                                       │
│   115                                                                                                                                                                                          │
│                                                                                                                                                                                                │
│ ~/SD.Next/venv/lib64/python3.11/site-packages/diffusers/loaders/single_file.py:503 in from_single_file                                                                               │
│                                                                                                                                                                                                │
│   502 │   │   │   │   try:                                                                                                                                                                     │
│ ❱ 503 │   │   │   │   │   loaded_sub_model = load_single_file_sub_model(                                                                                                                       │
│   504 │   │   │   │   │   │   library_name=library_name,                                                                                                                                       │
│                                                                                                                                                                                                │
│ ~/SD.Next/venv/lib64/python3.11/site-packages/diffusers/loaders/single_file.py:113 in load_single_file_sub_model                                                                     │
│                                                                                                                                                                                                │
│   112 │   elif is_transformers_model and is_clip_model_in_single_file(class_obj, checkpoint):                                                                                                  │
│ ❱ 113 │   │   loaded_sub_model = create_diffusers_clip_model_from_ldm(                                                                                                                         │
│   114 │   │   │   class_obj,                                                                                                                                                                   │
│                                                                                                                                                                                                │
│ ~/SD.Next/venv/lib64/python3.11/site-packages/diffusers/loaders/single_file_utils.py:1411 in create_diffusers_clip_model_from_ldm                                                    │
│                                                                                                                                                                                                │
│   1410 │   if is_accelerate_available():                                                                                                                                                       │
│ ❱ 1411 │   │   unexpected_keys = load_model_dict_into_meta(model, diffusers_format_checkpoint, dtype=torch_dtype)                                                                              │
│   1412 │   else:                                                                                                                                                                               │
│                                                                                                                                                                                                │
│ ~/SD.Next/venv/lib64/python3.11/site-packages/diffusers/models/model_loading_utils.py:154 in load_model_dict_into_meta                                                               │
│                                                                                                                                                                                                │
│   153 │   │   │   model_name_or_path_str = f"{model_name_or_path} " if model_name_or_path is not None else ""                                                                                  │
│ ❱ 154 │   │   │   raise ValueError(                                                                                                                                                            │
│   155 │   │   │   │   f"Cannot load {model_name_or_path_str}because {param_name} expected shape {empty_state_dict[param_name]}, but got {param.shape}. If you want to instead overwrite random │
╰────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
ValueError: Cannot load because text_model.embeddings.position_embedding.weight expected shape tensor(..., device='meta', size=(77, 1280)), but got torch.Size([77, 768]). If you want to instead overwrite randomly initialized weights, please make sure to pass both `low_cpu_mem_usage=False` and `ignore_mismatched_sizes=True`. For more information, see also: https://github.com/huggingface/diffusers/issues/1619#issuecomment-1345604389 as an example.
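The shape pair in the error (77, 1280 expected vs 77, 768 found) hints at what went wrong: SD 1.5 uses a single CLIP text encoder of width 768, while SDXL adds a second OpenCLIP encoder of width 1280, and autodetect keys off such signals. A simplified, illustrative sketch of that kind of check (this is not SD.Next's actual detection code, and a real detector inspects many more keys, e.g. UNet channel counts):

```python
# Toy shape-based pipeline detection. Key names follow the common LDM
# checkpoint layout; the decision logic here is illustrative only.
SD15_CLIP_KEY = "cond_stage_model.transformer.text_model.embeddings.position_embedding.weight"
SDXL_OPENCLIP_KEY = "conditioner.embedders.1.model.positional_embedding"

def guess_pipeline(shapes: dict) -> str:
    """Map {tensor name: shape tuple} from a checkpoint header to a pipeline name."""
    if SDXL_OPENCLIP_KEY in shapes and shapes[SDXL_OPENCLIP_KEY][-1] == 1280:
        return "Stable Diffusion XL"
    if SD15_CLIP_KEY in shapes and shapes[SD15_CLIP_KEY][-1] == 768:
        return "Stable Diffusion"
    return "unknown"

# A well-formed SD 1.5 checkpoint vs a well-formed SDXL checkpoint:
sd15 = {SD15_CLIP_KEY: (77, 768)}
sdxl = {SDXL_OPENCLIP_KEY: (77, 1280)}
```

A checkpoint whose key layout looks like one architecture but whose tensor widths belong to another (as in the traceback above) will pass this kind of guess and then fail later, when the weights are actually mapped onto the instantiated model.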

Backend

Diffusers

Branch

Master

Model

SD 1.5

Acknowledgements

vladmandic commented 4 months ago

in both cases, these are NOT standard or expected models, and autodetect is not expected to detect them. that is exactly why you have the option to set the pipeline type manually instead of autodetect - and then maybe it works, maybe it doesn't.

if the authors of either model actually provided information on the reasons for those weird sizes and what's inside the model, i may be able to handle it automatically, but i'm not about to try to reverse-engineer non-standard and undocumented models.

Muxropendiy commented 4 months ago

Thanks for the comment, and I apologize for the inconvenience. I will try to find out from the authors how they achieved such a result; maybe I can learn something useful. If not, these garbage-filled models will go to the trash.

vladmandic commented 4 months ago

there are so many good models out there that follow all the normal principles of finetuning and are well labelled.

i'm totally fine with new model architectures as research work (that is how progress is made), but then they should be marked as such, with clear notes on how the model was created.

for example, in the Aardvark case the author states he used the original sd15 checkpoint as a starting point, but then what happened to it? it didn't grow 3x automagically on its own during finetuning. and leosam is even worse - it is mislabeled, with no explanation of how it was created other than "sdxl with dpo" - but that alone does not result in a model which is 33% smaller (and still mislabeled)