lllyasviel / stable-diffusion-webui-forge


"'NoneType' object has no attribute 'sd_checkpoint_info'", if prior was "You do not have CLIP state dict!" #2166

Open dermesut opened 1 month ago

dermesut commented 1 month ago

Once I get "You do not have CLIP state dict!", I can't generate an image with the SD model that is set, even after I add the VAE/encoders back, due to "'NoneType' object has no attribute 'sd_checkpoint_info'". Only after I change the SD model to something else, generate an image with that one, and then change back to my original SD model can I generate an image again.

This is the process that lets me reproduce the issue:

1) start the server:

Python 3.10.6 (tags/v3.10.6:9c7b4bd, Aug  1 2022, 21:53:49) [MSC v.1932 64 bit (AMD64)]
Version: f2.0.1v1.10.1-previous-584-g9a698e26
Commit hash: 9a698e26d6744de24d05568c9938a52694dbb3f0
Faceswaplab : Use GPU requirements
Checking faceswaplab requirements
Install protobuf>=3.20.2
Installing sd-webui-faceswaplab requirement: protobuf>=3.20.2
1.1965869999985443
CUDA 12.1
Launching Web UI with arguments: --api --port 7861 --gpu-device-id 1 --wildcards-dir 'E:\ai\_gh_repos\sd.webui\webui\extensions\stable-diffusion-webui-wildcards\wildcards' --forge-ref-a1111-home 'E:\ai\_gh_repos\sd.webui\webui' --text-encoder-dir 'E:\ai\_gh_repos\sd.webui\webui\models\text_encoder' --xformers --cuda-malloc --cuda-stream --pin-shared-memory --ckpt-dir 'E:\ai\_gh_repos\sd.webui\webui\models\Stable-diffusion' --vae-dir 'E:\ai\_gh_repos\sd.webui\webui\models\VAE' --hypernetwork-dir 'E:\ai\_gh_repos\sd.webui\webui\models\hypernetworks' --embeddings-dir 'E:\ai\_gh_repos\sd.webui\webui\embeddings' --lora-dir 'E:\ai\_gh_repos\sd.webui\webui\models\lora' --controlnet-dir 'E:\ai\_gh_repos\sd.webui\webui\models\ControlNet' --controlnet-preprocessor-models-dir 'E:\ai\_gh_repos\sd.webui\webui\extensions\sd-webui-controlnet\annotator\downloads'
Set device to: 1
Using cudaMallocAsync backend.
Total VRAM 11264 MB, total RAM 130956 MB
pytorch version: 2.3.1+cu121
xformers version: 0.0.27
Set vram state to: NORMAL_VRAM
Always pin shared GPU memory
Device: cuda:0 NVIDIA GeForce RTX 2080 Ti : cudaMallocAsync
VAE dtype preferences: [torch.float32] -> torch.float32
CUDA Using Stream: True
E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\system\python\lib\site-packages\transformers\utils\hub.py:127: FutureWarning: Using `TRANSFORMERS_CACHE` is deprecated and will be removed in v5 of Transformers. Use `HF_HOME` instead.
  warnings.warn(
Using xformers cross attention
Using xformers attention for VAE
ControlNet preprocessor location: E:\ai\_gh_repos\sd.webui\webui\extensions\sd-webui-controlnet\annotator\downloads
14:29:37 - ReActor - STATUS - Running v0.7.1-b2 on Device: CUDA
Loading additional modules ... done.
2024-10-24 14:29:44,212 - ControlNet - INFO - ControlNet UI callback registered.
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Running on local URL:  http://127.0.0.1:7861

To create a public link, set `share=True` in `launch()`.
Startup time: 31.0s (prepare environment: 6.4s, import torch: 7.4s, initialize shared: 0.1s, other imports: 0.4s, load scripts: 2.9s, initialize extra networks: 0.2s, initialize google blockly: 4.4s, create ui: 3.7s, gradio launch: 2.3s, add APIs: 3.1s).
Environment vars changed: {'stream': False, 'inference_memory': 1024.0, 'pin_shared_memory': False}
[GPU Setting] You will use 90.91% GPU memory (10239.00 MB) to load weights, and use 9.09% GPU memory (1024.00 MB) to do matrix computation.

2) add the VAE/encoders one by one:

Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\flux1-dev-bnb-nf4-v2.safetensors', 'hash': 'f0770152'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False

3) change the Flux model:

Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False

4) click "generate":

Loading Model: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
StateDict Keys: {'transformer': 776, 'vae': 244, 'text_encoder': 196, 'text_encoder_2': 220, 'ignore': 0}
Using Detected T5 Data Type: torch.float8_e4m3fn
Using Detected UNet Type: torch.float8_e4m3fn
Working with z of shape (1, 16, 32, 32) = 16384 dimensions.
K-Model Created: {'storage_dtype': torch.float8_e4m3fn, 'computation_dtype': torch.float16}
Model loaded in 1.2s (unload existing model: 0.4s, forge model load: 0.7s).
Warning: field infotext in API payload not found in <modules.processing.StableDiffusionProcessingTxt2Img object at 0x0000028E263FEDD0>.
[Unload] Trying to free 7723.54 MB for cuda:0 with 0 models keep loaded ... Done.
[Memory Management] Target: JointTextEncoder, Free GPU: 10085.31 MB, Model Require: 5153.49 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 3907.81 MB, All loaded to GPU.
Moving model(s) has taken 2.43 seconds
Distilled CFG Scale will be ignored for Schnell
[Unload] Trying to free 1024.00 MB for cuda:0 with 1 models keep loaded ... Current free memory is 4808.97 MB ... Done.
Distilled CFG Scale will be ignored for Schnell
[Unload] Trying to free 16032.65 MB for cuda:0 with 0 models keep loaded ... Current free memory is 4803.98 MB ... Unload model JointTextEncoder Done.
[Memory Management] Target: KModel, Free GPU: 10029.96 MB, Model Require: 11340.31 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: -2334.35 MB, CPU Swap Loaded (blocked method): 3636.00 MB, GPU Loaded: 7704.31 MB
Moving model(s) has taken 5.14 seconds
100%|####################| 20/20 [01:28<00:00,  4.41s/it]
[Unload] Trying to free 8991.55 MB for cuda:0 with 0 models keep loaded ... Current free memory is 2305.54 MB ... Unload model KModel Done.
[Memory Management] Target: IntegratedAutoencoderKL, Free GPU: 10009.87 MB, Model Require: 319.75 MB, Previously Loaded: 0.00 MB, Inference Require: 1024.00 MB, Remaining: 8666.12 MB, All loaded to GPU.
Moving model(s) has taken 2.79 seconds
Total progress: 100%|████████████████████| 20/20 [01:27<00:00,  4.36s/it]

5) remove the VAE/encoders one by one to force "You do not have CLIP state dict!":

Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': [], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False

6) click "generate":

Loading Model: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': [], 'unet_storage_dtype': None}
[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Current free memory is 9681.11 MB ... Unload model IntegratedAutoencoderKL Done.
StateDict Keys: {'transformer': 776, 'vae': 0, 'ignore': 0}
Traceback (most recent call last):
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\txt2img.py", line 125, in txt2img_function
    processed = processing.process_images(p)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\processing.py", line 834, in process_images
    manage_model_and_prompt_cache(p)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\processing.py", line 802, in manage_model_and_prompt_cache
    p.sd_model, just_reloaded = forge_model_reload()
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\sd_models.py", line 504, in forge_model_reload
    sd_model = forge_loader(state_dict, additional_state_dicts=additional_state_dicts)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\system\python\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\backend\loader.py", line 285, in forge_loader
    component = load_huggingface_component(estimated_config, component_name, lib_name, cls_name, local_path, component_sd)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\backend\loader.py", line 59, in load_huggingface_component
    assert isinstance(state_dict, dict) and len(state_dict) > 16, 'You do not have CLIP state dict!'
AssertionError: You do not have CLIP state dict!
You do not have CLIP state dict!

7) add the VAE/encoders back one by one:

Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False
Model selected: {'checkpoint_info': {'filename': 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\Stable-diffusion\\FLUX\\VerusVision_1.0b_Transformer.safetensors', 'hash': '8cb933ca'}, 'additional_modules': ['E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\VAE\\ae.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\clip_l.safetensors', 'E:\\ai\\_gh_repos\\sd.webui\\webui\\models\\text_encoder\\t5xxl_fp8_e4m3fn.safetensors'], 'unet_storage_dtype': None}
Using online LoRAs in FP16: False

8) click "generate":

[Unload] Trying to free all memory for cuda:0 with 0 models keep loaded ... Done.
Traceback (most recent call last):
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules_forge\main_thread.py", line 30, in work
    self.result = self.func(*self.args, **self.kwargs)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\txt2img.py", line 125, in txt2img_function
    processed = processing.process_images(p)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\processing.py", line 840, in process_images
    res = process_images_inner(p)
  File "E:\ai\_gh_repos\webui_forge_cu121_torch231_EXP\webui\modules\processing.py", line 877, in process_images_inner
    p.sd_model_name = shared.sd_model.sd_checkpoint_info.name_for_extra
AttributeError: 'NoneType' object has no attribute 'sd_checkpoint_info'
'NoneType' object has no attribute 'sd_checkpoint_info'

altoiddealer commented 1 month ago

Looking into this, it seems like:

altoiddealer commented 1 month ago

I expected that wrapping forge_model_reload() (where this error occurs) in a try/except block like this would resolve the issue.

However, the issue persists despite unload_all_models() and clear_prompt_cache() being called.

def manage_model_and_prompt_cache(p: StableDiffusionProcessing):
    global need_global_unload

    try:
        p.sd_model, just_reloaded = forge_model_reload()
    except Exception as e:
        need_global_unload = True
        memory_management.unload_all_models()
        p.clear_prompt_cache()
        need_global_unload = False
        raise e

    if need_global_unload and not just_reloaded:
        memory_management.unload_all_models()

    if need_global_unload:
        p.clear_prompt_cache()

    need_global_unload = False

I think this would be the correct place to resolve the issue; I'm just struggling to figure out what exactly needs to be fixed when this error occurs...

altoiddealer commented 1 month ago

If I simply put these print statements...

def manage_model_and_prompt_cache(p: StableDiffusionProcessing):
    global need_global_unload

    print("shared SD Model Pre Reload:", shared.sd_model)

    try:
        p.sd_model, just_reloaded = forge_model_reload()
    except Exception as e:
        print("shared SD Model After Error:", shared.sd_model)
        raise
    print("shared SD Model After Reload:", shared.sd_model)

Printed on successful generation:

shared SD Model Pre Reload: <modules.sd_models.FakeInitialModel object at 0x000002CD83F4A200>
...
shared SD Model After Reload: <backend.diffusion_engine.flux.Flux object at 0x000002CD5BDBFD30>

Printed on error:

shared SD Model Pre Reload: <modules.sd_models.FakeInitialModel object at 0x000002CD83F4A200>
...
shared SD Model After Error: None
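
So the reload clears shared.sd_model up front and never puts anything back when the load fails. A minimal sketch of that failure mode (stand-in names only, not Forge's actual code):

class ModelData:
    def __init__(self):
        self.sd_model = None


model_data = ModelData()
model_data.sd_model = "old model"  # stands in for the previously loaded engine

def reload(load_ok: bool):
    model_data.sd_model = None  # existing state is flushed before the new load
    if not load_ok:
        raise AssertionError("You do not have CLIP state dict!")
    model_data.sd_model = "new model"  # only reached on success

try:
    reload(load_ok=False)
except AssertionError:
    pass

print(model_data.sd_model)  # None -> next generate hits .sd_checkpoint_info on None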

altoiddealer commented 1 month ago

Well, I found a solution, but I don't think it's the best one, because it seems like the idea is to flush as much information down the toilet as possible before loading models.

Using sd_model_backup = model_data.sd_model to back up the model, and setting it back if forge_loader() fails:

def forge_model_reload():
    current_hash = str(model_data.forge_loading_parameters)

    if model_data.forge_hash == current_hash:
        return model_data.sd_model, False

    print('Loading Model: ' + str(model_data.forge_loading_parameters))

    timer = Timer()

    sd_model_backup = None

    if model_data.sd_model:
        sd_model_backup = model_data.sd_model
        model_data.sd_model = None
        memory_management.unload_all_models()
        memory_management.soft_empty_cache()
        gc.collect()

...

    try:
        sd_model = forge_loader(state_dict, additional_state_dicts=additional_state_dicts)
    except Exception as e:
        if sd_model_backup:
            model_data.set_sd_model(sd_model_backup)
        raise e

DenOfEquity commented 1 month ago

In manage_model_and_prompt_cache, dumping/reinitialising the 'real' model seems to work:

    try:
        p.sd_model, just_reloaded = forge_model_reload()
    except Exception as e:
        # reincarnate the model
        del sd_models.model_data
        sd_models.model_data = sd_models.SdModelData()
        raise

This was tested with a Schnell GGUF that gives a different error due to missing modules (RuntimeError: Creating a Parameter from an instance of type ParameterGGUF ...), which then leads to the same 'NoneType' object has no attribute 'sd_checkpoint_info' error. Recreating SdModelData presumably restores the FakeInitialModel placeholder seen in the logs above, so the next generate triggers a clean full load instead of dereferencing None.

altoiddealer commented 1 month ago

@DenOfEquity Bravo! I tested this out, and this does resolve the issue.

You can omit the "as e" and the log will be the same.

DenOfEquity commented 1 month ago

In exactly one of my tests, the next model load was extremely slow with lots of disk activity. That's probably just because I'm using an old laptop (8 GB VRAM, 16 GB RAM) and was using other applications at the same time. Did you see anything similar? Otherwise, I think this is mergeable.

altoiddealer commented 1 month ago

For me, it behaves just like a typical fresh start with nothing cached, which is a lot better than being stuck in a broken state. Load time is not excessive.

However… maybe we could make a very quick function that just looks at the model parameters and checks that everything it needs is there? Right now it basically trashes the current models first, and only finds out what's missing while loading the new ones.

I might have a chance to toy around with this idea tomorrow, but I’ll be tied up all day today.

Another edit: I imagine it’s set up like this because we can’t tell whether the model has everything it needs baked in until it’s loaded? One idea to mitigate rebuilding from scratch would be to record in a list/dict what the sd_model_checkpoint includes after it is loaded, so that on a subsequent load we have prior knowledge of what must additionally be included.
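
Roughly like this, as a sketch (the names and the component grouping are hypothetical, not actual Forge code; the counts are the same ones Forge already logs as "StateDict Keys"):

known_components: dict[str, set[str]] = {}

def remember(checkpoint_hash: str, state_dict_keys: dict[str, int]) -> None:
    # record which component groups the checkpoint actually ships with,
    # e.g. {'transformer': 776, 'vae': 244, 'text_encoder': 196, ...}
    known_components[checkpoint_hash] = {
        name for name, count in state_dict_keys.items() if count > 0
    }

def missing_modules(checkpoint_hash: str, additional: set[str]) -> set[str]:
    # components the next load would still need from additional_modules
    required = {"transformer", "vae", "text_encoder"}
    baked_in = known_components.get(checkpoint_hash, set())
    return required - baked_in - additional

remember("8cb933ca", {"transformer": 776, "vae": 0, "ignore": 0})
print(missing_modules("8cb933ca", additional=set()))  # {'vae', 'text_encoder'}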

altoiddealer commented 1 month ago

@dermesut @DenOfEquity I pushed a PR with what I think is the correct way to handle this situation.

Before the current model data is trashed and the new data is loaded, it does the minimal steps needed to check the inbound model data and raise errors before dumping the current model data.
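
For illustration, such a pre-flight check might just peek at the checkpoint's key names before anything is unloaded; a sketch under that assumption (the key-name heuristic is made up, and the actual PR may do this differently):

from safetensors import safe_open

def preflight_check(checkpoint_path: str, additional_modules: list[str]) -> None:
    # safe_open only reads the safetensors header, so no tensor data is loaded
    with safe_open(checkpoint_path, framework="pt") as f:
        keys = list(f.keys())
    # illustrative heuristic: does the file carry a CLIP text encoder at all?
    has_clip = any("text_model." in k for k in keys)
    if not has_clip and not additional_modules:
        # raised while the old model is still intact, instead of after unloading
        raise AssertionError("You do not have CLIP state dict!")

Called at the top of forge_model_reload(), a check like this would leave the current model untouched when the inbound data is incomplete.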