huggingface / diffusers

🤗 Diffusers: State-of-the-art diffusion models for image and audio generation in PyTorch and FLAX.
https://huggingface.co/docs/diffusers
Apache License 2.0

Loading PT Files for VAE with the AutoEncoder is still broken!!! #9154

Open JemiloII opened 1 month ago

JemiloII commented 1 month ago

Describe the bug

Just doesn't load .pt files anymore. Really frustrating, as it's been broken for a long time now. I keep posting about it, so now I'll just open an issue/bug instead of messaging in update threads. The last working version is 0.27.2.

There is no working safetensors or diffusers version of the VAE I'm using, and I shouldn't need one. PT works just fine.

Reproduction

    from os import path

    import torch
    from diffusers import AutoencoderKL, StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_single_file(
        "./assets/models/AOM3B4_orangemixs.safetensors",
        safety_checker=None,
        requires_safety_checker=False,
        cache_dir=path.join("./assets/models"),
        local_files_only=True,
        torch_dtype=torch.bfloat16,
    )

    pipe = pipe.to(device, torch.bfloat16)  # device is set elsewhere in the app

    # Works on 0.27.2; fails on later releases when given a .pt file.
    pipe.vae = AutoencoderKL.from_single_file(
        path.join("./assets/vae/orangemix.vae.pt"),
        local_files_only=True,
        torch_dtype=torch.bfloat16
    )

Logs

Traceback (most recent call last):
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 108, in load_state_dict
    return torch.load(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\torch\serialization.py", line 1024, in load
    raise pickle.UnpicklingError(UNSAFE_MESSAGE + str(e)) from None
_pickle.UnpicklingError: Weights only load failed. Re-running `torch.load` with `weights_only` set to `False` will likely succeed, but it can result in arbitrary code execution. Do it only if you get the file from a trusted source. WeightsUnpickler error: Unsupported class pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 116, in load_state_dict
    if f.read().startswith("version"):
  File "C:\Program Files\Python310\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x81 in position 1681: character maps to <undefined>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-test\main.py", line 89, in <module>
    asyncio.run(main(args.device, args.port))
  File "C:\Program Files\Python310\lib\asyncio\runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "C:\Program Files\Python310\lib\asyncio\base_events.py", line 649, in run_until_complete
    return future.result()
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-test\main.py", line 79, in main
    pipe, clip_layers = shibiko_init(settings, device)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-test\src\generation.py", line 126, in shibiko_init
    pipe.vae = AutoencoderKL.from_single_file(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_model.py", line 209, in from_single_file
    checkpoint = load_single_file_checkpoint(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 346, in load_single_file_checkpoint
    checkpoint = load_state_dict(pretrained_model_link_or_path)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\models\model_loading_utils.py", line 128, in load_state_dict
    raise OSError(
OSError: Unable to load weights from checkpoint file for './assets/vae/orangemix.vae.pt' at './assets/vae/orangemix.vae.pt'.

System Info

Python 3.10.9

Machine 1: AMD 7950X3D | RTX 4090 x2 | 128GB DDR5 | Windows 10
Machine 2: AMD 5950X | RTX 4090 | 128GB DDR4 | Windows 10

Who can help?

@sayakpaul

sayakpaul commented 1 month ago

Where does "/assets/vae/orangemix.vae.pt" come from?

Cc: @DN6 for single file.

JemiloII commented 1 month ago

Any VAE that's saved in a .pt format.

Link To VAE I'm Using https://huggingface.co/WarriorMama777/OrangeMixs/blob/main/VAEs/orangemix.vae.pt

I know there is a diffusers-format version of this; however, that doesn't work. It was broken in v0.27.2, so I switched to safetensors. Plus, the diffusers version doesn't have a working VAE anyway.

DN6 commented 3 weeks ago

Hi @JemiloII, the issue here isn't the .pt format. Rather, it's that the checkpoint contains serialised objects that are not model weights. See the attached screenshot below.

[Screenshot 2024-08-19 at 11:09:18 AM]

We switched to disallowing arbitrary serialised objects from being loaded out of pickle files after 0.27.2, since this is a potential security risk: using torch.load with weights_only=False allows executing code on the user's machine. See the attached discussions:

https://github.com/pytorch/pytorch/issues/52181
https://github.com/pytorch/pytorch/issues/52596
https://github.com/voicepaw/so-vits-svc-fork/issues/193

You can load the VAE state dict with weights_only=False in the following way:

import torch
from huggingface_hub import hf_hub_download
from diffusers import AutoencoderKL

# Download the .pt file and unpickle it directly (trusted sources only),
# then hand the resulting state dict to from_single_file.
ckpt_path = hf_hub_download("WarriorMama777/OrangeMixs", filename="VAEs/orangemix.vae.pt")
state_dict = torch.load(ckpt_path, weights_only=False)
vae = AutoencoderKL.from_single_file(state_dict)
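
If the checkpoint is already on disk, the same approach should work with a plain local path in place of hf_hub_download (a sketch, assuming a diffusers release newer than 0.27.2, where from_single_file accepts an in-memory state dict):

import torch
from diffusers import AutoencoderKL

# Unpickle the local .pt checkpoint directly. weights_only=False runs
# arbitrary pickle code, so only use it on files you trust.
state_dict = torch.load("./assets/vae/orangemix.vae.pt", weights_only=False)
vae = AutoencoderKL.from_single_file(state_dict)
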
JemiloII commented 2 weeks ago

That doesn't work.

Traceback (most recent call last):
  File "C:\Users\Shibiko AI\AppData\Roaming\JetBrains\IntelliJIdea2024.2\plugins\python\helpers-pro\pydevd_asyncio\pydevd_nest_asyncio.py", line 138, in run
    return loop.run_until_complete(task)
  File "C:\Users\Shibiko AI\AppData\Roaming\JetBrains\IntelliJIdea2024.2\plugins\python\helpers-pro\pydevd_asyncio\pydevd_nest_asyncio.py", line 243, in run_until_complete
    return f.result()
  File "C:\Program Files\Python310\lib\asyncio\futures.py", line 201, in result
    raise self._exception.with_traceback(self._exception_tb)
  File "C:\Program Files\Python310\lib\asyncio\tasks.py", line 232, in __step
    result = coro.send(None)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\main.py", line 81, in main
    pipe, clip_layers = shibiko_init(settings, device)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\src\generation.py", line 144, in shibiko_init
    pipe.vae = AutoencoderKL.from_single_file(state_dict)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\autoencoder.py", line 119, in from_single_file
    original_config, checkpoint = fetch_ldm_config_and_checkpoint(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 314, in fetch_ldm_config_and_checkpoint
    checkpoint = load_single_file_model_checkpoint(
  File "C:\Users\Shibiko AI\Desktop\shibiko ai\diffusion-ai\.venv\lib\site-packages\diffusers\loaders\single_file_utils.py", line 339, in load_single_file_model_checkpoint
    if os.path.isfile(pretrained_model_link_or_path):
  File "C:\Program Files\Python310\lib\genericpath.py", line 30, in isfile
    st = os.stat(path)
TypeError: stat: path should be string, bytes, os.PathLike or integer, not dict
python-BaseException

Process finished with exit code 1

I even tried with pipe.vae.load_state_dict. No dice there. I even went back and tried that on 0.27.2...

JemiloII commented 2 weeks ago

Not sure if this is helpful, but the state_dict has these keys:

state_dict_keys.txt

encoder.conv_in.weight
encoder.conv_in.bias
encoder.down.0.block.0.norm1.weight
encoder.down.0.block.0.norm1.bias
encoder.down.0.block.0.conv1.weight
encoder.down.0.block.0.conv1.bias
encoder.down.0.block.0.norm2.weight
encoder.down.0.block.0.norm2.bias
encoder.down.0.block.0.conv2.weight
encoder.down.0.block.0.conv2.bias
encoder.down.0.block.1.norm1.weight
encoder.down.0.block.1.norm1.bias
encoder.down.0.block.1.conv1.weight
encoder.down.0.block.1.conv1.bias
encoder.down.0.block.1.norm2.weight
encoder.down.0.block.1.norm2.bias
encoder.down.0.block.1.conv2.weight
encoder.down.0.block.1.conv2.bias
encoder.down.0.downsample.conv.weight
encoder.down.0.downsample.conv.bias
encoder.down.1.block.0.norm1.weight
encoder.down.1.block.0.norm1.bias
encoder.down.1.block.0.conv1.weight
encoder.down.1.block.0.conv1.bias
encoder.down.1.block.0.norm2.weight
encoder.down.1.block.0.norm2.bias
encoder.down.1.block.0.conv2.weight
encoder.down.1.block.0.conv2.bias
encoder.down.1.block.0.nin_shortcut.weight
encoder.down.1.block.0.nin_shortcut.bias
encoder.down.1.block.1.norm1.weight
encoder.down.1.block.1.norm1.bias
encoder.down.1.block.1.conv1.weight
encoder.down.1.block.1.conv1.bias
encoder.down.1.block.1.norm2.weight
encoder.down.1.block.1.norm2.bias
encoder.down.1.block.1.conv2.weight
encoder.down.1.block.1.conv2.bias
encoder.down.1.downsample.conv.weight
encoder.down.1.downsample.conv.bias
encoder.down.2.block.0.norm1.weight
encoder.down.2.block.0.norm1.bias
encoder.down.2.block.0.conv1.weight
encoder.down.2.block.0.conv1.bias
encoder.down.2.block.0.norm2.weight
encoder.down.2.block.0.norm2.bias
encoder.down.2.block.0.conv2.weight
encoder.down.2.block.0.conv2.bias
encoder.down.2.block.0.nin_shortcut.weight
encoder.down.2.block.0.nin_shortcut.bias
encoder.down.2.block.1.norm1.weight
encoder.down.2.block.1.norm1.bias
encoder.down.2.block.1.conv1.weight
encoder.down.2.block.1.conv1.bias
encoder.down.2.block.1.norm2.weight
encoder.down.2.block.1.norm2.bias
encoder.down.2.block.1.conv2.weight
encoder.down.2.block.1.conv2.bias
encoder.down.2.downsample.conv.weight
encoder.down.2.downsample.conv.bias
encoder.down.3.block.0.norm1.weight
encoder.down.3.block.0.norm1.bias
encoder.down.3.block.0.conv1.weight
encoder.down.3.block.0.conv1.bias
encoder.down.3.block.0.norm2.weight
encoder.down.3.block.0.norm2.bias
encoder.down.3.block.0.conv2.weight
encoder.down.3.block.0.conv2.bias
encoder.down.3.block.1.norm1.weight
encoder.down.3.block.1.norm1.bias
encoder.down.3.block.1.conv1.weight
encoder.down.3.block.1.conv1.bias
encoder.down.3.block.1.norm2.weight
encoder.down.3.block.1.norm2.bias
encoder.down.3.block.1.conv2.weight
encoder.down.3.block.1.conv2.bias
encoder.mid.block_1.norm1.weight
encoder.mid.block_1.norm1.bias
encoder.mid.block_1.conv1.weight
encoder.mid.block_1.conv1.bias
encoder.mid.block_1.norm2.weight
encoder.mid.block_1.norm2.bias
encoder.mid.block_1.conv2.weight
encoder.mid.block_1.conv2.bias
encoder.mid.attn_1.norm.weight
encoder.mid.attn_1.norm.bias
encoder.mid.attn_1.q.weight
encoder.mid.attn_1.q.bias
encoder.mid.attn_1.k.weight
encoder.mid.attn_1.k.bias
encoder.mid.attn_1.v.weight
encoder.mid.attn_1.v.bias
encoder.mid.attn_1.proj_out.weight
encoder.mid.attn_1.proj_out.bias
encoder.mid.block_2.norm1.weight
encoder.mid.block_2.norm1.bias
encoder.mid.block_2.conv1.weight
encoder.mid.block_2.conv1.bias
encoder.mid.block_2.norm2.weight
encoder.mid.block_2.norm2.bias
encoder.mid.block_2.conv2.weight
encoder.mid.block_2.conv2.bias
encoder.norm_out.weight
encoder.norm_out.bias
encoder.conv_out.weight
encoder.conv_out.bias
decoder.conv_in.weight
decoder.conv_in.bias
decoder.mid.block_1.norm1.weight
decoder.mid.block_1.norm1.bias
decoder.mid.block_1.conv1.weight
decoder.mid.block_1.conv1.bias
decoder.mid.block_1.norm2.weight
decoder.mid.block_1.norm2.bias
decoder.mid.block_1.conv2.weight
decoder.mid.block_1.conv2.bias
decoder.mid.attn_1.norm.weight
decoder.mid.attn_1.norm.bias
decoder.mid.attn_1.q.weight
decoder.mid.attn_1.q.bias
decoder.mid.attn_1.k.weight
decoder.mid.attn_1.k.bias
decoder.mid.attn_1.v.weight
decoder.mid.attn_1.v.bias
decoder.mid.attn_1.proj_out.weight
decoder.mid.attn_1.proj_out.bias
decoder.mid.block_2.norm1.weight
decoder.mid.block_2.norm1.bias
decoder.mid.block_2.conv1.weight
decoder.mid.block_2.conv1.bias
decoder.mid.block_2.norm2.weight
decoder.mid.block_2.norm2.bias
decoder.mid.block_2.conv2.weight
decoder.mid.block_2.conv2.bias
decoder.up.0.block.0.norm1.weight
decoder.up.0.block.0.norm1.bias
decoder.up.0.block.0.conv1.weight
decoder.up.0.block.0.conv1.bias
decoder.up.0.block.0.norm2.weight
decoder.up.0.block.0.norm2.bias
decoder.up.0.block.0.conv2.weight
decoder.up.0.block.0.conv2.bias
decoder.up.0.block.0.nin_shortcut.weight
decoder.up.0.block.0.nin_shortcut.bias
decoder.up.0.block.1.norm1.weight
decoder.up.0.block.1.norm1.bias
decoder.up.0.block.1.conv1.weight
decoder.up.0.block.1.conv1.bias
decoder.up.0.block.1.norm2.weight
decoder.up.0.block.1.norm2.bias
decoder.up.0.block.1.conv2.weight
decoder.up.0.block.1.conv2.bias
decoder.up.0.block.2.norm1.weight
decoder.up.0.block.2.norm1.bias
decoder.up.0.block.2.conv1.weight
decoder.up.0.block.2.conv1.bias
decoder.up.0.block.2.norm2.weight
decoder.up.0.block.2.norm2.bias
decoder.up.0.block.2.conv2.weight
decoder.up.0.block.2.conv2.bias
decoder.up.1.block.0.norm1.weight
decoder.up.1.block.0.norm1.bias
decoder.up.1.block.0.conv1.weight
decoder.up.1.block.0.conv1.bias
decoder.up.1.block.0.norm2.weight
decoder.up.1.block.0.norm2.bias
decoder.up.1.block.0.conv2.weight
decoder.up.1.block.0.conv2.bias
decoder.up.1.block.0.nin_shortcut.weight
decoder.up.1.block.0.nin_shortcut.bias
decoder.up.1.block.1.norm1.weight
decoder.up.1.block.1.norm1.bias
decoder.up.1.block.1.conv1.weight
decoder.up.1.block.1.conv1.bias
decoder.up.1.block.1.norm2.weight
decoder.up.1.block.1.norm2.bias
decoder.up.1.block.1.conv2.weight
decoder.up.1.block.1.conv2.bias
decoder.up.1.block.2.norm1.weight
decoder.up.1.block.2.norm1.bias
decoder.up.1.block.2.conv1.weight
decoder.up.1.block.2.conv1.bias
decoder.up.1.block.2.norm2.weight
decoder.up.1.block.2.norm2.bias
decoder.up.1.block.2.conv2.weight
decoder.up.1.block.2.conv2.bias
decoder.up.1.upsample.conv.weight
decoder.up.1.upsample.conv.bias
decoder.up.2.block.0.norm1.weight
decoder.up.2.block.0.norm1.bias
decoder.up.2.block.0.conv1.weight
decoder.up.2.block.0.conv1.bias
decoder.up.2.block.0.norm2.weight
decoder.up.2.block.0.norm2.bias
decoder.up.2.block.0.conv2.weight
decoder.up.2.block.0.conv2.bias
decoder.up.2.block.1.norm1.weight
decoder.up.2.block.1.norm1.bias
decoder.up.2.block.1.conv1.weight
decoder.up.2.block.1.conv1.bias
decoder.up.2.block.1.norm2.weight
decoder.up.2.block.1.norm2.bias
decoder.up.2.block.1.conv2.weight
decoder.up.2.block.1.conv2.bias
decoder.up.2.block.2.norm1.weight
decoder.up.2.block.2.norm1.bias
decoder.up.2.block.2.conv1.weight
decoder.up.2.block.2.conv1.bias
decoder.up.2.block.2.norm2.weight
decoder.up.2.block.2.norm2.bias
decoder.up.2.block.2.conv2.weight
decoder.up.2.block.2.conv2.bias
decoder.up.2.upsample.conv.weight
decoder.up.2.upsample.conv.bias
decoder.up.3.block.0.norm1.weight
decoder.up.3.block.0.norm1.bias
decoder.up.3.block.0.conv1.weight
decoder.up.3.block.0.conv1.bias
decoder.up.3.block.0.norm2.weight
decoder.up.3.block.0.norm2.bias
decoder.up.3.block.0.conv2.weight
decoder.up.3.block.0.conv2.bias
decoder.up.3.block.1.norm1.weight
decoder.up.3.block.1.norm1.bias
decoder.up.3.block.1.conv1.weight
decoder.up.3.block.1.conv1.bias
decoder.up.3.block.1.norm2.weight
decoder.up.3.block.1.norm2.bias
decoder.up.3.block.1.conv2.weight
decoder.up.3.block.1.conv2.bias
decoder.up.3.block.2.norm1.weight
decoder.up.3.block.2.norm1.bias
decoder.up.3.block.2.conv1.weight
decoder.up.3.block.2.conv1.bias
decoder.up.3.block.2.norm2.weight
decoder.up.3.block.2.norm2.bias
decoder.up.3.block.2.conv2.weight
decoder.up.3.block.2.conv2.bias
decoder.up.3.upsample.conv.weight
decoder.up.3.upsample.conv.bias
decoder.norm_out.weight
decoder.norm_out.bias
decoder.conv_out.weight
decoder.conv_out.bias
loss.logvar
loss.perceptual_loss.scaling_layer.shift
loss.perceptual_loss.scaling_layer.scale
loss.perceptual_loss.net.slice1.0.weight
loss.perceptual_loss.net.slice1.0.bias
loss.perceptual_loss.net.slice1.2.weight
loss.perceptual_loss.net.slice1.2.bias
loss.perceptual_loss.net.slice2.5.weight
loss.perceptual_loss.net.slice2.5.bias
loss.perceptual_loss.net.slice2.7.weight
loss.perceptual_loss.net.slice2.7.bias
loss.perceptual_loss.net.slice3.10.weight
loss.perceptual_loss.net.slice3.10.bias
loss.perceptual_loss.net.slice3.12.weight
loss.perceptual_loss.net.slice3.12.bias
loss.perceptual_loss.net.slice3.14.weight
loss.perceptual_loss.net.slice3.14.bias
loss.perceptual_loss.net.slice4.17.weight
loss.perceptual_loss.net.slice4.17.bias
loss.perceptual_loss.net.slice4.19.weight
loss.perceptual_loss.net.slice4.19.bias
loss.perceptual_loss.net.slice4.21.weight
loss.perceptual_loss.net.slice4.21.bias
loss.perceptual_loss.net.slice5.24.weight
loss.perceptual_loss.net.slice5.24.bias
loss.perceptual_loss.net.slice5.26.weight
loss.perceptual_loss.net.slice5.26.bias
loss.perceptual_loss.net.slice5.28.weight
loss.perceptual_loss.net.slice5.28.bias
loss.perceptual_loss.lin0.model.1.weight
loss.perceptual_loss.lin1.model.1.weight
loss.perceptual_loss.lin2.model.1.weight
loss.perceptual_loss.lin3.model.1.weight
loss.perceptual_loss.lin4.model.1.weight
loss.discriminator.main.0.weight
loss.discriminator.main.0.bias
loss.discriminator.main.2.weight
loss.discriminator.main.3.weight
loss.discriminator.main.3.bias
loss.discriminator.main.3.running_mean
loss.discriminator.main.3.running_var
loss.discriminator.main.3.num_batches_tracked
loss.discriminator.main.5.weight
loss.discriminator.main.6.weight
loss.discriminator.main.6.bias
loss.discriminator.main.6.running_mean
loss.discriminator.main.6.running_var
loss.discriminator.main.6.num_batches_tracked
loss.discriminator.main.8.weight
loss.discriminator.main.9.weight
loss.discriminator.main.9.bias
loss.discriminator.main.9.running_mean
loss.discriminator.main.9.running_var
loss.discriminator.main.9.num_batches_tracked
loss.discriminator.main.11.weight
loss.discriminator.main.11.bias
quant_conv.weight
quant_conv.bias
post_quant_conv.weight
post_quant_conv.bias
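
These are LDM-format key names (encoder.down.0..., quant_conv, post_quant_conv), not the diffusers AutoencoderKL names (encoder.down_blocks...), which is why a direct pipe.vae.load_state_dict call fails. Below is a hedged sketch of remapping the keys manually; it assumes a diffusers release where the internal helper convert_ldm_vae_checkpoint is importable from single_file_utils (it may move between versions), and the SD 1.5 repo id is just a placeholder for any VAE with a matching config:

import torch
from diffusers import AutoencoderKL
from diffusers.loaders.single_file_utils import convert_ldm_vae_checkpoint

state_dict = torch.load("./assets/vae/orangemix.vae.pt", weights_only=False)
# Lightning-style checkpoints often nest the weights under "state_dict".
state_dict = state_dict.get("state_dict", state_dict)

# Build a VAE with the matching architecture, then remap the LDM key names
# onto it; unmapped training keys (loss.*) are dropped by the converter.
vae = AutoencoderKL.from_pretrained("runwayml/stable-diffusion-v1-5", subfolder="vae")
converted = convert_ldm_vae_checkpoint(state_dict, vae.config)
vae.load_state_dict(converted)
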
DN6 commented 2 weeks ago

Which version of diffusers are you using? The snippet I shared is meant to be run with a version newer than 0.27.2. Based on the traceback, it seems like you're using version 0.27.2 to try to load the state dict?

JemiloII commented 1 week ago

I tried with 0.27.2 and the latest release. The goal is to update to the latest, but diffusers keeps making breaking changes. This PT one is crazy, as many VAEs for SD 1.5 are in .pt format, and many I've found for SDXL are in that format as well.

When an update happens, I don't expect my production app to just break when I've made zero code changes. I don't use the Hub, but I even tried your Hub example. I don't like using the Hub, just local files, so I know nothing ever changes. The Hub will get updates, and it saves things in mysterious locations where you don't want model files anyway.
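
For a strictly local workflow, one option (a sketch based on the key list above, not something the maintainers confirmed) is a one-time conversion of the .pt checkpoint to safetensors, so every later load takes the safe code path:

import torch
from safetensors.torch import save_file

# One-time, offline conversion: fully unpickle the trusted .pt file, drop
# the training-only loss.* entries, and write the remaining tensors to a
# safetensors file so future loads never need weights_only=False.
state_dict = torch.load("./assets/vae/orangemix.vae.pt", weights_only=False)
state_dict = state_dict.get("state_dict", state_dict)
weights = {
    k: v.contiguous() for k, v in state_dict.items()
    if isinstance(v, torch.Tensor) and not k.startswith("loss.")
}
save_file(weights, "./assets/vae/orangemix.vae.safetensors")

After that, AutoencoderKL.from_single_file("./assets/vae/orangemix.vae.safetensors") should load it like any other single-file VAE checkpoint.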