kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

[Bug]: Expected state_dict to be dict-like #222

Closed — derodz closed this issue 11 months ago

derodz commented 11 months ago

- [x] Is there an existing issue for this?
- [x] Are you using the latest version of the extension?

What happened?

No prompt will generate a video. With ModelScope I receive a TypeError, and when attempting VideoCrafter I receive a different error: "Torch not compiled with CUDA enabled".

Steps to reproduce the problem

Using a MacBook Pro with Apple Silicon

  1. Install latest version of stable diffusion web ui
  2. Follow the instructions in this repo to install text2video
  3. Run any prompt
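For context on the second error: macOS builds of torch ship without CUDA support, so any code path that assumes CUDA fails with "Torch not compiled with CUDA enabled" on Apple Silicon, where GPU work goes through the MPS (Metal) backend instead. A minimal sketch of a device check that does not assume CUDA (the helper name `preferred_device` is hypothetical, not part of the extension):

```python
def preferred_device() -> str:
    """Pick the best available torch device, falling back to CPU.

    Returns "cuda" on NVIDIA GPUs, "mps" on Apple Silicon
    (torch >= 1.12), and "cpu" otherwise or if torch is absent.
    """
    try:
        import torch
    except ImportError:
        return "cpu"  # torch not installed at all
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon GPUs are exposed via the Metal (MPS) backend, not CUDA
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    return "cpu"
```

An extension that calls `tensor.cuda()` or hard-codes `device="cuda"` would need a check like this to run on a Mac at all.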

What should have happened?

I expected a video to be generated.

WebUI and Deforum extension Commit IDs

webui commit id: 68f336b
txt2vid commit id: 8f0af8c

Torch version

2.0.1

What GPU were you using for launching?

Apple Silicon M1 Max

On which platform are you launching the webui backend with the extension?

Local PC setup (Mac)

Settings

(settings screenshot attached)

Console logs

./webui.sh 

################################################################
Install script for stable-diffusion + Web UI
Tested on Debian 11 (Bullseye)
################################################################

################################################################
Running on user
################################################################

################################################################
Repo already cloned, using it as install directory
################################################################

################################################################
Create and activate python venv
################################################################

################################################################
Launching launch.py...
################################################################
Python 3.10.12 (main, Jun 20 2023, 19:43:52) [Clang 14.0.3 (clang-1403.0.22.14.1)]
Version: v1.5.1
Commit hash: 68f336bd994bed5442ad95bad6b6ad5564a5409a
Installing requirements
Installing sd-webui-xl-demo requirements_webui.txt

Launching Web UI with arguments: --skip-torch-cuda-test --upcast-sampling --no-half-vae --use-cpu interrogate
no module 'xformers'. Processing without...
no module 'xformers'. Processing without...
No module 'xformers'. Proceeding without it.
Warning: caught exception 'Torch not compiled with CUDA enabled', memory monitor disabled
Using SDXL 0.9
Loading weights [6ce0161689] from /stable-diffusion-webui/models/Stable-diffusion/v1-5-pruned-emaonly.safetensors
Creating model from config: /stable-diffusion-webui/configs/v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 15.0s (launcher: 11.5s, import torch: 1.2s, import gradio: 0.4s, setup paths: 0.4s, other imports: 0.4s, load scripts: 0.6s, create ui: 0.3s).
DiffusionWrapper has 859.52 M params.
Applying attention optimization: InvokeAI... done.
Model loaded in 3.0s (load weights from disk: 0.3s, create model: 0.7s, apply weights to model: 0.4s, apply half(): 0.2s, move model to device: 0.4s, calculate empty prompt: 1.0s).
text2video — The model selected is: ModelScope (ModelScope-like)
 text2video extension for auto1111 webui
Git commit: 8f0af8c9
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
*** Error verifying pickled file from /stable-diffusion-webui/models/text2video/ModelScope/text2video_pytorch_model.pth
*** The file may be malicious, so the program is not going to read it.
*** You can skip this check with --disable-safe-unpickle commandline argument.
*** 
    Traceback (most recent call last):
      File "/stable-diffusion-webui/modules/safe.py", line 83, in check_pt
        with zipfile.ZipFile(filename) as z:
      File "/opt/homebrew/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/zipfile.py", line 1269, in __init__
        self._RealGetContents()
      File "/opt/homebrew/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/zipfile.py", line 1336, in _RealGetContents
        raise BadZipFile("File is not a zip file")
    zipfile.BadZipFile: File is not a zip file

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "/stable-diffusion-webui/modules/safe.py", line 137, in load_with_extra
        check_pt(filename, extra_handler)
      File "/stable-diffusion-webui/modules/safe.py", line 104, in check_pt
        unpickler.load()
      File "/opt/homebrew/Cellar/python@3.10/3.10.12_1/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pickle.py", line 1213, in load
        dispatch[key[0]](self)
    KeyError: 118
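The `BadZipFile` above is the real clue: modern `torch.save` checkpoints are zip archives, so a `.pth` that fails `zipfile` parsing is usually a truncated or corrupted download (for example, an HTML error page saved under the checkpoint's name). A quick stdlib-only sanity check, sketched here with a hypothetical helper name (note that legacy pre-1.6 pickle-format checkpoints are not zips, so this heuristic only applies to zip-serialized files like this one):

```python
import zipfile

def looks_like_torch_zip(path: str) -> bool:
    """Return True if the file parses as a zip archive.

    Zip-serialized torch checkpoints must pass this; a truncated
    or HTML-error-page download fails it immediately, producing
    the same BadZipFile seen in webui's safe-unpickle check.
    """
    return zipfile.is_zipfile(path)
```

Running this against `text2video_pytorch_model.pth` before launching would confirm whether re-downloading the checkpoint is needed.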

---
Traceback (most recent call last):
  File "/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/t2v_helpers/render.py", line 30, in run
    vids_pack = process_modelscope(args_dict, args)
  File "/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/process_modelscope.py", line 65, in process_modelscope
    pipe = setup_pipeline(args.model)
  File "/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/process_modelscope.py", line 31, in setup_pipeline
    return TextToVideoSynthesis(get_model_location(model_name))
  File "/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/t2v_pipeline.py", line 94, in __init__
    self.sd_model.load_state_dict(
  File "/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1994, in load_state_dict
    raise TypeError("Expected state_dict to be dict-like, got {}.".format(type(state_dict)))
TypeError: Expected state_dict to be dict-like, got <class 'NoneType'>.
Exception occurred: Expected state_dict to be dict-like, got <class 'NoneType'>.
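The final TypeError follows directly from the first failure: when webui's safe unpickler rejects a file, it prints the warning and returns `None` instead of raising, and the traceback shows `t2v_pipeline.py` passing that result straight into `load_state_dict`. A sketch of the defensive guard that would turn this into an actionable message (the names `load_checked_state_dict` and `load_fn` are hypothetical; `load_fn` stands in for webui's safe loader):

```python
def load_checked_state_dict(path, load_fn):
    """Load a checkpoint and fail loudly if the safe loader returned None.

    load_fn mimics webui's modules.safe loader, which returns None
    (rather than raising) when its pickle-safety check fails.
    """
    state_dict = load_fn(path)
    if state_dict is None:
        raise RuntimeError(
            f"Safe unpickling of {path!r} failed; the checkpoint is likely "
            "a corrupt or incomplete download. Re-download it, or launch "
            "with --disable-safe-unpickle to bypass the check."
        )
    return state_dict
```

With a guard like this the user would see the re-download hint instead of the opaque "Expected state_dict to be dict-like, got NoneType".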

Additional information

No response

github-actions[bot] commented 11 months ago

This issue has been closed due to incorrect formatting. Please address the following mistakes and reopen the issue:

muze75 commented 5 months ago

I am getting this same error upon hitting generate in txt2video.

I have PyTorch 2.2.0 because xformers isn't currently compatible with 2.2.1.