kabachuha / sd-webui-text2video

Auto1111 extension implementing text2video diffusion models (like ModelScope or VideoCrafter) using only Auto1111 webui dependencies

[Bug]: Video Generate Fails #139

Closed antman1p closed 1 year ago

antman1p commented 1 year ago

Is there an existing issue for this?

Are you using the latest version of the extension?

What happened?

When I click the "Generate" button, I receive an error in the cmd terminal.

 Exception occurred: [Errno 2] No such file or directory: 'C:\\Users\\antma\\Downloads\\SuperSD2\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json' 

Steps to reproduce the problem

  1. Go to the text2video tab
  2. Input a prompt and a negative prompt in the corresponding text boxes
  3. Leave settings at default
  4. Click the "Generate" Button

What should have happened?

It should have generated a video.

WebUI and Deforum extension Commit IDs

webui commit id - 22bcc7be
txt2vid commit id - 4fea1ada

What GPU were you using for launching?

3090TI

On which platform are you launching the webui backend with the extension?

Local PC setup (Windows)

Settings

Model Type: ModelScope txt2vid
Steps: 30
CFG Scale: 17
Width: 256
Height: 256
Seed: -1
ETA: 0
Frames: 24
Batch Count: 1

Console logs

venv "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\venv\Scripts\Python.exe"
Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing requirements for Web UI

If submitting an issue on github, please provide the full startup log for debugging purposes.

Initializing Dreambooth
Dreambooth revision: 926ae204ef5de17efca2059c334b6098492a0641
Successfully installed accelerate-0.18.0 fastapi-0.94.1 gitpython-3.1.31 google-auth-oauthlib-0.4.6 requests-2.29.0 tqdm-4.64.1 transformers-4.26.1

Does your project take forever to startup?
Repetitive dependency installation may be the reason.
Automatic1111's base project sets strict requirements on outdated dependencies.
If an extension is using a newer version, the dependency is uninstalled and reinstalled twice every startup.

[!] xformers NOT installed.
[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.
[+] accelerate version 0.18.0 installed.
[+] diffusers version 0.14.0 installed.
[+] transformers version 4.26.1 installed.
[+] bitsandbytes version 0.35.4 installed.

Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [b914725d74] from C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\models\Stable-diffusion\protogenX53Photorealism_10.ckpt
Creating model from config: C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 3.5s (load weights from disk: 0.5s, create model: 1.2s, apply weights to model: 0.2s, apply half(): 0.4s, move model to device: 0.4s, load textual inversion embeddings: 0.7s).
CUDA SETUP: Loading binary C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 13.3s (import torch: 5.2s, import gradio: 0.5s, import ldm: 0.6s, other imports: 1.6s, load scripts: 1.1s, load SD checkpoint: 3.6s, create ui: 0.6s).
100%|██████████████████████████████████████████████████████████████████████████████████| 20/20 [00:03<00:00,  6.34it/s]
Total progress: 100%|██████████████████████████████████████████████████████████████████| 20/20 [00:02<00:00,  9.81it/s]
text2video — The model selected is:  ModelScope
 text2video extension for auto1111 webui
Git commit: 4fea1ada (Sun Apr 23 10:39:51 2023)
Starting text2video
Pipeline setup
Traceback (most recent call last):
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\t2v_helpers\render.py", line 24, in run
    vids_pack = process_modelscope(args_dict)
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\modelscope\process_modelscope.py", line 55, in process_modelscope
    pipe = setup_pipeline()
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\modelscope\process_modelscope.py", line 26, in setup_pipeline
    return TextToVideoSynthesis(ph.models_path+'/ModelScope/t2v')
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\modelscope\t2v_pipeline.py", line 58, in __init__
    with open(model_dir+'/configuration.json', 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\antma\\Downloads\\SuperSD2\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'
Exception occurred: [Errno 2] No such file or directory: 'C:\\Users\\antma\\Downloads\\SuperSD2\\stable-diffusion-webui\\models/ModelScope/t2v/configuration.json'
Interrupted with signal 2 in <frame at 0x000001D50C710AC0, file 'C:\\Users\\antma\\Downloads\\SuperSD2\\stable-diffusion-webui\\webui.py', line 209, code wait_on_server>
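The traceback shows the pipeline opens `configuration.json` under `models/ModelScope/t2v` before anything else, so the failure means the ModelScope model files were never downloaded into that directory. As a rough pre-flight check (the path below is taken from the traceback; adjust it to your own webui install root), something like this confirms whether the model directory is populated:

```python
import os

def check_modelscope_model(models_path):
    """Return the path of the missing configuration file, or None if present.

    configuration.json is the first file the pipeline opens (per the
    traceback); the model weights must live alongside it in the same folder.
    """
    t2v_dir = os.path.join(models_path, "ModelScope", "t2v")
    config_file = os.path.join(t2v_dir, "configuration.json")
    if not os.path.isfile(config_file):
        return config_file
    return None

# Path from the traceback above; adjust to your own install.
missing = check_modelscope_model(
    r"C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\models")
if missing:
    print(f"Missing {missing} - download the ModelScope t2v model files "
          "into that directory per the extension's installation instructions.")
```

If the check reports a missing file, the fix is the one in the extension's README: download the ModelScope model files into `models/ModelScope/t2v` before generating.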

Additional information

I tried with "VideoCrafter (WIP)" and got a different error:

Python 3.10.7 (tags/v3.10.7:6cc6b13, Sep  5 2022, 14:08:36) [MSC v.1933 64 bit (AMD64)]
Commit hash: 22bcc7be428c94e9408f589966c2040187245d81
Installing requirements for Web UI

If submitting an issue on github, please provide the full startup log for debugging purposes.

Initializing Dreambooth
Dreambooth revision: 926ae204ef5de17efca2059c334b6098492a0641
Successfully installed accelerate-0.18.0 fastapi-0.94.1 gitpython-3.1.31 google-auth-oauthlib-0.4.6 requests-2.29.0 transformers-4.26.1

Does your project take forever to startup?
Repetitive dependency installation may be the reason.
Automatic1111's base project sets strict requirements on outdated dependencies.
If an extension is using a newer version, the dependency is uninstalled and reinstalled twice every startup.

[!] xformers NOT installed.
[+] torch version 1.13.1+cu117 installed.
[+] torchvision version 0.14.1+cu117 installed.
[+] accelerate version 0.18.0 installed.
[+] diffusers version 0.14.0 installed.
[+] transformers version 4.26.1 installed.
[+] bitsandbytes version 0.35.4 installed.

Launching Web UI with arguments:
No module 'xformers'. Proceeding without it.
Loading weights [b914725d74] from C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\models\Stable-diffusion\protogenX53Photorealism_10.ckpt
Creating model from config: C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\configs\v1-inference.yaml
LatentDiffusion: Running in eps-prediction mode
DiffusionWrapper has 859.52 M params.
Applying cross attention optimization (Doggettx).
Textual inversion embeddings loaded(0):
Model loaded in 3.7s (load weights from disk: 0.5s, create model: 1.4s, apply weights to model: 0.2s, apply half(): 0.4s, move model to device: 0.4s, load textual inversion embeddings: 0.7s).
CUDA SETUP: Loading binary C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\venv\lib\site-packages\bitsandbytes\libbitsandbytes_cudaall.dll...
Running on local URL:  http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.
Startup time: 9.7s (import torch: 2.7s, import gradio: 0.5s, import ldm: 0.4s, other imports: 0.6s, load scripts: 1.0s, load SD checkpoint: 3.8s, create ui: 0.6s).
text2video — The model selected is:  VideoCrafter (WIP)
 text2video extension for auto1111 webui
Git commit: 4fea1ada (Sun Apr 23 10:39:51 2023)
VideoCrafter config:
 {'model': {'target': 'lvdm.models.ddpm3d.LatentDiffusion', 'params': {'linear_start': 0.00085, 'linear_end': 0.012, 'num_timesteps_cond': 1, 'log_every_t': 200, 'timesteps': 1000, 'first_stage_key': 'video', 'cond_stage_key': 'caption', 'image_size': [32, 32], 'video_length': 16, 'channels': 4, 'cond_stage_trainable': False, 'conditioning_key': 'crossattn', 'scale_by_std': False, 'scale_factor': 0.18215, 'unet_config': {'target': 'lvdm.models.modules.openaimodel3d.UNetModel', 'params': {'image_size': 32, 'in_channels': 4, 'out_channels': 4, 'model_channels': 320, 'attention_resolutions': [4, 2, 1], 'num_res_blocks': 2, 'channel_mult': [1, 2, 4, 4], 'num_heads': 8, 'transformer_depth': 1, 'context_dim': 768, 'use_checkpoint': True, 'legacy': False, 'kernel_size_t': 1, 'padding_t': 0, 'temporal_length': 16, 'use_relative_position': True}}, 'first_stage_config': {'target': 'lvdm.models.autoencoder.AutoencoderKL', 'params': {'embed_dim': 4, 'monitor': 'val/rec_loss', 'ddconfig': {'double_z': True, 'z_channels': 4, 'resolution': 256, 'in_channels': 3, 'out_ch': 3, 'ch': 128, 'ch_mult': [1, 2, 4, 4], 'num_res_blocks': 2, 'attn_resolutions': [], 'dropout': 0.0}, 'lossconfig': {'target': 'torch.nn.Identity'}}}, 'cond_stage_config': {'target': 'lvdm.models.modules.condition_modules.FrozenCLIPEmbedder'}}}}
Loading model from C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\models/VideoCrafter/model.ckpt
Error verifying pickled file from C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\models/VideoCrafter/model.ckpt:
Traceback (most recent call last):
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\modules\safe.py", line 135, in load_with_extra
    check_pt(filename, extra_handler)
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\modules\safe.py", line 81, in check_pt
    with zipfile.ZipFile(filename) as z:
  File "C:\Users\antma\AppData\Local\Programs\Python\Python310\lib\zipfile.py", line 1249, in __init__
    self.fp = io.open(file, filemode)
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\Users\\antma\\Downloads\\SuperSD2\\stable-diffusion-webui\\models/VideoCrafter/model.ckpt'

The file may be malicious, so the program is not going to read it.
You can skip this check with --disable-safe-unpickle commandline argument.

LatentDiffusion: Running in eps-prediction mode
Successfully initialize the diffusion model !
DiffusionWrapper has 958.92 M params.
making attention of type 'vanilla' with 512 in_channels
Working with z of shape (1, 4, 32, 32) = 4096 dimensions.
making attention of type 'vanilla' with 512 in_channels
Traceback (most recent call last):
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\t2v_helpers\render.py", line 26, in run
    vids_pack = process_videocrafter(args_dict)
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\videocrafter\process_videocrafter.py", line 41, in process_videocrafter
    model, _, _ = load_model(config, ph.models_path+'/VideoCrafter/model.ckpt', #TODO: support safetensors and stuff
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui/extensions/sd-webui-text2video/scripts\videocrafter\sample_utils.py", line 28, in load_model
    model.load_state_dict(sd, strict=True)
  File "C:\Users\antma\Downloads\SuperSD2\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1624, in load_state_dict
    raise TypeError("Expected state_dict to be dict-like, got {}.".format(type(state_dict)))
TypeError: Expected state_dict to be dict-like, got <class 'NoneType'>.
Exception occurred: Expected state_dict to be dict-like, got <class 'NoneType'>.
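The VideoCrafter failure has the same root cause: `model.ckpt` is absent, so webui's safe loader returns `None`, which `load_model` then hands to `load_state_dict`, producing the confusing `TypeError`. A defensive loader (sketch only; the function name is hypothetical, not the extension's actual API) would fail fast with a clearer message:

```python
import os

def load_checkpoint_state_dict(ckpt_path):
    """Fail fast with a clear message when the checkpoint file is missing,
    instead of letting a None state_dict reach load_state_dict()."""
    if not os.path.isfile(ckpt_path):
        raise FileNotFoundError(
            f"VideoCrafter checkpoint not found at {ckpt_path}; "
            "download model.ckpt into models/VideoCrafter first.")
    import torch  # imported lazily so the path check needs no GPU stack
    sd = torch.load(ckpt_path, map_location="cpu")
    # Some checkpoints wrap the weights in a 'state_dict' key.
    return sd.get("state_dict", sd) if isinstance(sd, dict) else sd
```

With a check like this, the user would see the missing-file message directly instead of the `Expected state_dict to be dict-like, got <class 'NoneType'>` error several frames later.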
antman1p commented 1 year ago

I didn't see the additional installation instructions.