Steps: 30
Frames: 30
cfg_scale: 7
width/height: 256
seed: -1
eta: 0
denoising strength: 0.75
vid2vid start frame: tried both 1 and 200, but same result
batch count: 1
VAE Mode: tried both GPU (half precision) and GPU, but same result
Console logs
```
ModelScope text2video extension for auto1111 webui
Git commit: 066a9e13 (Sun Mar 26 15:10:21 2023)
Starting text2video
Pipeline setup
config namespace(framework='pytorch', task='text-to-video-synthesis', model={'type': 'latent-text-to-video-synthesis', 'model_args': {'ckpt_clip': 'open_clip_pytorch_model.bin', 'ckpt_unet': 'text2video_pytorch_model.pth', 'ckpt_autoencoder': 'VQGAN_autoencoder.pth', 'max_frames': 16, 'tiny_gpu': 1}, 'model_cfg': {'unet_in_dim': 4, 'unet_dim': 320, 'unet_y_dim': 768, 'unet_context_dim': 1024, 'unet_out_dim': 4, 'unet_dim_mult': [1, 2, 4, 4], 'unet_num_heads': 8, 'unet_head_dim': 64, 'unet_res_blocks': 2, 'unet_attn_scales': [1, 0.5, 0.25], 'unet_dropout': 0.1, 'temporal_attention': 'True', 'num_timesteps': 1000, 'mean_type': 'eps', 'var_type': 'fixed_small', 'loss_type': 'mse'}}, pipeline={'type': 'latent-text-to-video-synthesis'})
device cuda
got a request to *vid2vid* an existing video.
Trying to extract frames from video with input FPS of 23.976023976023978. Please wait patiently.
Successfully extracted 2244.0 frames from video.
Loading frames: 0it [00:00, ?it/s]
Traceback (most recent call last):
  File "C:\source\stable-diffusion-webui_clean\extensions\sd-webui-modelscope-text2video\scripts\modelscope-text2vid.py", line 125, in process
    images=np.stack(images)# f h w c
  File "<__array_function__ internals>", line 180, in stack
  File "C:\source\stable-diffusion-webui_clean\venv\lib\site-packages\numpy\core\shape_base.py", line 422, in stack
    raise ValueError('need at least one array to stack')
ValueError: need at least one array to stack
Exception occurred: need at least one array to stack
```
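For what it's worth, the final error reproduces trivially once the frame list ends up empty, which the `Loading frames: 0it` line suggests is what happened here (frames were extracted to disk, but none were loaded back):

```python
import numpy as np

# np.stack requires at least one array; an empty frame list produces
# exactly the ValueError from the traceback above.
frames = []  # simulates "Loading frames: 0it"
try:
    np.stack(frames)  # f h w c
except ValueError as e:
    print(e)  # need at least one array to stack
```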
Is there an existing issue for this?
Are you using the latest version of the extension?
What happened?
I was trying to use vid2vid, but it consistently fails with `ValueError: need at least one array to stack`. Judging from the `Loading frames: 0it` log line, no frames are being loaded from the extracted video before the stack call.
Steps to reproduce the problem
What should have happened?
It should have generated a video based on my prompt and the input video.
WebUI and Deforum extension Commit IDs
webui commit id - a9fed7c3
txt2vid commit id - 066a9e1
What GPU were you using for launching?
RTX 4090 24GB
On which platform are you launching the webui backend with the extension?
Local PC setup (Windows)
Settings
Windows 11, python: 3.10.10 • torch: 2.0.0+cu118 • xformers: 0.0.17+b6be33a.d20230315 • gradio: 3.16.2
Additional information
https://www.youtube.com/watch?v=75rRs6fraUI&t=1s&ab_channel=VICENews was the video.
The extension worked with a different input video, but fails with this one. Maybe it's just allergic to BS?
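In case it helps triage: a defensive check before the stack call at `modelscope-text2vid.py` line 125 would turn this into a much clearer error. This is just a sketch, not the extension's actual code; only the variable name `images` and the `# f h w c` layout are taken from the traceback:

```python
import numpy as np

def stack_frames(images):
    """Stack loaded frames into an (f, h, w, c) array, failing loudly
    when the loader produced no frames at all (e.g. a start-frame range
    that never matched any extracted frame)."""
    if not images:
        raise RuntimeError(
            "No frames were loaded from the input video; "
            "check the vid2vid start frame and the extracted frames."
        )
    return np.stack(images)  # f h w c
```

With that guard, the failing run above would report the missing frames directly instead of surfacing NumPy's generic "need at least one array to stack".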