adhityaswami opened this issue 1 year ago
I don't think this is a bug; this is how SD worked before. The problem here is that the latent torch.Size ends up with an odd dimension, in this instance 27 (216 / 8), which can't be halved evenly through the UNet's downsampling layers. It's best to use the slider to choose a resolution close to what you need and then either crop or squeeze the result. I'm not sure what was changed in SD to support odd sizes, or exactly when the change was implemented.
i.e., try to make a 720-wide video:
Working in txt2vid mode
0%| | 0/1 [00:00<?, ?it/s]
Making a video with the following parameters: {'prompt': '', 'n_prompt': 'text, watermark, copyright, blurry, nsfw', 'steps': 30, 'frames': 24, 'seed': 2563507479, 'scale': 17, 'width': 720, 'height': 256, 'eta': 0.0, 'cpu_vae': 'GPU (half precision)', 'device': device(type='cuda'), 'skip_steps': 0, 'strength': 0}
latents torch.Size([1, 4, 24, 32, 90]) tensor(-0.0008, device='cuda:0') tensor(0.9997, device='cuda:0')
DDIM sampling: 0%| | 0/31 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "D:\NasD\stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts\t2v_helpers\render.py", line 24, in run
    vids_pack = process_modelscope(args_dict)
  File "D:\NasD\stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts\modelscope\process_modelscope.py", line 205, in process_modelscope
    samples, _ = pipe.infer(args.prompt, args.n_prompt, args.steps, args.frames, args.seed + batch if args.seed != -1 else -1, args.cfg_scale,
  File "D:\NasD\stable-diffusion-webui/extensions/sd-webui-modelscope-text2video/scripts\modelscope\t2v_pipeline.py", line 253, in infer
    x0 = self.diffusion.ddim_sample_loop(
  File "D:\NasD\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\NasD\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope\t2v_model.py", line 1475, in ddim_sample_loop
    xt = self.ddim_sample(xt, t, model, model_kwargs, clamp,
  File "D:\NasD\stable-diffusion-webui\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\NasD\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope\t2v_model.py", line 1324, in ddim_sample
    _, _, _, x0 = self.p_mean_variance(xt, t, model, model_kwargs, clamp,
  File "D:\NasD\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope\t2v_model.py", line 1265, in p_mean_variance
    y_out = model(xt, self._scale_timesteps(t), **model_kwargs[0])
  File "D:\NasD\stable-diffusion-webui\venv\lib\site-packages\torch\nn\modules\module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\NasD\stable-diffusion-webui\extensions\sd-webui-modelscope-text2video\scripts\modelscope\t2v_model.py", line 380, in forward
    x = torch.cat([x, xs.pop()], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 24 but got size 23 for tensor number 1 in the list.
Exception occurred: Sizes of tensors must match except in dimension 1. Expected size 24 but got size 23 for tensor number 1 in the list.
Here the latent width is 90 (720 / 8): no good either, and so on.
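To see why 27 and 90 break, here is a minimal sketch that traces one spatial dimension through the UNet's encoder and decoder. It assumes an 8x VAE downscale, three stride-2 downsamples that round up (ceil(n / 2), as a kernel-3, padding-1 conv does), and plain 2x upsampling on the way back. The three-level depth is inferred from the reported errors rather than read from t2v_model.py, and `find_skip_mismatch` is a hypothetical helper, not part of the extension.

```python
import math

VAE_FACTOR = 8       # assumption: latent = pixels // 8 (256x720 -> latent 32x90 in the log)
NUM_DOWNSAMPLES = 3  # assumption: inferred from the "Expected ... but got ..." sizes


def find_skip_mismatch(pixels: int):
    """Trace one spatial dimension through a UNet with skip connections.

    Returns (upsampled_size, skip_size) for the first torch.cat that would
    fail, or None if every skip connection lines up.
    """
    size = pixels // VAE_FACTOR
    skips = [size]
    for _ in range(NUM_DOWNSAMPLES):
        # A stride-2 conv with kernel 3 and padding 1 outputs ceil(n / 2).
        size = math.ceil(size / 2)
        skips.append(size)
    # The deepest skip is concatenated before any upsampling, so it always matches.
    skips.pop()
    for skip in reversed(skips):
        size *= 2  # 2x upsampling on the decoder side
        if size != skip:
            return size, skip
    return None


if __name__ == "__main__":
    for pixels in (216, 720, 256, 384, 288):
        print(pixels, find_skip_mismatch(pixels))
```

For 216 this returns (8, 7) and for 720 it returns (24, 23), the same pairs as the two "Expected size ... but got size ..." errors in this thread, while 256 and 384 return None. Note that 288, a multiple of 32, still returns (10, 9) under these assumptions.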
Hey, looks like you were right. It does work in regular SD, though, so I'll look into what changed there and try to implement it in the extension as well.
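tl;dr for anyone facing this issue: make sure your width and height are divisible by 64 (the VAE downscales by 8 and the UNet halves the latent three more times, so some multiples of 32, such as 288, can still fail).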
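Following the earlier advice to pick a nearby valid resolution and then crop or squeeze, here is a minimal sketch of a hypothetical helper that rounds a requested size to the nearest multiple of 64. The default of 64 assumes an 8x VAE and three 2x UNet downsamples (8 × 2³); pass a different `multiple` if your model differs.

```python
def snap_to_multiple(pixels: int, multiple: int = 64) -> int:
    """Round a requested width or height to the nearest multiple of `multiple`.

    Halfway cases round up, and the result is never smaller than one multiple.
    """
    # Hypothetical helper; 64 assumes an 8x VAE and three 2x UNet downsamples.
    snapped = (pixels + multiple // 2) // multiple * multiple
    return max(multiple, snapped)


if __name__ == "__main__":
    for w, h in [(384, 216), (720, 256)]:
        print(f"{w}x{h} -> {snap_to_multiple(w)}x{snap_to_multiple(h)}")
```

With these defaults, 384x216 becomes 384x192 (stretch it back to 216 afterwards) and 720x256 becomes 704x256. Rounding up to the next multiple and cropping is the other option the thread mentions.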
Is there an existing issue for this?
Are you using the latest version of the extension?
What happened?
I tried generating a video at 384x216 (basically a 16:9 aspect ratio) with my custom-trained, converted model. However, I get the following error:
DDIM sampling: 0%| | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/t2v_helpers/render.py", line 27, in run
    vids_pack = process_modelscope(args_dict)
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/process_modelscope.py", line 209, in process_modelscope
    samples, _ = pipe.infer(args.prompt, args.n_prompt, args.steps, args.frames, args.seed + batch if args.seed != -1 else -1, args.cfg_scale,
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/t2v_pipeline.py", line 258, in infer
    x0 = self.diffusion.ddim_sample_loop(
  File "/home/ubuntu/text2vid/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/t2v_model.py", line 1485, in ddim_sample_loop
    xt = self.ddim_sample(xt, t, model, model_kwargs, clamp,
  File "/home/ubuntu/text2vid/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/t2v_model.py", line 1334, in ddim_sample
    _, _, _, x0 = self.p_mean_variance(xt, t, model, model_kwargs, clamp,
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/t2v_model.py", line 1275, in p_mean_variance
    y_out = model(xt, self._scale_timesteps(t), **model_kwargs[0])
  File "/home/ubuntu/text2vid/stable-diffusion-webui/venv/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1501, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/ubuntu/text2vid/stable-diffusion-webui/extensions/sd-webui-text2video/scripts/modelscope/t2v_model.py", line 380, in forward
    x = torch.cat([x, xs.pop()], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 7 for tensor number 1 in the list.
Exception occurred: Sizes of tensors must match except in dimension 1. Expected size 8 but got size 7 for tensor number 1 in the list.
This occurs even when using the original model.
Steps to reproduce the problem
What should have happened?
It should generate a video at the requested dimensions.
WebUI and Deforum extension Commit IDs
webui commit id - baf6946e06249c5af9851c60171692c44ef633e0
txt2vid commit id - a44078d1cc6a75f619037a63f3e26a483965b826
Torch version
2.0.1+cu118
What GPU were you using for launching?
NVIDIA A10G - 24GB
On which platform are you launching the webui backend with the extension?
Cloud server (Linux)
Settings
Console logs
Additional information
No response