Open maxtlw opened 9 months ago
If you have changed the input video, the controlnet_frames should also be changed. Please check it.
controlnet_frames and input_frames have the same length, but I still get the error. Any ideas what I can do? Thank you.
I checked stable_diffusion_video.py, at line 315:
    def add_data_to_pipeline_inputs(self, data, pipeline_inputs):
        pipeline_inputs["input_frames"] = self.load_video(**data["input_frames"])
        pipeline_inputs["num_frames"] = len(pipeline_inputs["input_frames"])
        pipeline_inputs["width"], pipeline_inputs["height"] = pipeline_inputs["input_frames"][0].size
        if len(data["controlnet_frames"]) > 0:
            pipeline_inputs["controlnet_frames"] = [self.load_video(**unit) for unit in data["controlnet_frames"]]
        return pipeline_inputs
The returned pipeline_inputs is:

    'input_frames'      = {list: 30} [<PIL.Image.Image image mode=RGB size=720x1080 at 0x275E498D490>, ...
    'num_frames'        = {int} 30
    'width'             = {int} 720
    'height'            = {int} 1080
    'controlnet_frames' = {list: 2}
        0 = {list: 30} [<PIL.Image.Image image mode=RGB size=720x1080 at 0x275E498DCA0>, ...
        1 = {list: 30} [<PIL.Image.Image image mode=RGB size=720x1080 at 0x275E49C3850>, ...
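For anyone debugging the same thing, a quick sanity check of the dict above could look like this (a minimal sketch of my own, assuming only that pipeline_inputs is the dict shown in the debugger output):

```python
# Verify that every ControlNet unit has the same frame count and frame size as
# the input video. Frames are PIL images, so .size returns (width, height).
num_frames = pipeline_inputs["num_frames"]
base_size = pipeline_inputs["input_frames"][0].size

for i, unit in enumerate(pipeline_inputs["controlnet_frames"]):
    assert len(unit) == num_frames, f"unit {i}: {len(unit)} frames, expected {num_frames}"
    for frame in unit:
        assert frame.size == base_size, f"unit {i}: frame size {frame.size} != {base_size}"

print("Frame counts and sizes are consistent:", num_frames, base_size)
```

In this case the check passes (30 frames at 720x1080 everywhere), so the problem is not the ControlNet frames themselves.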
\DiffSynth-Studio\examples\diffutoon_toon_shading_with_editing_signals.py
Loading videos ...
Loading videos ... done!
Loading models ...
model_list: ['C:\\sd-webui-aki-v4.4\\models\\Stable-diffusion\\anime\\aingdiffusion_v12.safetensors', 'C:\\sd-webui-aki-v4.4\\models\\ControlNet\\control_v11p_sd15_softedge.pth', 'C:\\sd-webui-aki-v4.4\\models\\ControlNet\\control_v11f1p_sd15_depth.pth']
C:\software\Anaconda\envs\DiffSynthStudio\lib\site-packages\timm\models\_factory.py:117: UserWarning: Mapping deprecated model name vit_base_resnet50_384 to current vit_base_r50_s16_384.orig_in21k_ft_in1k.
model = create_fn(
Loading models ... done!
Loading smoother ...
Loading smoother ... done!
Synthesizing videos ...
C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\models\attention.py:43: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\cb\pytorch_1000000000000\work\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:263.)
hidden_states = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
100%|██████████| 30/30 [00:02<00:00, 14.11it/s]
100%|██████████| 30/30 [00:04<00:00, 6.86it/s]
0%| | 0/20 [00:02<?, ?it/s]
Traceback (most recent call last):
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\examples\diffutoon_toon_shading_with_editing_signals.py", line 185, in <module>
runner.run(config_stage_1)
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 357, in run
output_video = self.synthesize_video(model_manager, pipe, config["pipeline"]["seed"], smoother, **config["pipeline"]["pipeline_inputs"])
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 300, in synthesize_video
output_video = pipe(**pipeline_inputs, smoother=smoother)
File "C:\software\Anaconda\envs\DiffSynthStudio\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 221, in __call__
noise_pred_posi = lets_dance_with_long_video(
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\pipelines\stable_diffusion_video.py", line 38, in lets_dance_with_long_video
hidden_states_batch = lets_dance(
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\pipelines\dancer.py", line 72, in lets_dance
hidden_states, time_emb, text_emb, res_stack = block(hidden_states, time_emb, text_emb, res_stack)
File "C:\software\Anaconda\envs\DiffSynthStudio\lib\site-packages\torch\nn\modules\module.py", line 1511, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "C:\software\Anaconda\envs\DiffSynthStudio\lib\site-packages\torch\nn\modules\module.py", line 1520, in _call_impl
return forward_call(*args, **kwargs)
File "C:\Users\ainse\PycharmProjects\DiffSynth-Studio\diffsynth\models\sd_unet.py", line 222, in forward
hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 24 but got size 23 for tensor number 1 in the list.
Oh! The resolution is not supported. Both width and height should be multiples of 64, for example 1536*1024.
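The mismatch in the traceback is consistent with that: with a 720x1080 input the latent is 90x135, halving 90 three times in the UNet gives 45 → 23 → 12 (with rounding from the stride-2 convolutions), and upsampling 12 back gives 24, which no longer matches the 23-wide skip connection, hence "Expected size 24 but got size 23". A minimal sketch of a workaround (the helper names and the Pillow-based resizing are my own, not part of DiffSynth-Studio) is to snap the resolution down to multiples of 64 before loading the frames:

```python
from PIL import Image

def snap_to_multiple_of_64(width, height):
    # Round each dimension down to the nearest multiple of 64: the VAE divides
    # by 8 and the UNet halves the latent three more times (8 * 2**3 = 64).
    return (width // 64) * 64, (height // 64) * 64

def resize_frames(frames, size):
    # Resize every loaded PIL frame to the supported size.
    return [frame.resize(size, Image.LANCZOS) for frame in frames]

print(snap_to_multiple_of_64(720, 1080))  # -> (704, 1024)
```

Since the original post mentions changing only the input video path and its resolution in the settings, setting the width/height there back to multiples of 64 (e.g. 1536 and 1024, as suggested above) should avoid the size mismatch.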
Hi! :) I'm really interested in the new Diffutoon pipeline, but whatever input video I use I get this error. I set up the environment as indicated in the README.md and it worked flawlessly. I have no idea what I should look for to fix this: I haven't changed anything in the settings except the input video path and its resolution. Thank you for the help!!