I am trying to run the following script, but it fails with an error.
(venv) C:\tut\DiffSynth-Studio>python examples\Diffutoon\diffutoon_toon_shading.py
Failed to load cpm_kernels:No module named 'cpm_kernels'
Downloading models: ['AingDiffusion_v12', 'AnimateDiff_v2', 'ControlNet_v11p_sd15_lineart', 'ControlNet_v11f1e_sd15_tile', 'TextualInversion_VeryBadImageNegative_v1.3']
aingdiffusion_v12.safetensors has been already in models/stable_diffusion.
mm_sd_v15_v2.ckpt has been already in models/AnimateDiff.
control_v11p_sd15_lineart.pth has been already in models/ControlNet.
sk_model.pth has been already in models/Annotators.
sk_model2.pth has been already in models/Annotators.
control_v11f1e_sd15_tile.pth has been already in models/ControlNet.
verybadimagenegative_v1.3.pt has been already in models/textual_inversion.
Loading models from: models/stable_diffusion/aingdiffusion_v12.safetensors
model_name: sd_text_encoder model_class: SDTextEncoder
model_name: sd_unet model_class: SDUNet
model_name: sd_vae_decoder model_class: SDVAEDecoder
model_name: sd_vae_encoder model_class: SDVAEEncoder
The following models are loaded: ['sd_text_encoder', 'sd_unet', 'sd_vae_decoder', 'sd_vae_encoder'].
Loading models from: models/AnimateDiff/mm_sd_v15_v2.ckpt
model_name: sd_motion_modules model_class: SDMotionModel
The following models are loaded: ['sd_motion_modules'].
Loading models from: models/ControlNet/control_v11f1e_sd15_tile.pth
model_name: sd_controlnet model_class: SDControlNet
The following models are loaded: ['sd_controlnet'].
Loading models from: models/ControlNet/control_v11p_sd15_lineart.pth
model_name: sd_controlnet model_class: SDControlNet
The following models are loaded: ['sd_controlnet'].
C:\tut\DiffSynth-Studio\venv\lib\site-packages\transformers\tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: https://github.com/huggingface/transformers/issues/31884
warnings.warn(
Using sd_text_encoder from models/stable_diffusion/aingdiffusion_v12.safetensors.
Using sd_unet from models/stable_diffusion/aingdiffusion_v12.safetensors.
Using sd_vae_decoder from models/stable_diffusion/aingdiffusion_v12.safetensors.
Using sd_vae_encoder from models/stable_diffusion/aingdiffusion_v12.safetensors.
Using sd_controlnet from models/ControlNet/control_v11f1e_sd15_tile.pth.
Using sd_controlnet from models/ControlNet/control_v11p_sd15_lineart.pth.
No sd_ipadapter models available.
No sd_ipadapter_clip_image_encoder models available.
Using sd_motion_modules from models/AnimateDiff/mm_sd_v15_v2.ckpt.
c:\tut\diffsynth-studio\diffsynth\models\attention.py:54: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at ..\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:455.)
hidden_states = torch.nn.functional.scaled_dot_product_attention(q, k, v, attn_mask=attn_mask)
Textual inversion verybadimagenegative_v1.3 is enabled.
100%|████████████████████████████████████████████████████████████████████████████████| 30/30 [00:00<00:00, 147.31it/s]
100%|█████████████████████████████████████████████████████████████████████████████████| 30/30 [00:02<00:00, 12.61it/s]
0%| | 0/10 [00:03<?, ?it/s]
Traceback (most recent call last):
  File "C:\tut\DiffSynth-Studio\examples\Diffutoon\diffutoon_toon_shading.py", line 100, in <module>
    runner.run(config)
  File "c:\tut\diffsynth-studio\diffsynth\pipelines\pipeline_runner.py", line 98, in run
    output_video = self.synthesize_video(model_manager, pipe, config["pipeline"]["seed"], smoother, **config["pipeline"]["pipeline_inputs"])
  File "c:\tut\diffsynth-studio\diffsynth\pipelines\pipeline_runner.py", line 48, in synthesize_video
    output_video = pipe(**pipeline_inputs, smoother=smoother)
  File "C:\tut\DiffSynth-Studio\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "c:\tut\diffsynth-studio\diffsynth\pipelines\sd_video.py", line 232, in __call__
    noise_pred_posi = lets_dance_with_long_video(
  File "c:\tut\diffsynth-studio\diffsynth\pipelines\sd_video.py", line 40, in lets_dance_with_long_video
    hidden_states_batch = lets_dance(
  File "c:\tut\diffsynth-studio\diffsynth\pipelines\dancer.py", line 76, in lets_dance
    hidden_states, time_emb, text_emb, res_stack = block(hidden_states, time_emb, text_emb, res_stack)
  File "C:\tut\DiffSynth-Studio\venv\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "C:\tut\DiffSynth-Studio\venv\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "c:\tut\diffsynth-studio\diffsynth\models\sd_unet.py", line 226, in forward
    hidden_states = torch.cat([hidden_states, res_hidden_states], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 24 but got size 23 for tensor number 1 in the list.
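One possible cause, which the log alone cannot confirm, is a frame height or width that is not a multiple of 64. The sketch below is plain Python arithmetic, not DiffSynth-Studio code. It assumes the usual Stable Diffusion 1.5 layout: the VAE shrinks each side by 8, and the UNet halves the latent three times with stride-2 convolutions that round odd sizes up. The function names and the 720-pixel value are hypothetical illustrations, not taken from the script or its config.

```python
import math

VAE_FACTOR = 8       # assumed: SD 1.5 VAE shrinks each side by a factor of 8
NUM_DOWNSAMPLES = 3  # assumed: SD 1.5 UNet halves the latent three times


def skip_sizes(pixels):
    """Side lengths seen on the way down, starting with the latent itself."""
    size = pixels // VAE_FACTOR
    sizes = [size]
    for _ in range(NUM_DOWNSAMPLES):
        # A 3x3 conv with stride 2 and padding 1 maps n to ceil(n / 2).
        size = math.ceil(size / 2)
        sizes.append(size)
    return sizes


def first_mismatch(pixels):
    """Return (upsampled, skip) where the skip concat would fail, else None."""
    sizes = skip_sizes(pixels)
    size = sizes[-1]
    for skip in reversed(sizes[:-1]):
        size *= 2  # each upsampling step doubles the side length
        if size != skip:
            return (size, skip)
    return None


def round_down_to_multiple(pixels, multiple=VAE_FACTOR * 2 ** NUM_DOWNSAMPLES):
    """Largest multiple of `multiple` (64 under the assumptions above) <= pixels."""
    return (pixels // multiple) * multiple


if __name__ == "__main__":
    for side in (720, 1024, 1536):  # hypothetical frame sizes
        print(side, skip_sizes(side), first_mismatch(side))
    print("720 rounded down:", round_down_to_multiple(720))
```

Under these assumptions a 720-pixel side produces the same 24-versus-23 pair that appears in the error, while 1024 and 1536 pass cleanly. If the configured `height` or `width` turns out to be such a value, changing it to a multiple of 64 would be worth trying.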
https://github.com/modelscope/DiffSynth-Studio/blob/main/examples/Diffutoon/README.md