Edit: It's working either way, and I've joined the Discord; I'll get help there if needed. Thanks!
--
I just cloned the repo and started playing with it, so forgive me if this is user error, but when I use the length and/or duration arguments I get a warning: "setting num_images_per_prompt = 1". I initially thought this meant I'd get a single-frame GIF, but on checking the output it does contain the correct number of frames. The warning doesn't appear when I omit those arguments.
I also get a runtime error when generating anything longer than ~3 seconds (24 frames). I first assumed a VRAM/resource limit (I'm on a 4090 with 24 GB VRAM and 64 GB system RAM), but the traceback below looks like a tensor shape mismatch in the temporal positional encoding, which seems to be sized for a maximum of 24 frames rather than a memory issue.
(venv) D:\SDXL\Hotshot-XL>python inference.py --video_length=40 --video_duration=5000 --prompt="Will Smith eating spaghetti, hd, high quality" --output "hotshottest40-5000.gif"
Loading pipeline components...: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7/7 [00:02<00:00, 2.34it/s]
Warning - setting num_images_per_prompt = 1 because video_length = 40
0%| | 0/30 [00:00<?, ?it/s]
Traceback (most recent call last):
File "D:\SDXL\Hotshot-XL\inference.py", line 223, in <module>
main()
File "D:\SDXL\Hotshot-XL\inference.py", line 203, in main
images = pipe(args.prompt,
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
return func(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\pipelines\hotshot_xl_pipeline.py", line 825, in __call__
noise_pred = self.unet(
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\models\unet.py", line 849, in forward
sample, res_samples = downsample_block(hidden_states=sample,
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\models\unet_blocks.py", line 475, in forward
hidden_states = temporal_attention(hidden_states, encoder_hidden_states=encoder_hidden_states)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\models\transformer_temporal.py", line 123, in forward
hidden_states = block(hidden_states, encoder_hidden_states=encoder_hidden_states, number_of_frames=f)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\models\transformer_temporal.py", line 181, in forward
hidden_states = block(
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\models\transformer_temporal.py", line 59, in forward
hidden_states = self.pos_encoder(hidden_states, length=number_of_frames)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1518, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\venv\lib\site-packages\torch\nn\modules\module.py", line 1527, in _call_impl
return forward_call(*args, **kwargs)
File "D:\SDXL\Hotshot-XL\hotshot_xl\models\transformer_temporal.py", line 47, in forward
hidden_states = hidden_states + self.positional_encoding[:, :length]
RuntimeError: The size of tensor a (40) must match the size of tensor b (24) at non-singleton dimension 1
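For what it's worth, a minimal sketch reproduces the same error. This assumes (based only on the traceback, not the actual Hotshot-XL source) that the positional encoding is a buffer precomputed for a fixed maximum of 24 frames; the names here are illustrative, not the repo's:

```python
import torch

# Hypothetical: a positional-encoding buffer precomputed for at most 24 frames,
# as the line `self.positional_encoding[:, :length]` in the traceback suggests.
max_frames, dim = 24, 8
positional_encoding = torch.zeros(1, max_frames, dim)

frames = 40  # mirrors --video_length=40
hidden_states = torch.zeros(1, frames, dim)

try:
    # Slicing [:, :40] on a 24-row buffer still yields only 24 rows, so the
    # addition broadcasts (1, 40, dim) against (1, 24, dim) and fails with the
    # same RuntimeError seen above.
    hidden_states = hidden_states + positional_encoding[:, :frames]
except RuntimeError as e:
    print(e)
```

If that reading is right, the 24-frame ceiling is baked into the model's temporal positional encoding rather than being a VRAM limit, so more memory wouldn't help.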