jy0205 / Pyramid-Flow

Code of Pyramidal Flow Matching for Efficient Video Generative Modeling
https://pyramid-flow.github.io/
MIT License

error: I put the cursor on 15s of video to generate, the video created is only 5s ! (gradio demo, 768p) #183

Closed Giribot closed 4 days ago

Giribot commented 6 days ago

Hello! I set the duration slider to 15 s, but the generated video is only 5 s long (after 20 hours of computing)!


The result is good, but it is only 5 s long instead of the expected 15 s.

https://github.com/user-attachments/assets/e8c41abc-3833-4fe5-b98f-1db33d1f593c

LOGS:

Microsoft Windows [version 10.0.22631.4460]
(c) Microsoft Corporation. Tous droits réservés.

D:\Data\Packages\PyramidFlow2\Pyramid-Flow>python app.py
WARNING[XFORMERS]: xFormers can't load C++/CUDA extensions. xFormers was built for:
    PyTorch 2.4.0+cu121 with CUDA 1201 (you have 2.4.0+cu118)
    Python  3.10.11 (you have 3.10.6)
  Please reinstall xformers (see https://github.com/facebookresearch/xformers#installing-xformers)
  Memory-efficient attention, SwiGLU, sparse and more won't be available.
  Set XFORMERS_MORE_DETAILS=1 for more details
A matching Triton is not available, some optimizations will not be enabled
Traceback (most recent call last):
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\xformers\__init__.py", line 57, in _is_triton_available
    import triton  # noqa
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\__init__.py", line 13, in <module>
    from . import language
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\language\__init__.py", line 2, in <module>
    from . import core, extern, libdevice, random
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\language\core.py", line 1141, in <module>
    def abs(x):
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\runtime\jit.py", line 386, in jit
    return JITFunction(args[0], **kwargs)
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\runtime\jit.py", line 315, in __init__
    self.run = self._make_launcher()
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\runtime\jit.py", line 282, in _make_launcher
    scope = {"version_key": version_key(), "get_cuda_stream": get_cuda_stream,
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\triton\runtime\jit.py", line 82, in version_key
    with open(triton._C.libtriton.__file__, "rb") as f:
AttributeError: partially initialized module 'triton' has no attribute '_C' (most likely due to a circular import)
[INFO] All required model files are present in 'D:\Data\Packages\PyramidFlow2\Pyramid-Flow\pyramid_flow_model'. Skipping download.
C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\site-packages\gradio\helpers.py:147: UserWarning: In future versions of Gradio, the `cache_examples` parameter will no longer accept a value of 'lazy'. To enable lazy caching in Gradio, you should set `cache_examples=True`, and `cache_mode='lazy'` instead.
  warnings.warn(
Will cache examples in 'D:\Data\Packages\PyramidFlow2\Pyramid-Flow\.gradio\cached_examples\16' directory at first use.

Will cache examples in 'D:\Data\Packages\PyramidFlow2\Pyramid-Flow\.gradio\cached_examples\28' directory at first use.

Information : impossible de trouver des fichiers pour le(s) modèle(s) spécifié(s).
* Running on local URL:  http://127.0.0.1:7860

Could not create share link. Please check your internet connection or our status page: https://status.gradio.app.
[DEBUG] generate_text_to_video called.
[INFO] Initializing model with variant='768p', using bf16 precision...
[DEBUG] Model base path: D:\Data\Packages\PyramidFlow2\Pyramid-Flow\pyramid_flow_model
Using temporal causal attention
You set `add_prefix_space`. The tokenizer needs to be converted from the slow tokenizers
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████| 2/2 [00:33<00:00, 16.68s/it]
An error occurred while trying to fetch D:\Data\Packages\PyramidFlow2\Pyramid-Flow\pyramid_flow_model\causal_video_vae: Error no file named diffusion_pytorch_model.safetensors found in directory D:\Data\Packages\PyramidFlow2\Pyramid-Flow\pyramid_flow_model\causal_video_vae.
Defaulting to unsafe serialization. Pass `allow_pickle=False` to raise an error instead.
The latent dimmension channes is 16
The start sigmas and end sigmas of each stage is Start: {0: 1.0, 1: 0.8002399489209289, 2: 0.5007496155411024}, End: {0: 0.6669999957084656, 1: 0.33399999141693115, 2: 0.0}, Ori_start: {0: 1.0, 1: 0.6669999957084656, 2: 0.33399999141693115}
[INFO] Model initialized successfully.
[INFO] Starting text-to-video generation...
  0%|                                                                                           | 0/16 [00:00<?, ?it/s]D:\Data\Packages\PyramidFlow2\Pyramid-Flow\pyramid_dit\flux_modules\modeling_flux_block.py:363: UserWarning: 1Torch was not compiled with flash attention. (Triggered internally at C:\actions-runner\_work\pytorch\pytorch\builder\windows\pytorch\aten\src\ATen\native\transformers\cuda\sdp_utils.cpp:555.)
  stage_hidden_states = F.scaled_dot_product_attention(
100%|█████████████████████████████████████████████████████████████████████████████| 16/16 [24:55:09<00:00, 5606.84s/it]
[INFO] Text-to-video generation completed.
[INFO] Video exported to 074786d5-c3cd-470c-815c-eedb77002064_text_to_video_sample.mp4.
Exception in callback _ProactorBasePipeTransport._call_connection_lost(None)
handle: <Handle _ProactorBasePipeTransport._call_connection_lost(None)>
Traceback (most recent call last):
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\asyncio\events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
  File "C:\Users\gilda\AppData\Local\Programs\Python\Python310\lib\asyncio\proactor_events.py", line 162, in _call_connection_lost
    self._sock.shutdown(socket.SHUT_RDWR)
ConnectionResetError: [WinError 10054] Une connexion existante a dû être fermée par l’hôte distant

Thank you!

System:
Processor: 11th Gen Intel(R) Core(TM) i5-11300H @ 3.10GHz 3.11 GHz
Installed RAM: 20.0 GB (19.8 GB usable)
Edition: Windows 11 Home
Version: 23H2
Installed on: 01/12/2022
OS build: 22631.4460
Experience: Windows Feature Experience Pack 1000.22700.1047.0
GPU: Nvidia GeForce RTX 3050 Ti Laptop GPU (4 GB)

mkultra333 commented 6 days ago

The duration setting isn't a time in seconds; it's some kind of internal unit used by the algorithm. A value of 4 gives about 1 second, 7 about 2 seconds, and 15-16 about 5 seconds.

Here's what I've found regarding Duration setting versus how many frames you get. IIRC default Pyramid Flow video is 25 frames per second.

Duration  Frames
2         9
3         17
4         25
5         33
6         41
7         49
8         57
9         65
10        73
11        81
12        89
13        97
14        105
15        113
16        121
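Every row of the table above fits frames = 8 * (duration - 1) + 1. A minimal sketch of that relationship, assuming the pattern holds outside the listed values too (not checked against the Pyramid-Flow code). The function names are hypothetical, and the 25 fps default is only the figure recalled in this comment, so pass whatever rate your export actually uses:

```python
import math


def frames_for_duration(duration: int) -> int:
    """Frame count for a Duration slider value, fitted to the table above."""
    if duration < 1:
        raise ValueError("duration must be >= 1")
    return 8 * (duration - 1) + 1


def seconds_for_duration(duration: int, fps: float = 25.0) -> float:
    """Clip length in seconds; fps is an assumption, not a verified value."""
    return frames_for_duration(duration) / fps


def duration_for_seconds(seconds: float, fps: float = 25.0) -> int:
    """Smallest slider value whose clip lasts at least `seconds`."""
    frames_needed = math.ceil(seconds * fps)
    return max(1, math.ceil((frames_needed - 1) / 8) + 1)


if __name__ == "__main__":
    for d in (4, 7, 15, 16):
        print(d, frames_for_duration(d), round(seconds_for_duration(d), 2))
```

Under these assumptions, a slider value of 15 gives 113 frames, which is about 4.5 s at 25 fps. That matches the roughly 5 s clip reported in this issue.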

Giribot commented 6 days ago

Thank you very much, that was not very clear. Is there no way to extend the video beyond 5 s (10 s, 15 s, or more)? Even if we get ghosts, it's not serious, and it's even funny (there is a ghost at the end of the video my PC generated!). Booo! 👻👻👻

mkultra333 commented 6 days ago

384p video goes up to 5s and 768p video goes up to 10s. I only have an rtx 3070 and when I tried to do 768p it was way too slow so I stick to 384p. I don't know if 5s is a hard limit or not, I haven't tried changing it in the code.

If you're using a weaker card you might want to generate 384p video and then use an upscaler like video2x to increase the resolution afterward.

https://github.com/k4yt3x/video2x

Giribot commented 6 days ago

Maybe a bug in the Gradio script?

Quote: "2024.11.13 🚀🚀🚀 We release the 768p miniFLUX checkpoint (up to 10s).

We have switched the model structure from SD3 to a mini FLUX to fix human structure issues, please try our 1024p image checkpoint, 384p video checkpoint (up to 5s) and 768p video checkpoint (up to 10s). The new miniflux model shows great improvement on human structure and motion stability...."

It looks like a bug in the Gradio demo, which forgets to offer this option when the 768p model is selected? ===> «768p video checkpoint (up to 10s)»

I don't know.

But a lot of thanks for this marvelous LLM ! ❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️❤️

mkultra333 commented 6 days ago

Maybe. I'm not one of the coders, I'm just another user like you, so I don't know.

Giribot commented 4 days ago

Maybe solved by an update to app.py:

https://github.com/jy0205/Pyramid-Flow/issues/187