thu-nics / DiTFastAttn

MIT License

no speed up #3

Closed seeyourcell closed 3 weeks ago

seeyourcell commented 1 month ago

I tested the OpenSora demo and observed no speed-up.

100%|████████████████████████████████████████| 100/100 [00:30<00:00, 3.30it/s]

Prompt: A vibrant underwater scene. A group of blue fish, with yellow fins, are swimming around a coral reef. The coral reef is a mix of brown and green, providing a natural habitat for the fish. The water is a deep blue, indicating a depth of around 30 feet. The fish are swimming in a circular pattern around the coral reef, indicating a sense of motion and activity. The overall scene is a beautiful representation of marine life.

I changed `--threshold` from 0 to 0.15, with no change in speed.

hahnyuan commented 1 month ago

Thank you for sharing the details of your test of the DiTFastAttn demo on OpenSora. I appreciate you providing the specific prompt and the context around the test scenario.

Based on the information you've shared, it seems that the speed-up techniques introduced in the DiTFastAttn paper may not have a significant impact on the performance for a relatively low-resolution video generation task, such as the 240p demo you tested.

The key reason is the computational complexity of self-attention. The paper highlights that the quadratic computational complexity of the self-attention mechanism in Diffusion Transformer (DiT) models is the primary bottleneck, especially for high-resolution and long-video tasks. For a lower-resolution input like 240p, the self-attention operation simply isn't a large enough fraction of the total compute for reducing it to yield a significant acceleration.
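To make the scaling concrete, here is a rough back-of-the-envelope sketch of why attention cost explodes with resolution. The VAE downsample factor, patch size, frame count, and hidden dimension below are illustrative assumptions, not OpenSora's exact configuration:

```python
# Back-of-the-envelope: self-attention FLOPs grow quadratically with token count.
# Assumed (hypothetical) configuration: 8x spatial VAE downsample, 2x2 patching,
# 16 latent frames, hidden dim 1152. Real OpenSora settings may differ.

def attention_tokens(height, width, frames, vae_down=8, patch=2):
    """Tokens per attention call for a video latent of the given pixel size."""
    h = height // vae_down // patch
    w = width // vae_down // patch
    return h * w * frames

def attention_flops(tokens, dim=1152):
    """Approximate FLOPs of one self-attention layer's two N x N x d matmuls
    (QK^T and attn @ V), at 2 FLOPs per multiply-accumulate."""
    return 2 * 2 * tokens * tokens * dim

low = attention_tokens(240, 426, 16)     # ~240p
high = attention_tokens(1080, 1920, 16)  # 1080p
print(f"240p tokens: {low}, 1080p tokens: {high}, "
      f"attention FLOP ratio: {attention_flops(high) / attention_flops(low):.0f}x")
```

Under these assumptions the 1080p case has roughly 20x more tokens and therefore hundreds of times the attention FLOPs, which is where trimming attention starts to dominate end-to-end latency.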

Given these considerations, it's understandable that you did not observe a significant speed-up in the 240p video generation scenario. The DiTFastAttn techniques are likely to be more impactful for larger and more computationally intensive tasks (such as 1080p video generation), where the self-attention bottleneck becomes more pronounced.

However, we have not experimented at larger resolutions ourselves, because we currently have only 40GB A100 GPUs. We will continue to test this scenario.

Please let me know if you have any other questions or if there's anything else I can assist you with.