xdit-project / xDiT

xDiT: A Scalable Inference Engine for Diffusion Transformers (DiTs) with Massive Parallelism
Apache License 2.0

support advanced attention implementations (FA3, FlashInfer, xformers, etc.) #319

Open feifeibear opened 4 weeks ago

feifeibear commented 4 weeks ago

With parallel degree > 2, xDiT executes attention at the line below when running under unified sequence parallelism (USP). The exact place where attention runs is inside the ring_attention implementation, because USP applies Ulysses on the outside and Ring on the inside.

https://github.com/xdit-project/xDiT/blob/f9e35f71f98726f4e923deb3fde4accc548eed46/xfuser/core/long_ctx_attention/ring/ring_flash_attn.py#L75
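
To make the nesting concrete, here is a minimal, self-contained sketch (not xDiT's actual code) of the USP structure: a Ulysses all-to-all on the outside, ring attention on the inside. The attention math only ever runs inside the ring loop, which is why the flash-attn call in ring_flash_attn.py linked above is the spot where an alternative kernel (FA3, FlashInfer, xformers) would have to be plugged in. Helper names are illustrative, and a naive softmax stands in for the fused flash kernel for clarity.

```python
import torch
import torch.distributed as dist


def seq_all_to_all(x: torch.Tensor, group, scatter_dim: int, gather_dim: int) -> torch.Tensor:
    """Ulysses step: scatter one dim across the group and gather another."""
    world = dist.get_world_size(group)
    inputs = [t.contiguous() for t in x.chunk(world, dim=scatter_dim)]
    outputs = [torch.empty_like(inputs[0]) for _ in range(world)]
    dist.all_to_all(outputs, inputs, group=group)
    return torch.cat(outputs, dim=gather_dim)


def ring_pass(t: torch.Tensor, group) -> torch.Tensor:
    """Send our K/V shard to the next rank, receive from the previous one."""
    rank, world = dist.get_rank(group), dist.get_world_size(group)
    send, recv = t.contiguous(), torch.empty_like(t)
    ops = [
        dist.P2POp(dist.isend, send, dist.get_global_rank(group, (rank + 1) % world)),
        dist.P2POp(dist.irecv, recv, dist.get_global_rank(group, (rank - 1) % world)),
    ]
    for req in dist.batch_isend_irecv(ops):
        req.wait()
    return recv


def ring_attention(q, k, v, group):
    """Naive (non-causal) ring attention: rotate K/V shards around the ring
    and merge per-shard softmax results with a running log-sum-exp."""
    scale = q.shape[-1] ** -0.5
    out = lse = None
    for step in range(dist.get_world_size(group)):
        # q, k, v are [batch, seq_shard, heads, head_dim]
        scores = torch.einsum("bqhd,bkhd->bhqk", q, k) * scale
        blk_lse = torch.logsumexp(scores, dim=-1, keepdim=True)        # [b,h,q,1]
        blk_out = torch.einsum("bhqk,bkhd->bqhd", scores.softmax(-1), v)
        if out is None:
            out, lse = blk_out, blk_lse
        else:
            new_lse = torch.logaddexp(lse, blk_lse)
            out = (out * (lse - new_lse).exp().permute(0, 2, 1, 3)
                   + blk_out * (blk_lse - new_lse).exp().permute(0, 2, 1, 3))
            lse = new_lse
        if step + 1 < dist.get_world_size(group):
            k, v = ring_pass(k, group), ring_pass(v, group)
    return out


def usp_attention(q, k, v, ulysses_group, ring_group):
    # Ulysses (outer): trade the local sequence shard for a subset of heads.
    q, k, v = (seq_all_to_all(t, ulysses_group, scatter_dim=2, gather_dim=1)
               for t in (q, k, v))
    # Ring (inner): this is where attention actually executes.
    out = ring_attention(q, k, v, ring_group)
    # Reverse all-to-all restores the original head/sequence layout.
    return seq_all_to_all(out, ulysses_group, scatter_dim=1, gather_dim=2)
```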

antferdom commented 2 weeks ago

Tritonbench: [performance] Torch SDPA cuDNN backend vs FlashAttention v3 #41
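
For reference, a minimal sketch of pinning PyTorch SDPA to the cuDNN backend for such a comparison. This assumes a recent PyTorch (SDPBackend.CUDNN_ATTENTION landed around 2.5) and an NVIDIA GPU; shapes are arbitrary.

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel

# [batch, heads, seq, head_dim], the layout SDPA expects
q, k, v = (torch.randn(2, 16, 4096, 128, device="cuda", dtype=torch.bfloat16)
           for _ in range(3))

# Restrict SDPA dispatch to the cuDNN fused-attention backend so a benchmark
# measures cuDNN rather than whichever backend SDPA would otherwise pick.
with sdpa_kernel(SDPBackend.CUDNN_ATTENTION):
    out = F.scaled_dot_product_attention(q, k, v)
```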