Tritonbench: [performance] Torch SDPA cuDNN backend vs FlashAttention v3 #41
Open · feifeibear opened 4 weeks ago
With parallel degree > 2, xDiT executes the attention at the line linked below when running unified sequence parallelism (USP). The attention actually executes inside the ring_attention implementation, because USP applies Ulysses outside of Ring.
https://github.com/xdit-project/xDiT/blob/f9e35f71f98726f4e923deb3fde4accc548eed46/xfuser/core/long_ctx_attention/ring/ring_flash_attn.py#L75
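For context, here is a minimal single-process sketch of what happens at that line: ring attention iterates over KV blocks (which in the real implementation arrive via P2P ring exchange) and invokes attention once per block, merging partial outputs with the log-sum-exp rule. All names, shapes, and helpers below are illustrative, not xDiT's API; the actual call at the linked line is a flash-attn kernel, and USP additionally wraps this loop with a Ulysses all-to-all over heads.

```python
# Minimal sketch of ring attention's blockwise accumulation (no causal mask,
# toy shapes). Assumption: this mirrors the structure around the linked line,
# where attention runs once per ring step inside the ring loop.
import math
import torch

def block_attn(q, k, v):
    # One ring step: local Q against one remote KV block.
    scores = q @ k.transpose(-1, -2) / math.sqrt(q.shape[-1])
    lse = torch.logsumexp(scores, dim=-1, keepdim=True)  # kept for merging
    out = torch.exp(scores - lse) @ v                    # softmax(scores) @ v
    return out, lse

def ring_attention(q, kv_blocks):
    # Iterate over KV blocks as if they arrived via the ring; merge partial
    # outputs with the online-softmax (log-sum-exp) combine rule.
    out, lse = None, None
    for k, v in kv_blocks:
        o_i, lse_i = block_attn(q, k, v)  # <- where attention actually executes
        if out is None:
            out, lse = o_i, lse_i
        else:
            new_lse = torch.logaddexp(lse, lse_i)
            out = out * torch.exp(lse - new_lse) + o_i * torch.exp(lse_i - new_lse)
            lse = new_lse
    return out

# Sanity check against full attention over the concatenated KV.
torch.manual_seed(0)
q = torch.randn(4, 16, 8)  # (heads, q_len, head_dim)
k = torch.randn(4, 32, 8)
v = torch.randn(4, 32, 8)
blocks = [(k[:, i:i + 8], v[:, i:i + 8]) for i in range(0, 32, 8)]
ref = torch.softmax(q @ k.transpose(-1, -2) / math.sqrt(8), dim=-1) @ v
assert torch.allclose(ring_attention(q, blocks), ref, atol=1e-5)
```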