📝 What does this PR do?
For cases where a materialized attention mask of shape (s_q, s_kv) would occupy 10 GB or more (i.e., s_q * s_kv * element_size >= 10 GB), dispatch only to the FlashAttentionDaoLoader kernel and pass an empty tensor as a placeholder for attention_mask. In this regime, only causal and padded causal mask types are supported.
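A minimal sketch of the dispatch logic described above. The function and constant names (`dispatch_attention_kernel`, `MEMORY_BOUND`) are illustrative assumptions, not the exact identifiers in this PR:

```python
import torch

# Hypothetical threshold; the PR describes a 10 GB bound on the mask size.
MEMORY_BOUND = 10 * 1024**3  # 10 GB in bytes

def dispatch_attention_kernel(q: torch.Tensor, k: torch.Tensor, attn_mask_type: str):
    """Illustrative sketch: pick a kernel based on the would-be mask size."""
    s_q, s_kv = q.size(-2), k.size(-2)
    # Bytes a full (s_q, s_kv) mask would occupy in q's dtype.
    mask_bytes = s_q * s_kv * q.element_size()
    if mask_bytes >= MEMORY_BOUND:
        # Materializing the mask would be too expensive: fall back to the
        # FlashAttention (Dao) kernel, which handles causal masking
        # internally, and pass an empty tensor as a placeholder.
        assert attn_mask_type in ("causal", "padded_causal"), (
            "Only causal and padded causal masks are supported above the bound"
        )
        attention_mask = torch.empty(0, dtype=q.dtype, device=q.device)
        return "FlashAttentionDaoLoader", attention_mask
    # Below the bound, other loaders with an explicit mask remain eligible
    # (omitted here).
    ...
```

The key design point is that the empty placeholder avoids allocating an O(s_q * s_kv) tensor entirely, which is why only mask types the kernel can reconstruct on its own (causal and padded causal) are allowed on this path.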