[attention] Investigate overlapping matmul and softmax

nod-ai / sdxl-scripts

[attention] Investigate overlapping matmul and softmax #91

Open antiagainst opened 3 months ago

antiagainst commented 3 months ago

Flash Attention 3 overlaps matmul and softmax across different waves to maximize MFMA utilization: while one wave runs softmax, another can issue its matmul. We should consider how to apply this technique to our current attention implementation. To do that, we need to understand the hardware scheduler and how to work with or around it, for example by using s_setprio instructions.
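
To get a rough feel for the potential gain, here is a minimal toy model in Python. It is only a sketch under strong assumptions: one matmul (MFMA) unit and one vector (VALU) unit shared by all waves, fixed per-step durations, and a fixed round-robin issue order. The function name `mfma_utilization` and all parameters are hypothetical; nothing here models the real hardware scheduler or s_setprio.

```python
def mfma_utilization(num_waves, iters, t_mm, t_sm, overlap):
    """Toy estimate of the fraction of time the matmul unit is busy.

    Assumptions (not taken from any real hardware): a single matmul unit,
    a single vector unit, and each wave alternating matmul -> softmax.
    With overlap=False, only one unit may be active at any time.
    """
    mfma_free = 0.0                  # time at which the matmul unit is next free
    valu_free = 0.0                  # time at which the vector unit is next free
    wave_ready = [0.0] * num_waves   # time at which each wave can issue its next op

    for _ in range(iters):
        for w in range(num_waves):
            # Matmul step: needs the matmul unit.
            start = max(wave_ready[w], mfma_free)
            if not overlap:
                start = max(start, valu_free)  # wait for any running softmax
            mfma_free = wave_ready[w] = start + t_mm

            # Softmax step: needs the vector unit.
            start = max(wave_ready[w], valu_free)
            if not overlap:
                start = max(start, mfma_free)  # wait for any running matmul
            valu_free = wave_ready[w] = start + t_sm

    makespan = max(wave_ready)
    return (num_waves * iters * t_mm) / makespan


if __name__ == "__main__":
    for overlap in (False, True):
        util = mfma_utilization(num_waves=2, iters=4, t_mm=4.0, t_sm=3.0, overlap=overlap)
        print(f"overlap={overlap}: MFMA utilization = {util:.3f}")
```

In this model, two waves with a 4-unit matmul and a 3-unit softmax keep the matmul unit busy about 57% of the time without overlap and about 91% with it. The real numbers depend on how the hardware scheduler actually interleaves the waves, which is what this issue is meant to investigate.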