nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.
Apache License 2.0
95 stars 48 forks

[attention] Make fp8 attention performant #802

Open antiagainst opened 3 months ago

antiagainst commented 3 months ago

fp8 attention is expected to deliver performance gains compared to the fp16 version. Figure out the blocking issues and fix them.