[attention] Make fp8 attention performant

nod-ai / SHARK-ModelDev

Unified compiler/runtime for interfacing with PyTorch Dynamo.

Apache License 2.0

95 stars, 48 forks

Open: antiagainst opened this issue 3 months ago

antiagainst commented 3 months ago

fp8 attention is expected to deliver performance gains over fp16 attention. Identify the issues blocking those gains and fix them.