Closed (zhyncs closed this issue 1 month ago)
Is there a specific timeline for this?
bmm fp8 has been implemented in https://github.com/flashinfer-ai/flashinfer/pull/469
fp8 e5m2 KV cache has been implemented in https://github.com/sgl-project/sglang/pull/1204
There is currently no adaptation for DeepSeek V2, as we are focusing on other higher-priority tasks. It is expected to be completed within the next few days.
done
Motivation
As titled. Make DeepSeek V2 MLA Faster!
Related resources
No response