sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/

[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2 #1156

Closed. zhyncs closed this issue 1 month ago.

zhyncs commented 1 month ago

Motivation

As titled: support W8A8 (FP8) quantization and an FP8 KV cache to make DeepSeek V2 MLA inference faster.

Related resources

No response

fengyang95 commented 1 month ago

Is there a specific timeline for this?

zhyncs commented 1 month ago

> Is there a specific timeline for this?

FP8 bmm has been implemented in https://github.com/flashinfer-ai/flashinfer/pull/469, and the FP8 E5M2 KV cache has been implemented in https://github.com/sgl-project/sglang/pull/1204.
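
For readers following along, here is a minimal Python sketch of the idea behind an FP8 E5M2 KV cache: entries are stored in 8 bits on write and cast back to the compute dtype on read, halving KV-cache memory relative to FP16. This is only an illustration under stated assumptions; the actual kernels live in the PRs above, and the function names, tensor layout, and shapes here are made up for the example.

```python
import torch

# Illustrative sketch only, not the kernels from the linked PRs.
# Requires PyTorch >= 2.1 for the float8 dtypes.
# E5M2 keeps FP16's exponent range, so KV values are commonly stored with a
# plain cast (no per-tensor scale) and only lose mantissa precision.

def kv_store_fp8(kv: torch.Tensor) -> torch.Tensor:
    """Quantize a KV-cache tile to FP8 E5M2 on write (half the bytes of FP16)."""
    return kv.to(torch.float8_e5m2)

def kv_load_fp16(kv_fp8: torch.Tensor) -> torch.Tensor:
    """Dequantize back to FP16 on read, before attention is computed."""
    return kv_fp8.to(torch.float16)

# Hypothetical layout: [num_tokens, num_kv_heads, head_dim]
kv = torch.randn(16, 8, 64, dtype=torch.float16)
kv_fp8 = kv_store_fp8(kv)
err = (kv - kv_load_fp16(kv_fp8)).abs().max()
print(f"max quantization error: {err.item():.4f}")
```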

There is currently no adaptation for DeepSeek V2, as we are focusing on other higher-priority tasks. It is expected to be completed within the next few days.
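
For context on the W8A8 (FP8) part of this request, below is a minimal Python sketch of per-tensor dynamic FP8 (E4M3) quantization applied to a batched matmul, roughly the kind of work an FP8 bmm kernel fuses for the MLA projections. It is a sketch under assumptions, not the DeepSeek V2 adaptation itself; the real path uses fused FP8 kernels such as the flashinfer bmm linked above, and the shapes and names here are hypothetical.

```python
import torch

# Illustrative W8A8 (FP8) sketch; requires PyTorch >= 2.1 for float8 dtypes.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_fp8_e4m3(x: torch.Tensor):
    """Per-tensor dynamic quantization: pick a scale so max |x| maps to FP8_MAX."""
    scale = x.abs().amax().clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

# Hypothetical shapes: a batch of per-head weight matrices and activations.
w = torch.randn(8, 128, 64)   # [num_heads, in_dim, out_dim]
a = torch.randn(8, 16, 128)   # [num_heads, num_tokens, in_dim]

w_fp8, w_scale = quantize_fp8_e4m3(w)
a_fp8, a_scale = quantize_fp8_e4m3(a)

# Simulate the FP8 x FP8 bmm by dequantizing first; a fused kernel performs the
# same math on 8-bit inputs without materializing these float32 tensors.
out = torch.bmm(a_fp8.float() * a_scale, w_fp8.float() * w_scale)

ref = torch.bmm(a, w)
print((out - ref).abs().max().item())  # small W8A8 quantization error
```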

zhyncs commented 1 month ago

done