sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.
https://sglang.readthedocs.io/en/latest/

[Feature] support W8A8(FP8) and KV Cache FP8 for DeepSeek V2 #1156

Closed. zhyncs closed this issue 1 month ago.

zhyncs commented 1 month ago

Motivation

As titled: support W8A8 (FP8) quantization and an FP8 KV cache to make DeepSeek V2 MLA inference faster.

Related resources

No response

fengyang95 commented 1 month ago

Is there a specific timeline for this?

zhyncs commented 1 month ago

> Is there a specific timeline for this?

FP8 bmm has been implemented in https://github.com/flashinfer-ai/flashinfer/pull/469, and the FP8 E5M2 KV cache has been implemented in https://github.com/sgl-project/sglang/pull/1204.
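
For readers following along, here is a minimal Python sketch of the idea behind an FP8 E5M2 KV cache: entries are stored in 8 bits on write and cast back to the compute dtype on read, halving KV-cache memory relative to FP16. This is only an illustration under stated assumptions; the actual kernels live in the PRs above, and the function names, tensor layout, and shapes here are made up for the example.

```python
import torch

# Illustrative sketch only, not the kernels from the linked PRs.
# Requires PyTorch >= 2.1 for the float8 dtypes.
# E5M2 keeps FP16's exponent range, so KV values are commonly stored with a
# plain cast (no per-tensor scale) and only lose mantissa precision.

def kv_store_fp8(kv: torch.Tensor) -> torch.Tensor:
    """Quantize a KV-cache tile to FP8 E5M2 on write (half the bytes of FP16)."""
    return kv.to(torch.float8_e5m2)

def kv_load_fp16(kv_fp8: torch.Tensor) -> torch.Tensor:
    """Dequantize back to FP16 on read, before attention is computed."""
    return kv_fp8.to(torch.float16)

# Hypothetical layout: [num_tokens, num_kv_heads, head_dim]
kv = torch.randn(16, 8, 64, dtype=torch.float16)
kv_fp8 = kv_store_fp8(kv)
err = (kv - kv_load_fp16(kv_fp8)).abs().max()
print(f"max quantization error: {err.item():.4f}")
```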

There is currently no adaptation for DeepSeek V2, as we are focusing on other higher-priority tasks. It is expected to be completed within the next few days.
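
For context on the W8A8 (FP8) part of this request, below is a minimal Python sketch of per-tensor dynamic FP8 (E4M3) quantization applied to a batched matmul, roughly the kind of work an FP8 bmm kernel fuses for the MLA projections. It is a sketch under assumptions, not the DeepSeek V2 adaptation itself; the real path uses fused FP8 kernels such as the flashinfer bmm linked above, and the shapes and names here are hypothetical.

```python
import torch

# Illustrative W8A8 (FP8) sketch; requires PyTorch >= 2.1 for float8 dtypes.
FP8_MAX = torch.finfo(torch.float8_e4m3fn).max  # 448.0 for E4M3

def quantize_fp8_e4m3(x: torch.Tensor):
    """Per-tensor dynamic quantization: pick a scale so max |x| maps to FP8_MAX."""
    scale = x.abs().amax().clamp(min=1e-12) / FP8_MAX
    return (x / scale).to(torch.float8_e4m3fn), scale

# Hypothetical shapes: a batch of per-head weight matrices and activations.
w = torch.randn(8, 128, 64)   # [num_heads, in_dim, out_dim]
a = torch.randn(8, 16, 128)   # [num_heads, num_tokens, in_dim]

w_fp8, w_scale = quantize_fp8_e4m3(w)
a_fp8, a_scale = quantize_fp8_e4m3(a)

# Simulate the FP8 x FP8 bmm by dequantizing first; a fused kernel performs the
# same math on 8-bit inputs without materializing these float32 tensors.
out = torch.bmm(a_fp8.float() * a_scale, w_fp8.float() * w_scale)

ref = torch.bmm(a, w)
print((out - ref).abs().max().item())  # small W8A8 quantization error
```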

zhyncs commented 1 month ago

done