Closed by flozi00 1 month ago
Mistral + eetq tested and working
llama tested too
Benchmark vs main branch:
main: {"input_tokens_per_second": 14643, "output_tokens_per_second": 218}
This PR: {"input_tokens_per_second": 15003, "output_tokens_per_second": 236}
awq tested
Sharding tested
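To put the benchmark figures above in perspective, here is a minimal sketch that computes the relative throughput change. It assumes the first JSON object is the main branch and the second is this PR, which is how the trailing labels read. The helper name `relative_change` is hypothetical and not part of the codebase. The figures appear to come from a single run, so small differences may be within noise.

```python
# Throughput figures copied from the benchmark line in this PR description.
# Assumption: the first result is main, the second is this PR.
main = {"input_tokens_per_second": 14643, "output_tokens_per_second": 218}
this_pr = {"input_tokens_per_second": 15003, "output_tokens_per_second": 236}


def relative_change(baseline, candidate):
    """Return the percent change of each metric from baseline to candidate."""
    return {
        key: round(100.0 * (candidate[key] - baseline[key]) / baseline[key], 2)
        for key in baseline
    }


changes = relative_change(main, this_pr)
for metric, pct in changes.items():
    print(f"{metric}: {pct:+.2f}%")
```

Under that reading, this PR is roughly 2.5% faster on input tokens and roughly 8.3% faster on output tokens than main.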
What does this PR do?
@tgaddair this is just for you to track progress for now; please do not merge at the moment.
This PR also introduces FP8 Linear and the FP8 KV cache from vLLM.
Fixes # (issue)
Before submitting
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag members/contributors who may be interested in your PR.