neuralmagic/nm-vllm

A high-throughput and memory-efficient inference and serving engine for LLMs
https://nm-vllm.readthedocs.io

[Usage]: Do you have any plans to support a sparse FP8 kernel, and to support it on ROCm? #382

Closed: DehuaTang closed this issue 3 months ago

DehuaTang commented 3 months ago

Your current environment

The output of `python collect_env.py`

How would you like to use vllm

Great job! Do you have any plans to support a sparse FP8 kernel, and to support it on ROCm?
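For reference, a minimal sketch of what the requested combination might look like, assuming nm-vllm's existing `sparsity` and `quantization` keyword arguments on `vllm.LLM`. Combining them into a single sparse FP8 path on ROCm is hypothetical, since the thread confirms no such kernel exists; the model name below is a placeholder:

```python
from vllm import LLM, SamplingParams

# Hypothetical invocation: nm-vllm exposes `sparsity=` for its sparse
# kernels and `quantization=` for FP8 weight quantization (CUDA today).
# A combined sparse-FP8 kernel on ROCm, as asked above, does not exist;
# this only illustrates what such an API could look like.
llm = LLM(
    model="your-org/your-pruned-model",  # placeholder pruned checkpoint
    sparsity="sparse_w16a16",            # existing nm-vllm sparse path
    quantization="fp8",                  # existing FP8 quantization path
)

params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello, my name is"], params)
print(outputs[0].outputs[0].text)
```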

mgoin commented 3 months ago

It isn't in the plans at the moment, but I could see us implementing it in the future if there is a use case.