[REQUEST] DeepSpeed-FastGen AWQ support

microsoft / DeepSpeed

DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

https://www.deepspeed.ai/

Apache License 2.0

35.47k stars, 4.12k forks

Open NaCloudAI opened 1 year ago

NaCloudAI commented 1 year ago

Is your feature request related to a problem? Please describe. Without 4-bit quantization, model weights take up more GPU memory, which limits the batch size.
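As a rough illustration of the memory argument, here is a back-of-the-envelope sketch. The 7B parameter count and the 24 GiB GPU are hypothetical numbers chosen for the example, and the estimate ignores the per-group scales and zero-points that 4-bit schemes such as AWQ store alongside the weights, so real savings are somewhat smaller.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for the model weights alone, in GiB."""
    return num_params * bits_per_weight / 8 / 2**30


# Hypothetical values for illustration only.
NUM_PARAMS = 7e9        # assumed 7B-parameter model
GPU_MEMORY_GIB = 24.0   # assumed GPU memory budget

fp16_gib = weight_memory_gib(NUM_PARAMS, 16)
int4_gib = weight_memory_gib(NUM_PARAMS, 4)

# Whatever the weights do not occupy is available for the KV cache
# and activations, which is what bounds the batch size.
fp16_headroom = GPU_MEMORY_GIB - fp16_gib
int4_headroom = GPU_MEMORY_GIB - int4_gib

print(f"FP16 weights:  {fp16_gib:.2f} GiB, headroom {fp16_headroom:.2f} GiB")
print(f"4-bit weights: {int4_gib:.2f} GiB, headroom {int4_headroom:.2f} GiB")
```

Under these assumptions the 4-bit weights need a quarter of the FP16 footprint, and the freed memory can go to the KV cache for larger batches.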

Describe the solution you'd like: Add AWQ support, as TGI has.

Describe alternatives you've considered: Other 4-bit quantization methods, but AWQ has been the best so far.


bgchun commented 1 year ago

@NaCloudAI FriendliAI PeriFlow (friendli.ai/try-periflow) natively supports inference serving for AWQ-quantized models. Here is a blog post: https://friendli.ai/blog/activation-aware-weight-quantization-periflow/

NaCloudAI commented 11 months ago

@bgchun Your website is not working.

vidhyat98 commented 6 months ago

Any updates on this?