Open NaCloudAI opened 1 year ago
@NaCloudAI FriendliAI PeriFlow (friendli.ai/try-periflow) natively supports serving AWQ-quantized models. Here is a blog post: https://friendli.ai/blog/activation-aware-weight-quantization-periflow/
@bgchun your website is not working
Any updates on this?
**Is your feature request related to a problem? Please describe.**
Without 4-bit quantization, the batch size is limited.
**Describe the solution you'd like**
Add AWQ support, just like TGI.
**Describe alternatives you've considered**
Other 4-bit quantization methods, but AWQ is the best so far.
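The batch-size point above can be made concrete with back-of-envelope arithmetic: shrinking the weights from fp16 to 4-bit frees GPU memory that can instead hold KV cache, which is what bounds the batch size. A minimal sketch (illustrative numbers only; real memory use also includes activations, quantization scales/zeros, and framework overhead):

```python
def weight_gib(params_billions: float, bits_per_weight: int) -> float:
    """GiB needed to store the model weights alone at a given precision."""
    return params_billions * 1e9 * bits_per_weight / 8 / 2**30

# Hypothetical 7B-parameter model as an example.
fp16 = weight_gib(7, 16)   # ~13.0 GiB
awq4 = weight_gib(7, 4)    # ~3.3 GiB
freed = fp16 - awq4        # ~9.8 GiB reclaimable for KV cache / larger batches

print(f"fp16 weights:  {fp16:.1f} GiB")
print(f"4-bit weights: {awq4:.1f} GiB")
print(f"freed:         {freed:.1f} GiB")
```

On a 24 GiB card, roughly tripling the memory left over after weights translates directly into more concurrent sequences, which is why 4-bit support matters for serving throughput.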