the-crypt-keeper closed this 2 months ago
Fairly easy to get going and Llama-7b works, but there are no Llama-Chat-* quants, so I can't eval. Wasn't able to get the Mixtral-Instruct quant working. Revisit this in a few weeks.
AQLM is working with the latest vLLM, but performance is hilariously bad: on an A100-40G I see 7 tok/sec with ISTA-DASLab/Meta-Llama-3-70B-Instruct-AQLM-2Bit-1x16
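
For anyone wanting to reproduce a tok/sec number like the one above, here is a rough sketch of how such a figure can be measured. It is not the script used for the 7 tok/sec reading. `measure_tok_per_sec` and `fake_generate` are hypothetical names, and the stand-in generator only sleeps and returns dummy token IDs so the snippet runs without a GPU.

```python
import time
from typing import Callable, Sequence


def measure_tok_per_sec(
    generate: Callable[[Sequence[str]], Sequence[Sequence[int]]],
    prompts: Sequence[str],
) -> float:
    """Time one generate() call and return generated tokens per second.

    `generate` is assumed to return one list of generated token IDs per prompt.
    """
    start = time.perf_counter()
    outputs = generate(prompts)
    elapsed = time.perf_counter() - start
    total_tokens = sum(len(token_ids) for token_ids in outputs)
    return total_tokens / elapsed if elapsed > 0 else float("inf")


def fake_generate(prompts: Sequence[str]) -> list[list[int]]:
    """Hypothetical stand-in for a real model: sleeps, then emits 16 dummy tokens per prompt."""
    time.sleep(0.05)
    return [[0] * 16 for _ in prompts]


if __name__ == "__main__":
    rate = measure_tok_per_sec(fake_generate, ["prompt one", "prompt two"])
    print(f"{rate:.1f} tok/sec")
```

With vLLM, `generate` could plausibly wrap `LLM.generate` and return `outputs[0].token_ids` from each result, but that wiring is an assumption and is not tested here. Note that this measures aggregate throughput over the batch, so a single-prompt run is the closer comparison to a per-request figure.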
https://github.com/Vahe1994/AQLM