turboderp / exllama

A more memory-efficient rewrite of the HF transformers implementation of Llama for use with quantized weights.
MIT License

Support for Baichuan2 models #280

Open bernardx opened 1 year ago

bernardx commented 1 year ago

Baichuan2 is a new model that has better overall results than Llama: https://huggingface.co/baichuan-inc/Baichuan2-13B-Chat It works with AutoGPTQ, but I encountered an error with exllama.

[screenshot of the error attached]

wut0n9 commented 11 months ago

@bernardx Has this been resolved?