Closed birshert closed 2 months ago
Hey @birshert, I can confirm that I also get gibberish with the AWQ implementation. Could you switch to the non-AWQ version while we fix it?
cc @danieldk maybe? :)
@LysandreJik yeah, sure. I've already downloaded gptq-4bit. Thanks for the fast answer! Love your work <3
Thanks for reporting this! We were not correctly adding the bias (in the attention layer) when AWQ is used, #2117 should fix this.
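To illustrate why a dropped bias yields gibberish rather than an error, here is a minimal sketch in plain Python. It is not the TGI code and not the change in #2117; the `linear` helper, the weights, and the inputs are all hypothetical values chosen for illustration. The point is only that a projection without its bias still runs and still returns a vector of the right shape, just with different values, so nothing fails loudly.

```python
# Illustrative only: a toy linear projection, not TGI's AWQ kernel.
# `linear`, the weights, the bias and the input are hypothetical.

def linear(x, weight, bias=None):
    """Compute y = W @ x, plus the bias when one is given."""
    out = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        out = [o + b for o, b in zip(out, bias)]
    return out

x = [1.0, 2.0]
weight = [[0.5, -1.0], [2.0, 0.25]]
bias = [3.0, -4.0]

with_bias = linear(x, weight, bias)  # what the checkpoint expects
without_bias = linear(x, weight)     # bias silently dropped

# Same shape, different values: no exception is raised.
print(with_bias, without_bias)
```

In a real model, an offset like this in the attention projections would feed into every later layer, which is consistent with output that looks fluent in form but carries no meaning.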
System Info
https://github.com/huggingface/text-generation-inference/pull/1584#issuecomment-2185948541
Hello everyone! I tried running Qwen2 72B with the 2.0.4 Docker image, and it fails to write anything meaningful:
Reproduction
I have a PC with two RTX 4090 GPUs.
Expected behavior
I expect Qwen2 to generate coherent text, like a normal LLM.