paolovic opened this issue 8 months ago (status: Open)
I am encountering this issue with AutoGPTQ and Mixtral as well, and I am seeing a similar error with AutoAWQ and Mixtral:
ValueError: OC is not multiple of cta_N = 64
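For context, the error suggests that the quantization kernel tiles the output-channel dimension (OC) in chunks of cta_N = 64, so OC must divide evenly by 64. A quick divisibility check (a hypothetical helper, not part of any library, using example dimensions rather than the actual failing layer) makes the constraint concrete:

```python
def oc_is_aligned(out_channels: int, cta_n: int = 64) -> bool:
    # The kernel processes the output dimension in tiles of cta_n columns,
    # so out_channels must be an exact multiple of cta_n.
    return out_channels % cta_n == 0

print(oc_is_aligned(14336))  # True: 14336 = 224 * 64
print(oc_is_aligned(1000))   # False: 1000 % 64 = 40
```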
I am also facing the same issue. Any progress?
It seems that if you use AutoGPTQ/AutoAWQ directly, you can get something working:

from auto_gptq import AutoGPTQForCausalLM
from awq import AutoAWQForCausalLM

# Pick one loader depending on the checkpoint format:
model = AutoGPTQForCausalLM.from_quantized(model_path, device="cuda:0")  # GPTQ
model = AutoAWQForCausalLM.from_quantized(model_path)  # AWQ
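A fuller sketch of this workaround for the GPTQ case (the model path is a placeholder, and the guard skips the load when auto_gptq or a CUDA-enabled torch is missing, so this is a sketch rather than a verified run):

```python
import importlib.util

def deps_available() -> bool:
    # Skip gracefully when auto_gptq or torch with CUDA is not installed.
    if importlib.util.find_spec("auto_gptq") is None:
        return False
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

if deps_available():
    from auto_gptq import AutoGPTQForCausalLM
    from transformers import AutoTokenizer

    model_path = "path/to/Mixtral-8x7B-Instruct-v0.1-GPTQ"  # placeholder path
    model = AutoGPTQForCausalLM.from_quantized(model_path, device="cuda:0")
    tokenizer = AutoTokenizer.from_pretrained(model_path)
    inputs = tokenizer("Hello", return_tensors="pt").to("cuda:0")
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
else:
    print("skipping load: auto_gptq / CUDA not available")
```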
Thank you @hyaticua, I will give this a try.
Hi, can you provide minimal code to reproduce this issue, and a link to the original issue in AutoGPTQ?
Hi @IlyasMoutawwakil, https://github.com/AutoGPTQ/AutoGPTQ/issues/486 — there is also a code snippet provided there. I am almost certain using AutoGPTQForCausalLM will solve my problem; as soon as I have some time, I will provide a snippet myself.
@IlyasMoutawwakil @hyaticua @paolovic Any updates on this issue? I think it's quite important for us to be able to load GPTQ models successfully using AutoModelForCausalLM.from_pretrained.
System Info
Who can help?
No response
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction (minimal, reproducible, runnable)
Hi, I am trying to deploy Mixtral-8x7B-Instruct-v0.1-GPTQ in 4-bit precision with Ray.
Unfortunately, it keeps failing with the following error message:

ValueError: OC is not multiple of cta_N = 64

The AutoGPTQ maintainers say it is an issue with optimum.
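A minimal reproduction sketch of the failing transformers path (the Hugging Face repo id is an assumption; the guard skips the heavy load when transformers or CUDA-enabled torch is unavailable, so this illustrates the call rather than proving the failure):

```python
import importlib.util

# Assumed repo id for the quantized checkpoint; not confirmed in the issue.
MODEL_ID = "TheBloke/Mixtral-8x7B-Instruct-v0.1-GPTQ"

def can_run() -> bool:
    # Only attempt the load when transformers and torch (with CUDA) are present.
    if importlib.util.find_spec("transformers") is None:
        return False
    if importlib.util.find_spec("torch") is None:
        return False
    import torch
    return torch.cuda.is_available()

if can_run():
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
    # This is the call that reportedly raises
    # "ValueError: OC is not multiple of cta_N = 64" on some setups.
    model = AutoModelForCausalLM.from_pretrained(MODEL_ID, device_map="auto")
    inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
    print(tokenizer.decode(model.generate(**inputs, max_new_tokens=8)[0]))
else:
    print("skipping load: transformers / CUDA not available")
```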
Thank you in advance
Expected behavior
The model is deployed without errors.