Open Grey4sh opened 3 months ago
I'm having a similar issue where TGI doesn't seem to use the custom model mappings from config.json even when one is present, and falls back to AutoModel.
Hi @Cucunnber 👋
Thanks for reporting this. I think we don't have the bandwidth to jump on this directly, but I'll tag @danieldk since he's the Marlin & GPTQ expert.
System Info
A100-80GB * 4
Information
Tasks
Reproduction
Expected behavior
Describe the bug
I get the error `Cannot load gptq weight for GPTQ -> Marlin repacking, make sure the model is already quantized` when I run inference on the GPTQ-quantized DeepSeekCoderV2 model with text-generation-inference 2.2.0.
config.json
quantize_config.json
error log
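For reference, the server was launched in the usual Docker way; a sketch of such a launch is below (the mounted volume, local model path, and port are placeholders, not my exact command):

```shell
# Sketch of a typical TGI 2.2.0 launch for a GPTQ checkpoint.
# The mounted path and model directory are placeholders.
docker run --gpus all --shm-size 1g -p 8080:80 \
    -v /data/models:/data \
    ghcr.io/huggingface/text-generation-inference:2.2.0 \
    --model-id /data/DeepSeek-Coder-V2-Instruct-GPTQ \
    --quantize gptq \
    --num-shard 4
```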
Additional context
I don't have any inference problems when loading the same checkpoint with GPTQModel.from_quantized().
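For comparison, the direct load path works; a minimal sketch, assuming the GPTQModel library and a local checkpoint path (the path and prompt are placeholders):

```python
# Sketch of the working load path via the GPTQModel library.
# The checkpoint path and prompt are placeholders.
from transformers import AutoTokenizer
from gptqmodel import GPTQModel

model_path = "/data/DeepSeek-Coder-V2-Instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
model = GPTQModel.from_quantized(model_path, device="cuda:0", trust_remote_code=True)

# A short generation is enough to confirm the quantized weights load and run.
inputs = tokenizer("def quicksort(arr):", return_tensors="pt").to("cuda:0")
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=64)[0]))
```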