qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0

Help: Quantized llama-7b model with custom prompt format produces only gibberish #276


Glavin001 commented 1 year ago

Could someone help me with how to quantize my own model with GPTQ-for-LLaMa? See the screenshots of the output I am getting :cry:

Original full model: https://huggingface.co/Glavin001/startup-interviews-13b-int4-2epochs-1
Working quantized model with AutoGPTQ (screenshots): https://huggingface.co/Glavin001/startup-interviews-13b-2epochs-4bit-2
Dataset: https://huggingface.co/datasets/Glavin001/startup-interviews
Command I used in my attempt to quantize (with https://github.com/qwopqwop200/GPTQ-for-LLaMa):

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors

Quantized model (screenshots): Glavin001/startup-interviews-llama7b-v0.1-GPTQ (https://huggingface.co/Glavin001/startup-interviews-llama7b-v0.1-GPTQ/tree/main)
Tested with / you can reproduce using: TheBloke's RunPod template: https://github.com/TheBlokeAI/dockerLLM/
Model loader: both AutoGPTQ & ExLlama produce gibberish/garbage output.
Example prompt:

<|prompt|>What is a MVP?</s><|answer|>
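For reference, I think the repo's own llama_inference.py could be a quick sanity check outside the webui; assuming it still works the way the README shows (so treat the exact flags here as a guess), something like:

CUDA_VISIBLE_DEVICES=0 python3 llama_inference.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ --wbits 4 --groupsize 128 --load startup-interviews-llama7b-4bit-128g.safetensors --text "<|prompt|>What is a MVP?</s><|answer|>"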

Possible problems: I'm still learning about quantization. I notice there is a dataset argument, set to the c4 dataset, but the dataset and prompt style for this model are different. I'm not sure how to customize this, though; maybe I need a custom Python script instead of using the llama.py CLI?
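If I understand llama.py correctly, it pulls its calibration data from get_loaders in datautils.py, so one option might be a small custom loader along these lines. This is only a sketch: get_custom_loader and the dataset field names are my guesses, and the (inp, tar) sample format is assumed to match what datautils.py produces.

```python
# Hypothetical sketch (not tested): build calibration samples from the
# startup-interviews dataset instead of c4, using the same prompt format
# as inference. The (inp, tar) tuple layout is assumed to match what
# datautils.get_loaders returns -- double-check against datautils.py.
import random
from datasets import load_dataset
from transformers import AutoTokenizer

def get_custom_loader(model_path, nsamples=128, seqlen=2048, seed=0):
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    data = load_dataset("Glavin001/startup-interviews", split="train")
    random.seed(seed)

    samples = []
    for _ in range(nsamples):
        row = data[random.randrange(len(data))]
        # Field names are a guess; adjust to the real dataset schema and keep
        # the text identical to the prompt format used at inference time.
        text = f"<|prompt|>{row['question']}</s><|answer|>{row['answer']}</s>"
        inp = tokenizer(text, return_tensors="pt").input_ids[:, :seqlen]
        tar = inp.clone()
        tar[:, :-1] = -100  # mask all but the last token, as datautils does
        samples.append((inp, tar))
    return samples
```

I could then wire that in by replacing the get_loaders call in llama.py (or registering a new dataset name in datautils.py) so the calibration text matches the <|prompt|>...<|answer|> format.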

It took an hour or so to generate this, so I'd like to get it right next time :joy:

Any advice would be greatly appreciated! Thanks in advance!

Broken: GPTQ-for-LLaMa
Working: AutoGPTQ
CheshireAI commented 1 year ago

Try without --act-order.
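For example (untested), the same command from above with that flag dropped:

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors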