qwopqwop200 / GPTQ-for-LLaMa

4 bits quantization of LLaMA using GPTQ
Apache License 2.0

Help: Quantized llama-7b model with custom prompt format produces only gibberish #276


Glavin001 commented 1 year ago

Could someone help me with how to quantize my own model with GPTQ-for-LLaMa? See the screenshots of the output I am getting :cry:

Original full model: https://huggingface.co/Glavin001/startup-interviews-13b-int4-2epochs-1
Working quantized model with AutoGPTQ (screenshots): https://huggingface.co/Glavin001/startup-interviews-13b-2epochs-4bit-2
Dataset: https://huggingface.co/datasets/Glavin001/startup-interviews
Command I used in my attempt to quantize (with https://github.com/qwopqwop200/GPTQ-for-LLaMa):

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --act-order --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors

Quantized model (screenshots): Glavin001/startup-interviews-llama7b-v0.1-GPTQ (https://huggingface.co/Glavin001/startup-interviews-llama7b-v0.1-GPTQ/tree/main)
Tested with / you can reproduce using: TheBloke's RunPod template: https://github.com/TheBlokeAI/dockerLLM/
Model loader: both AutoGPTQ & ExLlama produce gibberish/garbage output.
Example prompt:

<|prompt|>What is a MVP?</s><|answer|>
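For reference, I think the repo's own llama_inference.py could be a quick sanity check outside the webui; assuming it still works the way the README shows (so treat the exact flags here as a guess), something like:

CUDA_VISIBLE_DEVICES=0 python3 llama_inference.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ --wbits 4 --groupsize 128 --load startup-interviews-llama7b-4bit-128g.safetensors --text "<|prompt|>What is a MVP?</s><|answer|>"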

Possible problems: I'm still learning about quantization. I notice there is a dataset argument, set to the c4 dataset, but the dataset and prompt style for this model are different. I'm not sure how to customize this, though; maybe I need a custom Python script instead of using the llama.py CLI?
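If I understand llama.py correctly, it pulls its calibration data from get_loaders in datautils.py, so one option might be a small custom loader along these lines. This is only a sketch: get_custom_loader and the dataset field names are my guesses, and the (inp, tar) sample format is assumed to match what datautils.py produces.

```python
# Hypothetical sketch (not tested): build calibration samples from the
# startup-interviews dataset instead of c4, using the same prompt format
# as inference. The (inp, tar) tuple layout is assumed to match what
# datautils.get_loaders returns -- double-check against datautils.py.
import random
from datasets import load_dataset
from transformers import AutoTokenizer

def get_custom_loader(model_path, nsamples=128, seqlen=2048, seed=0):
    tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
    data = load_dataset("Glavin001/startup-interviews", split="train")
    random.seed(seed)

    samples = []
    for _ in range(nsamples):
        row = data[random.randrange(len(data))]
        # Field names are a guess; adjust to the real dataset schema and keep
        # the text identical to the prompt format used at inference time.
        text = f"<|prompt|>{row['question']}</s><|answer|>{row['answer']}</s>"
        inp = tokenizer(text, return_tensors="pt").input_ids[:, :seqlen]
        tar = inp.clone()
        tar[:, :-1] = -100  # mask all but the last token, as datautils does
        samples.append((inp, tar))
    return samples
```

I could then wire that in by replacing the get_loaders call in llama.py (or registering a new dataset name in datautils.py) so the calibration text matches the <|prompt|>...<|answer|> format.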

It took an hour or so to generate this, so I'd like to get it right next time :joy:

Any advice would be greatly appreciated! Thanks in advance!

Broken: GPTQ-for-LLaMa
Working: AutoGPTQ
CheshireAI commented 1 year ago

Try without --act-order.
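For example (untested), the same command from above with that flag dropped:

CUDA_VISIBLE_DEVICES=0 python3 llama.py /workspace/text-generation-webui/models/Glavin001_startup-interviews-13b-int4-2epochs-1/ c4 --wbits 4 --true-sequential --groupsize 128 --save_safetensors startup-interviews-llama7b-4bit-128g.safetensors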