I followed the steps to build a 4-bit version of llama-7b with the command

python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt

The script runs without errors, but at the evaluation stage it reports a very large value (251086.96875), and when I test with the quantized .pt file the model returns unreadable output.
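For reference, this is roughly how I inspected the saved checkpoint (a minimal sketch; the key names qweight/qzeros/scales follow the usual GPTQ packing convention and may not match pyllama's actual output, so adjust to whatever torch.load() reports):

```python
import torch

# Minimal sketch: inspect the saved 4-bit checkpoint.
# The suffixes below (qweight / qzeros / scales) are an assumption based on
# the common GPTQ layout; the real key names may differ in pyllama.
state_dict = torch.load("pyllama-7B4b.pt", map_location="cpu")

for name, tensor in state_dict.items():
    # Quantized layers should expose packed integer weights plus fp scales.
    if name.endswith(("qweight", "qzeros", "scales")):
        print(name, tuple(tensor.shape), tensor.dtype)
        # Non-finite scales would explain garbage generations.
        if tensor.is_floating_point() and not torch.isfinite(tensor).all():
            print("  -> non-finite values found in", name)
```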
Has anyone run into the same problem?