**Open** · htcml opened this issue 1 year ago
Replace `decapoda-research/llama-7b-hf` with the path to an HF model. Maybe you need to convert it first.
Use the following to convert:

```
python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH --model_size 7B --output_dir ./converted_meta --to hf --max_batch_size 4
```
Note that it creates a weird directory structure; I had issues locating the tokenizer during quantization. So after conversion I renamed `llama-7b` to `7B` and ran:
```
cp -rf ./converted_meta/tokenizer/* ./converted_meta/7B/
```
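The rename-and-copy fix-up above can be sketched end to end against a mock layout (the `llama-7b` folder name comes from the description above; the placeholder file names inside it are assumptions, not the converter's actual output):

```shell
# Mock of the converter's output layout described above:
# weights under converted_meta/llama-7b, tokenizer under converted_meta/tokenizer.
mkdir -p converted_meta/llama-7b converted_meta/tokenizer
touch converted_meta/llama-7b/pytorch_model.bin   # placeholder weight shard
touch converted_meta/tokenizer/tokenizer.model    # placeholder tokenizer file

# Rename llama-7b to 7B and merge the tokenizer files in,
# so the quantization step finds everything in one directory.
mv converted_meta/llama-7b converted_meta/7B
cp -rf converted_meta/tokenizer/* converted_meta/7B/
```

After this, `./converted_meta/7B` holds both the weights and the tokenizer, which is the path passed to `llama.llama_quant` below.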
Then run:

```
python3 -m llama.llama_quant ./converted_meta/7B c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
```
Using the recommended convert command with the downloaded checkpoints fails with `TypeError: 'NoneType' object is not subscriptable`. Any idea?
```
(venv) oferk@ironman:~/git/pyllama$ python3 -m llama.convert_llama --ckpt_dir pyllama_data/ --tokenizer_path pyllama_data/tokenizer.model --model_size 7B --output_dir ./converted_meta --to hf --max_batch_size 4
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/oferk/git/pyllama/venv/lib/python3.10/site-packages/llama/convert_llama.py", line 377, in
```
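No idea offhand what ends up `None` there, but before digging into `convert_llama` it may be worth ruling out a path mix-up (e.g. `--ckpt_dir` pointing at the download root rather than the size-specific folder). A hypothetical sanity-check helper (not part of pyllama), assuming the usual Meta checkpoint layout with `tokenizer.model` at the root and `consolidated.*.pth` plus `params.json` under `7B/`:

```python
# Hypothetical helper: verify a downloaded Meta-LLaMA checkpoint tree
# before handing it to convert_llama. Layout assumed, not guaranteed.
from pathlib import Path

def check_ckpt(root: str, size: str = "7B") -> list[str]:
    """Return a list of problems found; empty list means the layout looks OK."""
    base = Path(root)
    problems = []
    if not (base / "tokenizer.model").is_file():
        problems.append("missing tokenizer.model at checkpoint root")
    sub = base / size
    if not list(sub.glob("consolidated.*.pth")):
        problems.append(f"no consolidated.*.pth under {sub}")
    if not (sub / "params.json").is_file():
        problems.append(f"missing {sub / 'params.json'}")
    return problems

print(check_ckpt("pyllama_data"))  # e.g. [] if the layout matches
```

If this reports problems, adjust `--ckpt_dir` (or the folder names) until it comes back clean, then retry the conversion.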
downloaded folder structure
A bit confused here. In README.md, users are asked to download the LLaMA model files first, yet the quantization examples use `decapoda-research/llama-7b-hf`. How do I quantize the downloaded LLaMA model files (for example, `consolidated.00.pth` for 7B)?
```
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
```