**Open** · htcml opened this issue 1 year ago
Replace `decapoda-research/llama-7b-hf` with the path to an HF model. Maybe you need to convert it first.
Use the following to convert:

```
python3 -m llama.convert_llama --ckpt_dir $CKPT_DIR --tokenizer_path $TOKENIZER_PATH --model_size 7B --output_dir ./converted_meta --to hf --max_batch_size 4
```
Note that it creates a weird directory structure; I had issues locating the tokenizer during quantization. So after conversion I renamed `llama-7b` to `7B` and ran:
```
cp -rf ./converted_meta/tokenizer/* ./converted_meta/7B/
```
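The rename-and-copy fix-up above can be sketched end to end against a mock layout (the `llama-7b` folder name comes from the description above; the placeholder file names inside it are assumptions, not the converter's actual output):

```shell
# Mock of the converter's output layout described above:
# weights under converted_meta/llama-7b, tokenizer under converted_meta/tokenizer.
mkdir -p converted_meta/llama-7b converted_meta/tokenizer
touch converted_meta/llama-7b/pytorch_model.bin   # placeholder weight shard
touch converted_meta/tokenizer/tokenizer.model    # placeholder tokenizer file

# Rename llama-7b to 7B and merge the tokenizer files in,
# so the quantization step finds everything in one directory.
mv converted_meta/llama-7b converted_meta/7B
cp -rf converted_meta/tokenizer/* converted_meta/7B/
```

After this, `./converted_meta/7B` holds both the weights and the tokenizer, which is the path passed to `llama.llama_quant` below.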
Then run:

```
python3 -m llama.llama_quant ./converted_meta/7B c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
```
Using the recommended convert command with the downloaded checkpoints fails with `TypeError: 'NoneType' object is not subscriptable`. Any idea?
```
(venv) oferk@ironman:~/git/pyllama$ python3 -m llama.convert_llama --ckpt_dir pyllama_data/ --tokenizer_path pyllama_data/tokenizer.model --model_size 7B --output_dir ./converted_meta --to hf --max_batch_size 4
Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/home/oferk/git/pyllama/venv/lib/python3.10/site-packages/llama/convert_llama.py", line 377, in
```
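No idea offhand what ends up `None` there, but before digging into `convert_llama` it may be worth ruling out a path mix-up (e.g. `--ckpt_dir` pointing at the download root rather than the size-specific folder). A hypothetical sanity-check helper (not part of pyllama), assuming the usual Meta checkpoint layout with `tokenizer.model` at the root and `consolidated.*.pth` plus `params.json` under `7B/`:

```python
# Hypothetical helper: verify a downloaded Meta-LLaMA checkpoint tree
# before handing it to convert_llama. Layout assumed, not guaranteed.
from pathlib import Path

def check_ckpt(root: str, size: str = "7B") -> list[str]:
    """Return a list of problems found; empty list means the layout looks OK."""
    base = Path(root)
    problems = []
    if not (base / "tokenizer.model").is_file():
        problems.append("missing tokenizer.model at checkpoint root")
    sub = base / size
    if not list(sub.glob("consolidated.*.pth")):
        problems.append(f"no consolidated.*.pth under {sub}")
    if not (sub / "params.json").is_file():
        problems.append(f"missing {sub / 'params.json'}")
    return problems

print(check_ckpt("pyllama_data"))  # e.g. [] if the layout matches
```

If this reports problems, adjust `--ckpt_dir` (or the folder names) until it comes back clean, then retry the conversion.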
downloaded folder structure
A bit confused here. In README.md, users are asked to download the LLaMA model files first, yet the quantization examples use `decapoda-research/llama-7b-hf`. How do I quantize the downloaded LLaMA model files (for example, `consolidated.00.pth` for 7B)?
```
python -m llama.llama_quant decapoda-research/llama-7b-hf c4 --wbits 4 --groupsize 128 --save pyllama-7B4b.pt
```