[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
[START] - Start Pruning Model
Traceback (most recent call last):
File "/home/iotsc01/xinpengq/LLM-Pruner-main/hf_prune.py", line 314, in
main(args)
File "/home/iotsc01/xinpengq/LLM-Pruner-main/hf_prune.py", line 39, in main
tokenizer = LlamaTokenizer.from_pretrained(args.base_model)
File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/transformers/tokenization_utils_base.py", line 2012, in from_pretrained
raise EnvironmentError(
OSError: Can't load tokenizer for 'baffo32/decapoda-research-llama-7B-hf'. If you were trying to load it from 'https://huggingface.co/models', make sure you don't have a local directory with the same name. Otherwise, make sure 'baffo32/decapoda-research-llama-7B-hf' is the correct path to a directory containing all relevant files for a LlamaTokenizer tokenizer.
[FINISH] - Finish Pruning Model
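Note that the [FINISH] banner above is misleading: the shell script prints it unconditionally, while hf_prune.py actually aborted before saving anything because the tokenizer for 'baffo32/decapoda-research-llama-7B-hf' could not be resolved. A quick way to isolate the cause is to fetch the tokenizer files directly, outside hf_prune.py. This is a diagnostic sketch of my own (not part of LLM-Pruner) and assumes network access to the Hugging Face Hub:

```python
# Diagnostic sketch (not part of LLM-Pruner): check that the tokenizer files
# are actually reachable before re-running hf_prune.py. Assumes Hub access.
from huggingface_hub import hf_hub_download
from transformers import LlamaTokenizer

repo = "baffo32/decapoda-research-llama-7B-hf"

# Fetch the two files LlamaTokenizer needs; a failure here points at
# connectivity or at the repo itself rather than at hf_prune.py.
for fname in ("tokenizer.model", "tokenizer_config.json"):
    print(fname, "->", hf_hub_download(repo_id=repo, filename=fname))

# If the downloads succeed, the tokenizer itself should load as well.
tokenizer = LlamaTokenizer.from_pretrained(repo)
print(type(tokenizer).__name__, tokenizer.vocab_size)
```

If the downloads fail, pre-download the model or point --base_model at a local directory that already contains the tokenizer files; the OSError above is raised whenever from_pretrained cannot resolve them.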
[START] - Start Tuning
Traceback (most recent call last):
File "/home/iotsc01/xinpengq/LLM-Pruner-main/post_training.py", line 262, in
main(args)
File "/home/iotsc01/xinpengq/LLM-Pruner-main/post_training.py", line 33, in main
pruned_dict = torch.load(args.prune_model, map_location='cpu')
File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/torch/serialization.py", line 986, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/torch/serialization.py", line 435, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/iotsc01/anaconda3/envs/xinpengq_env/lib/python3.10/site-packages/torch/serialization.py", line 416, in init
super().init(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'prune_log/llama_prune/pytorch_model.bin'
[FINISH] - Finish Prune and Post-Training.
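The second traceback is a downstream effect of the first: hf_prune.py exited on the tokenizer error before ever writing prune_log/llama_prune/pytorch_model.bin, so post_training.py has nothing to torch.load. A guard like the following sketch (my addition, not repo code, placed around the load at post_training.py line 33) would make the tuning stage fail with a clearer message instead of a bare FileNotFoundError:

```python
# Sketch of a guard around the load in post_training.py (my addition, not
# repo code): bail out early if the pruning stage never saved a checkpoint.
import os
import sys

import torch

prune_model = "prune_log/llama_prune/pytorch_model.bin"  # args.prune_model
if not os.path.isfile(prune_model):
    sys.exit(f"[ERROR] pruned checkpoint missing at '{prune_model}' -- "
             "the pruning stage probably failed; check its log first.")
pruned_dict = torch.load(prune_model, map_location="cpu")
```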
[INFO] - The pruned model is at {prune_log/llama_prune/pytorch_model.bin}, and the recovery weight is at {tune_log/llama_0.2}/
You can use the command:
python generate.py --model_type tune_prune_LLM --ckpt prune_log/llama_prune/pytorch_model.bin --lora_ckpt tune_log/llama_0.2
to use the pruned model
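The closing [INFO] message is likewise printed whether or not the stages succeeded. Once both stages actually complete, the generate.py command above is the intended entry point; for a manual sanity check, the checkpoint can also be loaded directly. The 'model' and 'tokenizer' keys below are an assumption about how hf_prune.py packages the checkpoint (inferred from the torch.load in post_training.py); verify against the repo's save call before relying on them:

```python
# Manual sanity check once pruning has succeeded. Assumption: hf_prune.py
# saves a dict holding the model and tokenizer objects (the 'model' and
# 'tokenizer' keys are inferred, not confirmed -- check the repo's save call).
import torch

pruned_dict = torch.load("prune_log/llama_prune/pytorch_model.bin",
                         map_location="cpu")
model, tokenizer = pruned_dict["model"], pruned_dict["tokenizer"]
model.eval()
print(sum(p.numel() for p in model.parameters()), "parameters after pruning")
```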
For reference, the command that produced the log above: bash scripts/llama_prune.sh