horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
Apache License 2.0

My process has some problems #6

Closed 18140663659 closed 1 year ago

18140663659 commented 1 year ago
  1. Downloaded the Vicuna model from this link: vicuna model
  2. Because of a network problem, I downloaded the book corpus.tar.bz2 manually and uncompressed it:

and changed the get_bookcorpus API to read the local copy:
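The modified `get_bookcorpus` code did not survive in the post. The log below reports `Found cached dataset text`, which suggests the Hugging Face `text` loader was pointed at the local file. What follows is only a dependency-free sketch of the same idea, not the poster's code: `sample_local_corpus` and all of its parameters are hypothetical, and whitespace splitting is a stand-in for the real LLaMA tokenizer.

```python
import random
from pathlib import Path
from typing import Callable, List


def sample_local_corpus(
    path: str,
    n_samples: int,
    seq_len: int,
    tokenize: Callable[[str], List[str]] = str.split,  # stand-in, not an LLM tokenizer
    seed: int = 0,
) -> List[List[str]]:
    """Draw n_samples windows of seq_len tokens from a local text file."""
    text = Path(path).read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError(f"no non-empty lines found in {path}")

    rng = random.Random(seed)  # fixed seed so calibration samples are reproducible
    samples: List[List[str]] = []
    attempts = 0
    while len(samples) < n_samples:
        attempts += 1
        if attempts > 1000 * n_samples:
            raise ValueError("corpus lines are too short for the requested seq_len")
        tokens = tokenize(rng.choice(lines))
        if len(tokens) < seq_len:
            continue  # skip lines that cannot fill a whole window
        start = rng.randint(0, len(tokens) - seq_len)
        samples.append(tokens[start:start + seq_len])
    return samples
```

Reading from a local path avoids any network access during pruning, which is the constraint described in step 2.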

  3. Ran the pruning process and got this log:

python --pruning_ratio 0.1 \
    --block_wise \
    --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
    --block_attention_layer_start 4 --block_attention_layer_end 30 \
    --pruner_type Taylor \
    --test_after_train \
    --device cpu --eval_device cuda \
    --save_ckpt_log_name llama_prune \
    --base_model vicuna_model \
    --save_model

Loading checkpoint shards: 100%|██████████| 2/2 [00:12<00:00, 6.30s/it]
2023-06-27 14:47:57 - INFO : Use taylor pruner...
2023-06-27 14:47:57 - INFO : Pruning Attention Layer = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
2023-06-27 14:47:57 - INFO : Pruning MLP Layer = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
2023-06-27 14:47:58 - INFO : Start Pruning
2023-06-27 14:47:59 - WARNING : Found cached dataset text (/root/.cache/huggingface/datasets/text/default-a5b37c6d93890eb9/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)
2023-06-27 14:48:00 - INFO : Start Backwarding in iterative steps = 0...
2023-06-27 14:48:08 - INFO : Loss = 3.7414731979370117
2023-06-27 14:48:34 - INFO : After Iter 1/1, #parameters: 6223081472
2023-06-27 14:48:34 - INFO : #Param before: 6738415616, #Param after: 6223081472, Ratio = 92.3523%
Saving model prune_log/llama_prune/pytorch_model.bin
2023-06-27 14:51:02 - INFO :
==================Generation Results After Pruning================

2023-06-27 14:51:04 - INFO : I believe the meaning of life is to continue seeking for self-fulfillment while also contributing to the betterment of the world and society. It's not to be selfish or materialistic, but to find balance and happiness through meaningful work, relationships, and experiences.

2023-06-27 14:51:08 - INFO : Simply put, the theory of relativity states that 1. Gravity is not a force in the sense that we usually understand it, but rather the result of the curvature of spacetime, caused by the presence of massive objects. This curvature creates a field that can bend the path of light. 2. The speed of light is always constant in a vacuum, regardless of the motion of the observer or the source. As a result, the amount of time it takes for light to travel a certain distance is dependent on the observer’s relative motion. 3. Time and space are intertw

2023-06-27 14:51:13 - INFO : Building a website can be done in 10 simple steps:

  1. Choose a domain name that is unique and easy to remember.
  2. Choose a hosting service that will support your website.
  3. Choose a web host and register your domain.
  4. Set up your website using a web hosting service.
  5. Customize the look and feel of your website.
  6. Add content to your website.
  7. Test your website to ensure that everything is working properly.
  8. Launch your website to the public.
  9. Continuously improve your website and

2023-06-27 14:51:14 - INFO : Tweet: "I hate it when my phone battery dies." Sentiment: Negative

    Tweet: "My day has been 👍" Sentiment: Positive

    Tweet: "This is the link to the article" Sentiment: Neutral

    Tweet: "This new music video was incredibile" Sentiment: Positive

    Tweet: "I just spent the whole day cleaning my room" Sentiment: Negative

    Note: The sentiment refers

2023-06-27 14:51:18 - INFO : Translate English to French:

sea otter => loutre de mer

peppermint => menthe poivrée

plush girafe => girafe peluche

cheese => fromage

English to French:

honey badger => renard des merveilles

spaghetti squirrel => écureuil à la spaghettini

poutine rainbow → rainbow de poutine

French translation of “the cat sat on the mat”: Le chat est assis sur le tapis.

2023-06-27 14:51:18 - INFO :

2023-06-27 14:51:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
2023-06-27 14:51:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Token indices sequence length is longer than the specified maximum sequence length for this model (341469 > 2048). Running this sequence through the model will result in indexing errors
100%|██████████| 667/667 [00:58<00:00, 11.48it/s]
{'wikitext2': 19.091033031037714}
2023-06-27 14:52:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
2023-06-27 14:52:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
100%|██████████| 667/667 [00:58<00:00, 11.49it/s]
{'wikitext2': 19.091033031037714, 'ptb': 19.091033031037714}
2023-06-27 14:53:18 - INFO : PPL after pruning: {'wikitext2': 19.091033031037714, 'ptb': 19.091033031037714}
2023-06-27 14:53:18 - INFO : Memory Requirement: 12052.52490234375 MiB
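Two numbers in this log can be checked by hand. The sketch below recomputes the reported parameter ratio from the two counts and shows the usual definition of perplexity as the exponential of the mean per-token negative log-likelihood. Both helper names are hypothetical, and the perplexity function illustrates the standard formula only; it is not taken from the repository's evaluation code.

```python
import math
from typing import Sequence


def remaining_param_ratio(params_before: int, params_after: int) -> float:
    """Percentage of parameters that survive pruning."""
    return 100.0 * params_after / params_before


def perplexity(token_nlls: Sequence[float]) -> float:
    """Perplexity = exp(mean negative log-likelihood per token)."""
    if not token_nlls:
        raise ValueError("need at least one token")
    return math.exp(sum(token_nlls) / len(token_nlls))


# Counts copied from the "#Param before / #Param after" log line.
ratio = remaining_param_ratio(6_738_415_616, 6_223_081_472)
print(f"Ratio = {ratio:.4f}%")  # Ratio = 92.3523%
```

The model keeps about 92.35% of its parameters rather than 90% even though `--pruning_ratio 0.1` was passed. A plausible reading is that the ratio applies only to the layers listed in the log (4 to 29), while the remaining layers and the embeddings are left untouched. Separately, the `wikitext2` and `ptb` perplexities are identical to the last digit, and the log lines before the second evaluation mention the wikitext cache, so it may be worth checking which data the `ptb` evaluation actually loaded.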

  4. Ran inference with the pruned model and got this result:

    command: python --model_type pruneLLM --ckpt pruner_model/pytorch_model_10%.bin

    inference code:

    def main(args):
        if args.model_type == 'pretrain':
            tokenizer = LlamaTokenizer.from_pretrained(args.base_model, cache_dir='llama_hf_model', force_download=True)

            model = LlamaForCausalLM.from_pretrained(

                low_cpu_mem_usage=True if torch_version >= 9 else False, cache_dir='llama_hf_model', force_download=True

            tokenizer = LlamaTokenizer.from_pretrained(args.base_model)
            model = LlamaForCausalLM.from_pretrained(
                low_cpu_mem_usage=True if torch_version >= 9 else False, cache_dir='llama_hf_model'
            description = "Model Type: {}\n Base Model: {}".format(args.model_type, args.base_model)

        elif args.model_type == 'pruneLLM':
            pruned_dict = torch.load(args.ckpt, map_location='cpu')
            tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
            description = "Model Type: {}\n Pruned Model: {}".format(args.model_type, args.ckpt)
        elif args.model_type == 'tune_prune_LLM':
            pruned_dict = torch.load(args.ckpt, map_location='cpu')
            tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
            model = PeftModel.from_pretrained(
                model,
                args.lora_ckpt,
                torch_dtype=torch.float16,
            )
            description = "Model Type: {}\n Pruned Model: {}\n LORA ckpt: {}".format(args.model_type, args.ckpt, args.lora_ckpt)
        else:
            raise NotImplementedError

        if device == "cuda":
            model.half()
            model = model.cuda()

        # unwind broken decapoda-research config
        model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
        model.config.bos_token_id = 1
        model.config.eos_token_id = 2


        print("Human:")
        line = input()
        while line:
            inputs = tokenizer(line, return_tensors="pt")
            input_ids = inputs["input_ids"].to(device)

            with torch.no_grad():
                generation_output = model.generate(


            s = generation_output.sequences[0]
            output = tokenizer.decode(s)
            print("Human:")  # Print the "Human" prompt before each user input in the terminal.
            line = input()

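    result:

H: 写一首春天的诗句 (English: "Write a line of poetry about spring")
写一首春天的诗句: 烟雾拢眉头风呈起, 吃掉大地茁��cies, 伞拧开远眼, 春日裡嗟衟啦。

虽然这句诗语优雅, 语言优美且清新, 但哦,听起来好聪明。
(English: "Although this line is elegant, and its language beautiful and fresh, oh, it sounds so clever.")

谁是说这句话? 又有什么寓意?
(English: "Who said this line? And what does it mean?")

H: 心情不好的时候应该如何调整 (English: "How should I adjust when I am in a bad mood")
心情不好的时候应该如何调整自己的思维?
(English: "How should I adjust my own thinking when I am in a bad mood?")

H: 怎样学习机器学习 (English: "How to learn machine learning")
怎样学习机器学习,让生慢慢领略智慧
(English, roughly: "How to learn machine learning, and let life slowly come to appreciate wisdom")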
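The prompt loop in the pasted inference code reads a line, generates, decodes, and then prompts again until an empty line is entered. A minimal sketch of that control flow, with the model call factored out so the loop can be exercised without loading a checkpoint, might look like this. `chat_loop`, `generate`, and `emit` are hypothetical names; `generate` stands in for the tokenizer, `model.generate`, and `tokenizer.decode` steps of the original.

```python
from typing import Callable, Iterable, List


def chat_loop(
    lines: Iterable[str],
    generate: Callable[[str], str],
    emit: Callable[[str], None] = print,
) -> List[str]:
    """Run one generation per non-empty input line; stop at the first empty line."""
    outputs: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            break  # same stop condition as `while line:` in the post
        reply = generate(line)
        emit(reply)  # show the decoded text for this turn
        outputs.append(reply)
    return outputs
```

For interactive use, `chat_loop(iter(lambda: input("Human: "), ""), generate)` would reproduce the terminal prompt. Keeping I/O separate from generation also makes it easy to replay the three prompts above against both the original and the pruned model and compare the outputs side by side.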






horseee commented 1 year ago

Hi. Can you explain more about what the problem is?