horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Supports Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
https://arxiv.org/abs/2305.11627
Apache License 2.0

My process has some problems #6

Closed 18140663659 closed 1 year ago

18140663659 commented 1 year ago
  1. Downloaded the Vicuna model from this link: vicuna model
  2. Because of a network problem, I downloaded bookcorpus.tar.bz2 and uncompressed it:
[screenshot: the extracted bookcorpus files]

and changed the get_bookcorpus function (roughly as sketched below):

[screenshot: the modified get_bookcorpus code]
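For reference, a change along these lines can load the locally extracted corpus through the datasets library's generic 'text' builder instead of downloading bookcorpus from the Hub. This is a minimal sketch of such a modification, assuming the archive unpacks to plain-text files; the file path and the sampling loop are illustrative, not the repository's exact code (the "Found cached dataset text" line in the log below suggests this is roughly what was done):

    import random
    import torch
    from datasets import load_dataset

    def get_bookcorpus(tokenizer, n_samples, seq_len):
        # Load the locally extracted corpus with the generic 'text' builder
        # instead of load_dataset('bookcorpus'). The file name below is an
        # assumption about what the .tar.bz2 archive contains.
        traindata = load_dataset(
            'text', data_files='bookcorpus/books_large_p1.txt', split='train'
        )
        samples, history = [], []
        for _ in range(n_samples):
            # Draw random lines until one is long enough to cut a seq_len window.
            while True:
                i = random.randint(0, len(traindata) - 1)
                ids = tokenizer(traindata[i]['text'], return_tensors='pt').input_ids
                if ids.shape[1] > seq_len and i not in history:
                    history.append(i)
                    break
            start = random.randint(0, ids.shape[1] - seq_len)
            samples.append(ids[:, start:start + seq_len])
        return torch.cat(samples, dim=0)  # (n_samples, seq_len) calibration batch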
  3. Ran the pruning process and got this log:

python hf_prune.py --pruning_ratio 0.1 \
    --block_wise \
    --block_mlp_layer_start 4 --block_mlp_layer_end 30 \
    --block_attention_layer_start 4 --block_attention_layer_end 30 \
    --pruner_type Taylor \
    --test_after_train \
    --device cpu --eval_device cuda \
    --save_ckpt_log_name llama_prune \
    --base_model vicuna_model \
    --save_model
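Two notes on this command, before the log. --pruner_type Taylor means each coupled group of weights is scored with a first-order Taylor expansion of the loss on a small calibration batch, which is why the log below shows a single backward pass ("Start Backwarding", Loss = 3.74) before anything is removed. A minimal sketch of the per-channel salience this implies (the repository's actual grouped aggregation over attention heads and MLP channels is more involved; the function name here is illustrative):

    import torch

    def taylor_channel_importance(linear: torch.nn.Linear) -> torch.Tensor:
        # First-order Taylor salience: |w * dL/dw|, summed per output channel.
        # Assumes loss.backward() has already populated .grad on the
        # calibration batch (the "Start Backwarding" step in the log below).
        salience = (linear.weight * linear.weight.grad).abs()
        return salience.sum(dim=1)  # one importance score per output channel

The reported ratio in the log is also consistent arithmetic: 6223081472 / 6738415616 ≈ 92.35% of parameters kept, i.e. only about 7.6% removed despite --pruning_ratio 0.1, because layers outside the [4, 30) range, the embeddings, and the LM head are left untouched.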

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████| 2/2 [00:12<00:00, 6.30s/it]
2023-06-27 14:47:57 - INFO : Use taylor pruner...
2023-06-27 14:47:57 - INFO : Pruning Attention Layer = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
2023-06-27 14:47:57 - INFO : Pruning MLP Layer = [4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29]
2023-06-27 14:47:58 - INFO : Start Pruning
2023-06-27 14:47:59 - WARNING : Found cached dataset text (/root/.cache/huggingface/datasets/text/default-a5b37c6d93890eb9/0.0.0/cb1e9bd71a82ad27976be3b12b407850fe2837d80c22c5e03a28949843a8ace2)
2023-06-27 14:48:00 - INFO : Start Backwarding in iterative steps = 0...
2023-06-27 14:48:08 - INFO : Loss = 3.7414731979370117
2023-06-27 14:48:34 - INFO : After Iter 1/1, #parameters: 6223081472
2023-06-27 14:48:34 - INFO : #Param before: 6738415616, #Param after: 6223081472, Ratio = 92.3523%
Saving model to prune_log/llama_prune/pytorch_model.bin
2023-06-27 14:51:02 - INFO :
==================Generation Results After Pruning================

2023-06-27 14:51:04 - INFO : I believe the meaning of life is to continue seeking for self-fulfillment while also contributing to the betterment of the world and society. It's not to be selfish or materialistic, but to find balance and happiness through meaningful work, relationships, and experiences.

2023-06-27 14:51:08 - INFO : Simply put, the theory of relativity states that 1. Gravity is not a force in the sense that we usually understand it, but rather the result of the curvature of spacetime, caused by the presence of massive objects. This curvature creates a field that can bend the path of light. 2. The speed of light is always constant in a vacuum, regardless of the motion of the observer or the source. As a result, the amount of time it takes for light to travel a certain distance is dependent on the observer’s relative motion. 3. Time and space are intertw

2023-06-27 14:51:13 - INFO : Building a website can be done in 10 simple steps:

  1. Choose a domain name that is unique and easy to remember.
  2. Choose a hosting service that will support your website.
  3. Choose a web host and register your domain.
  4. Set up your website using a web hosting service.
  5. Customize the look and feel of your website.
  6. Add content to your website.
  7. Test your website to ensure that everything is working properly.
  8. Launch your website to the public.
  9. Continuously improve your website and

2023-06-27 14:51:14 - INFO : Tweet: "I hate it when my phone battery dies." Sentiment: Negative

    Tweet: "My day has been 👍" Sentiment: Positive

    Tweet: "This is the link to the article" Sentiment: Neutral

    Tweet: "This new music video was incredibile" Sentiment: Positive

    Tweet: "I just spent the whole day cleaning my room" Sentiment: Negative

    Note: The sentiment refers

2023-06-27 14:51:18 - INFO : Translate English to French:

sea otter => loutre de mer

peppermint => menthe poivrée

plush girafe => girafe peluche

cheese => fromage

English to French:

honey badger => renard des merveilles

spaghetti squirrel => écureuil à la spaghettini

poutine rainbow → rainbow de poutine

French translation of “the cat sat on the mat”: Le chat est assis sur le tapis.

2023-06-27 14:51:18 - INFO :
==================Finish================

2023-06-27 14:51:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
2023-06-27 14:51:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
Token indices sequence length is longer than the specified maximum sequence length for this model (341469 > 2048). Running this sequence through the model will result in indexing errors
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 667/667 [00:58<00:00, 11.48it/s]
{'wikitext2': 19.091033031037714}
2023-06-27 14:52:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
2023-06-27 14:52:18 - WARNING : Found cached dataset wikitext (/root/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126)
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 667/667 [00:58<00:00, 11.49it/s]
{'wikitext2': 19.091033031037714, 'ptb': 19.091033031037714}
2023-06-27 14:53:18 - INFO : PPL after pruning: {'wikitext2': 19.091033031037714, 'ptb': 19.091033031037714}
2023-06-27 14:53:18 - INFO : Memory Requirement: 12052.52490234375 MiB
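For context, the perplexity numbers come from a standard sliding-window evaluation: the whole test split is tokenized as one long sequence (hence the harmless "341469 > 2048" tokenizer warning) and then scored in fixed-length chunks. A minimal sketch of that procedure, not the repository's exact PPLMetric code:

    import torch

    @torch.no_grad()
    def perplexity(model, input_ids, seq_len=512, device='cuda'):
        # input_ids: the full corpus tokenized as a single (1, N) sequence.
        # It is only ever fed to the model in seq_len-sized windows, so the
        # "longer than the specified maximum sequence length" warning is benign.
        # With seq_len=512, 341469 tokens give roughly the 667 steps in the log.
        nlls = []
        for i in range(input_ids.shape[1] // seq_len):
            chunk = input_ids[:, i * seq_len:(i + 1) * seq_len].to(device)
            loss = model(chunk, labels=chunk).loss  # mean NLL over the window
            nlls.append(loss)
        return torch.exp(torch.stack(nlls).mean()).item()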

  4. Ran inference with the pruned model and got the result:

    command: python generate.py --model_type pruneLLM --ckpt pruner_model/pytorch_model_10%.bin

    inference code:

    def main(args):
        if args.model_type == 'pretrain':
            tokenizer = LlamaTokenizer.from_pretrained(
                args.base_model, cache_dir='llama_hf_model', force_download=True
            )
            model = LlamaForCausalLM.from_pretrained(
                args.base_model,
                low_cpu_mem_usage=True if torch_version >= 9 else False,
                cache_dir='llama_hf_model',
            )
            description = "Model Type: {}\n Base Model: {}".format(args.model_type, args.base_model)
        elif args.model_type == 'pruneLLM':
            pruned_dict = torch.load(args.ckpt, map_location='cpu')
            tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
            description = "Model Type: {}\n Pruned Model: {}".format(args.model_type, args.ckpt)
        elif args.model_type == 'tune_prune_LLM':
            pruned_dict = torch.load(args.ckpt, map_location='cpu')
            tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
            model = PeftModel.from_pretrained(
                model,
                args.lora_ckpt,
                torch_dtype=torch.float16,
            )
            description = "Model Type: {}\n Pruned Model: {}\n LORA ckpt: {}".format(args.model_type, args.ckpt, args.lora_ckpt)
        else:
            raise NotImplementedError

        if device == "cuda":
            model.half()
            model = model.cuda()

        # unwind broken decapoda-research config
        model.config.pad_token_id = tokenizer.pad_token_id = 0  # unk
        model.config.bos_token_id = 1
        model.config.eos_token_id = 2

        model.eval()

        print("Human:")
        line = input()
        while line:
            inputs = tokenizer(line, return_tensors="pt")
            input_ids = inputs["input_ids"].to(device)

            with torch.no_grad():
                generation_output = model.generate(
                    input_ids=input_ids,
                    early_stopping=True,
                    num_beams=4,
                    do_sample=True,
                    top_k=40,
                    top_p=0.95,
                    temperature=1,
                    max_length=1024,
                    return_dict_in_generate=True,
                )
            s = generation_output.sequences[0]
            output = tokenizer.decode(s)
            print(output)
            print("\n-------------------------------\n")
            # print the "Human:" prompt again before each new user input
            print("Human:")
            line = input()
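One note on the pruneLLM branch above: the checkpoint is loaded with torch.load and holds the entire model object, not just a state_dict, because structural pruning changes layer shapes and the weights no longer fit the stock LlamaForCausalLM configuration. A quick sketch of inspecting such a checkpoint (the path comes from the pruning log above; everything else is illustrative):

    import torch

    # The file saved by hf_prune.py --save_model bundles tokenizer and model.
    pruned_dict = torch.load('prune_log/llama_prune/pytorch_model.bin', map_location='cpu')
    tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']
    # Should print ~6223081472, the post-pruning count from the log.
    print(sum(p.numel() for p in model.parameters()))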

    result: Human: Write a line of spring poetry
    Write a line of spring poetry: Mist gathers at the brow as the wind rises, devouring the earth's sprouting ��cies, an umbrella twists open the far-off gaze, sighing away in the spring day.

Although this verse reads elegantly, its language beautiful and fresh, oh, it sounds so clever.

Who said this line? And what meaning does it carry?


Human: How should one adjust when in a bad mood? How should one adjust one's thinking when in a bad mood?


Human: How to learn machine learning How to learn machine learning, letting one slowly come to appreciate its wisdom

Excerpted from Chapter 1 of "Deep Genetic Operators" (《深度遗传算子》):

Numerous high-level layers of training, each in its own way, modify the model so that it performs better in application. This is the purpose of a realistic model.

Interestingly, no matter how the model is modified, its main characteristics never change, and yet there is no way to explain why. Since every modification involves some original part of the model, and the collection of these parts makes up the whole of the model, in each modification only certain basic elements are adjusted, and then many iterations are run from scratch, further uncovering how well certain local structures of the model fit. And if a structure of one model lies under the same tree in another model, it will drift out at random, which is why it can be understood as a "semi-intelligent model" with a settled structure.

Therefore, a person with a twisted power of understanding can, entirely on their own, turn the model through analysis into something of working value while still preserving its earlier form. People, after all, find it hard to learn to screen out their own solution to a problem in order to give fuller weight to learning about other aspects; although this makes them appear to think without pause, as long as it can finally be pushed into a later layer, more creativity can emerge under a later, higher-level framework, achieving faster progress in learning.

In short, from understanding to the core of the model, from combing through logic to the essence of evolution, these are all problems that the field of machine learning is needed to help us solve, which is perhaps why it gives off a strange air in people's hearts, as if it could somehow rescue us from riddles we find hard to solve.


horseee commented 1 year ago

Hi. Can you explain more about what the problem is?