luuyin / OWL

Official PyTorch Implementation of "Outlier Weighed Layerwise Sparsity (OWL): A Missing Secret Sauce for Pruning LLMs to High Sparsity"
https://arxiv.org/pdf/2310.05175.pdf
MIT License

Why is the reproduced result different from the one in the paper? #8

Closed SHUSHENGQIGUI closed 5 months ago

SHUSHENGQIGUI commented 5 months ago

Hi, I am reproducing the LLaMA-7B pruning experiment, but the perplexities after pruning to 70% sparsity with SparseGPT, Wanda, and OWL differ from the results in the paper, as shown in the table below: [image: results table]

The second column is my result; the third column is the paper's result. I wonder if I missed something, or if the hyperparameters in this repo differ from the ones used in the paper.
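For reference, here is how I measure perplexity, in case my eval differs from yours (a minimal sketch; the wikitext-2-raw-v1 test split, seqlen 2048, and the non-overlapping windows are my assumptions about the repo's eval, mirroring Wanda's setup, not confirmed from the code):

```python
import torch
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "baffo32/decapoda-research-llama-7B-hf"  # checkpoint used in my runs
# Older LLaMA checkpoints often need the slow tokenizer.
tokenizer = AutoTokenizer.from_pretrained(model_id, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)
model.eval()

# Concatenate the raw test split and score it in non-overlapping windows.
test = load_dataset("wikitext", "wikitext-2-raw-v1", split="test")
enc = tokenizer("\n\n".join(test["text"]), return_tensors="pt")

seqlen = 2048
nsamples = enc.input_ids.shape[1] // seqlen
nlls = []
with torch.no_grad():
    for i in range(nsamples):
        batch = enc.input_ids[:, i * seqlen:(i + 1) * seqlen].to(model.device)
        loss = model(batch, labels=batch).loss  # mean NLL over the window
        nlls.append(loss.float() * seqlen)
ppl = torch.exp(torch.stack(nlls).sum() / (nsamples * seqlen))
print(f"wikitext-2 perplexity: {ppl.item():.2f}")  # dense LLaMA-7B: ~5.68
```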

luuyin commented 5 months ago

Hi, Thanks for your question. I'm unclear about the values presented in your table; could you please clarify what they represent? Additionally, I'd like to confirm whether you are using the LLaMA-V1 model and if you are strictly adhering to the hyperparameters and scripts provided in the README.

SHUSHENGQIGUI commented 5 months ago

> Hi, Thanks for your question. I'm unclear about the values presented in your table; could you please clarify what they represent? Additionally, I'd like to confirm whether you are using the LLaMA-V1 model and if you are strictly adhering to the hyperparameters and scripts provided in the README.

Thank you for your reply. OK, I will clarify my question:

  1. I am using LLaMA-V1-7B, and I can reproduce the dense result at 5.68, matching the paper.
  2. I am strictly following the code and hyperparameters in the README. For example, for OWL-Wanda pruning at 70% sparsity (my reading of what --Lamda and --Hyper_m do is sketched after this list):

         python main.py \
             --model_name_or_path decapoda-research/llama-7b-hf \
             --Lamda 0.08 \
             --Hyper_m 5 \
             --model baffo32/decapoda-research-llama-7B-hf \
             --prune_method wanda_owl \
             --sparsity_ratio 0.7 \
             --sparsity_type unstructured \
             --save save_wanda_owl_test/ \
             --save_model out/llama-7b/unstructured/wanda_owl/sparse0.7/

     Please check whether this is correct.
  3. The table below compares my results with the paper's: [image: comparison table]

I want to know whether there is any code or hyperparameter configuration that I need to modify or add.
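For context, here is my reading of what --Lamda and --Hyper_m control, as a rough sketch (outlier_scores here stands for the per-layer Wanda metric |W| * ||X|| from the calibration pass; this is my paraphrase of the paper's allocation, not the repo's code):

```python
import torch

def owl_layer_sparsities(outlier_scores, target=0.7, lamda=0.08, hyper_m=5.0):
    """Per-layer sparsity ratios from the Layerwise Outlier Distribution (LOD)."""
    # LOD: per layer, the fraction of weights whose score exceeds
    # hyper_m (--Hyper_m) times that layer's mean score.
    lod = torch.stack([(s > hyper_m * s.mean()).float().mean()
                       for s in outlier_scores])
    # Layers with more outliers get pruned less. Center the distribution and
    # bound its deviation by lamda (--Lamda), so every layer's sparsity stays
    # within [target - lamda, target + lamda] while the mean stays at target.
    dev = lod - lod.mean()
    dev = dev / dev.abs().max().clamp_min(1e-12) * lamda
    return target - dev
```

With --sparsity_ratio 0.7 and --Lamda 0.08 this should give per-layer ratios between 0.62 and 0.78 averaging 0.7, so if a run prunes every layer uniformly at 0.7, the OWL allocation is probably not being applied.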

SHUSHENGQIGUI commented 5 months ago

@luuyin Hi, could you reply when you have a spare moment? I am stuck on this problem and do not know how to reproduce the results shown in the paper. Please help.