horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.
https://arxiv.org/abs/2305.11627
Apache License 2.0

The new pytorch.bin is bigger than the original model · Issue #37

Open · lb553024300 opened 10 months ago

lb553024300 commented 10 months ago

When I chose to save the model, I found something strange: the new pytorch.bin is bigger than the original model. I tested Baichuan-7B with --pruning_ratio 0.5 and added --save_model to save the model after pruning, but the new pytorch.bin is 17GB while the original model is only 13GB. Could you please tell me why? Thank you!

VainF commented 10 months ago

Hi @lb553024300, don't worry about the bin size. We store the whole model object to disk with torch.save(model), so the saved file will be larger than one produced by torch.save(model.state_dict()).
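Roughly, the difference looks like this (a toy sketch using an nn.Linear stand-in rather than the actual pruned LLM; the file names are placeholders):

import torch
import torch.nn as nn

model = nn.Linear(4, 4)  # toy stand-in for the pruned model

# Whole-object pickle: serializes the module structure together with its tensors,
# so the pruned architecture can be restored without rebuilding it in code.
torch.save(model, "whole_model.bin")

# Weights only: a smaller file, but loading it back requires re-creating
# the (pruned) module definition first.
torch.save(model.state_dict(), "state_dict_only.bin")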

lb553024300 commented 10 months ago

> Hi @lb553024300, don't worry about the bin size. We store the whole model object to disk with torch.save(model), so the saved file will be larger than one produced by torch.save(model.state_dict()).

Got it, thanks.

horseee commented 10 months ago

Hi. Could you please check whether you deleted the gradients used for calculating the Taylor importance before saving the model?
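Something like this right before the torch.save call should do it (just a sketch; model and tokenizer here are the objects the pruning script is about to save, and the output path is a placeholder):

# Drop the gradients accumulated for the Taylor importance estimate
# before serializing the whole model object.
for param in model.parameters():
    param.grad = None

torch.save({'tokenizer': tokenizer, 'model': model}, 'pruned_model.bin')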

trinhdoduyhungss commented 10 months ago

Hi @VainF, could you tell me how to convert it back to the same format as the original model? A smaller file size is what I need for storage. Thank you very much.

> Hi @lb553024300, don't worry about the bin size. We store the whole model object to disk with torch.save(model), so the saved file will be larger than one produced by torch.save(model.state_dict()).

I tried torch.save(model.state_dict()), but the file size is still the same. Is there any way to save it in the same format as the original model on Hugging Face? I tried loading the model and saving it again, but that still didn't reduce the size.

import torch
import argparse

def main(args):
    pruned_dict = torch.load(args.ckpt, map_location='cpu')
    tokenizer, model = pruned_dict['tokenizer'], pruned_dict['model']

    print(f"Model took {round(model.get_memory_footprint() / 1e9, 2)} GB")

    # Remove the gradients accumulated for the Taylor importance estimate
    model.zero_grad()
    for param in model.parameters():
        param.grad = None

    print(f"Model took {round(model.get_memory_footprint() / 1e9, 2)} GB") #=> ~ 25GB

    # model.half()
    # print(f"Model took {round(model.get_memory_footprint() / 1e9, 2)} GB") #=> ~12GB

    # Save
    model.save_pretrained(args.output_dir) #=> ~ 25GB
    tokenizer.save_pretrained(args.output_dir)

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--ckpt", type=str, required=True)
    parser.add_argument("--output_dir", type=str, required=True)
    args = parser.parse_args()
    main(args)

I am working with the BLOOM model.
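With model.half() uncommented the footprint does drop to ~12GB, so my guess is that the extra size mainly comes from the pruned model being kept in fp32 while the original checkpoint is stored in half precision. Something like the following (just my attempt, not sure it is the intended way) gets the saved files close to the original size:

# Cast back to half precision before saving, matching the commented-out
# model.half() above (which reports ~12GB).
model.half()
model.save_pretrained(args.output_dir)
tokenizer.save_pretrained(args.output_dir)
# Note: the saved config may not reflect the pruned layer shapes, so
# from_pretrained may not rebuild the pruned model correctly.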