Issue: Missing Generation of `pytorch_model.bin` File During Model Tuning

horseee / LLM-Pruner

[NeurIPS 2023] LLM-Pruner: On the Structural Pruning of Large Language Models. Support Llama-3/3.1, Llama-2, LLaMA, BLOOM, Vicuna, Baichuan, TinyLlama, etc.

https://arxiv.org/abs/2305.11627

Apache License 2.0


Issue: Missing Generation of `pytorch_model.bin` File During Model Tuning #45

Closed: WilliamYi96 closed this issue 11 months ago

WilliamYi96 commented 11 months ago

Thank you for sharing your interesting project!

Recently, when I ran `bash ./script/llama_prune.sh`, the pruning step worked perfectly. However, the tuning step, although it reported no errors, produced only the following files:

checkpoint-200/
- model.safetensors
- optimizer.pt
- rng_state.pth
- scheduler.pt
- trainer_state.json
- training_args.bin

I noticed that `pytorch_model.bin` was not saved. I haven't modified the code, and I am using PyTorch 2.1.2+cu121. Could you suggest what the reason might be?

WilliamYi96 commented 11 months ago

Issue resolved. The cause is that newer versions of the transformers library (starting with 4.35.0) use safetensors as the default serialization format in place of `pytorch_model.bin`. This can be addressed either by downgrading with `pip install transformers==4.33.0`, or by setting `safe_serialization=False` in `model.save_pretrained()`.

Tracking here: https://github.com/huggingface/transformers/issues/28183
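A minimal sketch of the `save_pretrained()` option, assuming a transformers 4.x release in which `save_pretrained` still accepts `safe_serialization`. The tiny Llama config and the `save_as_bin` helper are hypothetical, for illustration only; this is not code from LLM-Pruner. It only controls what `save_pretrained()` itself writes, not the checkpoints the Trainer produces during tuning.

```python
import os
import tempfile

from transformers import LlamaConfig, LlamaForCausalLM


def save_as_bin(model, out_dir):
    """Hypothetical helper: save `model` as pytorch_model.bin instead of safetensors."""
    # safe_serialization=False asks save_pretrained to pickle the weights into
    # pytorch_model.bin; newer releases default to model.safetensors.
    model.save_pretrained(out_dir, safe_serialization=False)
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    # Tiny randomly initialised Llama, for illustration only.
    config = LlamaConfig(vocab_size=128, hidden_size=32, intermediate_size=64,
                         num_hidden_layers=1, num_attention_heads=2)
    with tempfile.TemporaryDirectory() as out_dir:
        print(save_as_bin(LlamaForCausalLM(config), out_dir))
```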

WilliamYi96 commented 11 months ago

Two updates:

1. Running `pip install transformers==4.33.0` leads to the following error:

   AttributeError: 'LlamaTokenizer' object has no attribute 'added_tokens_decoder'. Did you mean: '_added_tokens_decoder'?

2. With the latest transformers and `safe_serialization=False`, `pytorch_model.bin` is still not saved.

So the issue still exists.

WilliamYi96 commented 11 months ago

Issue resolved. The problem is that `save_safetensors=False` must also be set in the `TrainingArguments` used to construct the trainer. Otherwise, setting `safe_serialization=False` as above has no effect.

https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments.save_safetensors

RamitPahwa commented 3 months ago

@WilliamYi96 Can we recover `pytorch_model.bin` from the safetensors file? I have already run fine-tuning on a bigger dataset for some time and want to avoid re-running training. Alternatively, can we resume from the checkpoint and save after running for a few more steps?

yaolu-zjut commented 2 months ago

> Issue resolved. The problem is that when constructing the trainer, save_safetensors=False should be set. Otherwise, the above safe_serialization=False will not work.
>
> https://huggingface.co/docs/transformers/v4.36.1/en/main_classes/trainer#transformers.TrainingArguments.save_safetensors

Hi, I set `save_safetensors=False`, but it still doesn't work.