huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

SafetensorError: Error while deserializing header: InvalidHeaderDeserialization when open .safetensor model #27397

Closed adhiiisetiawan closed 1 year ago

adhiiisetiawan commented 1 year ago

System Info

Hi guys, I just fine-tuned Alpaca (LLaMA 7B base model) on a custom dataset using the Trainer API. After completing the training process, I received the following error:

SafetensorError                           Traceback (most recent call last)
<ipython-input-16-8ff7a1776602> in <cell line: 18>()
     16 model = torch.compile(model)
     17 
---> 18 trainer.train()
     19 model.save_pretrained(OUTPUT_DIR)

5 frames
/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in train(self, resume_from_checkpoint, trial, ignore_keys_for_eval, **kwargs)
   1554                 hf_hub_utils.enable_progress_bars()
   1555         else:
-> 1556             return inner_training_loop(
   1557                 args=args,
   1558                 resume_from_checkpoint=resume_from_checkpoint,

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _inner_training_loop(self, batch_size, args, resume_from_checkpoint, trial, ignore_keys_for_eval)
   1964                 smp.barrier()
   1965 
-> 1966             self._load_best_model()
   1967 
   1968         # add remaining tr_loss

/usr/local/lib/python3.10/dist-packages/transformers/trainer.py in _load_best_model(self)
   2183                     if hasattr(model, "active_adapter") and hasattr(model, "load_adapter"):
   2184                         if os.path.exists(best_adapter_model_path) or os.path.exists(best_safe_adapter_model_path):
-> 2185                             model.load_adapter(self.state.best_model_checkpoint, model.active_adapter)
   2186                             # Load_adapter has no return value present, modify it when appropriate.
   2187                             from torch.nn.modules.module import _IncompatibleKeys

/usr/local/lib/python3.10/dist-packages/peft/peft_model.py in load_adapter(self, model_id, adapter_name, is_trainable, **kwargs)
    601             self.add_adapter(adapter_name, peft_config)
    602 
--> 603         adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
    604 
    605         # load the weights into the model

/usr/local/lib/python3.10/dist-packages/peft/utils/save_and_load.py in load_peft_weights(model_id, device, **hf_hub_download_kwargs)
    220 
    221     if use_safetensors:
--> 222         adapters_weights = safe_load_file(filename, device=device)
    223     else:
    224         adapters_weights = torch.load(filename, map_location=torch.device(device))

/usr/local/lib/python3.10/dist-packages/safetensors/torch.py in load_file(filename, device)
    306     """
    307     result = {}
--> 308     with safe_open(filename, framework="pt", device=device) as f:
    309         for k in f.keys():
    310             result[k] = f.get_tensor(k)

SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

and here's my code:

trainer = transformers.Trainer(
    model=model,
    train_dataset=train_data,
    eval_dataset=val_data,
    args=training_arguments,
    data_collator=data_collator
)
model.config.use_cache = False
old_state_dict = model.state_dict
model.state_dict = (
    lambda self, *_, **__: get_peft_model_state_dict(
        self, old_state_dict()
    )
).__get__(model, type(model))

model = torch.compile(model)

trainer.train() # the error from this
model.save_pretrained(OUTPUT_DIR)

I have hit this error maybe 3 times already. Initially, I suspected it might be related to the model I was using (the Alpaca weight base model), but even after switching to the LLaMA 7B base model, the problem persists. I still can't find the root cause or a way to solve it. In my opinion, though, the problem comes from the safetensors file itself, because when I try to open the safetensors file with the code below, I get the same error.

from safetensors import safe_open

tensors = {}
with safe_open("/content/experiments/checkpoint-100/adapter_model.safetensors", framework="pt", device=0) as f:
    for k in f.keys():
        tensors[k] = f.get_tensor(k)
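
The same failure can be narrowed down without transformers at all. Here is a minimal sketch that reads the raw header directly; it relies only on the documented safetensors layout (an 8-byte little-endian header length followed by a JSON header) and reuses the checkpoint path from above:

import json
import struct

path = "/content/experiments/checkpoint-100/adapter_model.safetensors"
with open(path, "rb") as f:
    # The first 8 bytes declare the size of the JSON header (unsigned little-endian 64-bit).
    (header_len,) = struct.unpack("<Q", f.read(8))
    print("declared header length:", header_len)
    # If this raises, the header is corrupt or empty, which matches the
    # InvalidHeaderDeserialization error above.
    header = json.loads(f.read(header_len))
    print("tensor keys:", list(header.keys()))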

Note: I installed the transformers library from source. When using the version from PyPI, I didn't encounter this error because the model was saved in .bin format rather than .safetensors.

Reproduction

To reproduce the behavior:

  1. Install the transformers library from source.
  2. Train any model using the installed library.
  3. The model will automatically be saved in .safetensors format.
  4. Once the training is complete, the error will occur.

Expected behavior

Training should complete, and the resulting model should be saved and loadable in .safetensors format.

Update

The training process completes using transformers from source, but the model is saved as .bin, not .safetensors. That's okay, but I'm still curious why opening the safetensors file raises an error. Here's my Colab link where I test opening the safetensors model.

amyeroberts commented 1 year ago

Hi @adhiiisetiawan, thanks for reporting!

So that we can best help you, could you:

Regarding the PEFT logic - why are you modifying the state_dict directly like this? You can follow the PEFT docs to see the canonical way to load and prepare a model for training: https://huggingface.co/docs/peft/task_guides/image_classification_lora#train-and-evaluate
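
For reference, here is a minimal sketch of that documented flow, with no state_dict patching and no torch.compile; the base model ID and the LoRA hyperparameters below are placeholders rather than the values used in this issue:

from peft import LoraConfig, TaskType, get_peft_model
from transformers import AutoModelForCausalLM

base_model_id = "huggyllama/llama-7b"  # placeholder base checkpoint
base_model = AutoModelForCausalLM.from_pretrained(base_model_id)

peft_config = LoraConfig(
    task_type=TaskType.CAUSAL_LM,
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
)
model = get_peft_model(base_model, peft_config)
model.print_trainable_parameters()

# Hand this model directly to Trainer; model.save_pretrained() then writes only
# the adapter weights (adapter_model.safetensors or adapter_model.bin).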

LysandreJik commented 1 year ago

cc @muellerzr as we discussed this issue yesterday; seems like safetensors aren't very friendly with the Trainer

muellerzr commented 1 year ago

@adhiiisetiawan your issue is the call to torch.compile(). If that step is skipped, you can save and load no problem.

With it included, you should find that model.state_dict() is completely empty, leading to this issue.

The sole reason it doesn't error without safetensors is that torch/pickle is happy to load the empty dictionary as well. You can see this by simply adding the following code at the end:

model.save_pretrained("test_model", safe_serialization=False)
f = torch.load("test_model/adapter_model.bin")
print(f)

It should print {}. Remove the .compile() and it will work fine. This is a peft issue specifically with save_pretrained and its behavior with torch.compile. cc @BenjaminBossan
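
A quick pre-save sanity check along those lines (a sketch reusing model and OUTPUT_DIR from the original snippet; per the explanation above, the dict comes back empty when torch.compile() was applied):

# Inspect what would be saved before calling save_pretrained().
state = model.state_dict()
print(f"state_dict has {len(state)} entries")
assert len(state) > 0, "empty state_dict -- remove torch.compile() before saving"
model.save_pretrained(OUTPUT_DIR)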

BenjaminBossan commented 1 year ago

A note on PEFT + torch.compile: Unfortunately, torch.compile still has a couple of gaps that make it not work properly in PEFT. There is not much we can do about it except to wait for PyTorch to close those gaps. How that can lead to an empty state_dict, I don't know.

adhiiisetiawan commented 1 year ago

Oh I see, I got it. Thank you very much, everyone, for your answers and detailed explanations @amyeroberts @LysandreJik @muellerzr @BenjaminBossan

safetensors works now without torch.compile.

MerrillLi commented 1 year ago

@adhiiisetiawan Hello~ I'd like to know whether LoRA training is slowed down without torch.compile, and whether memory consumption increases?

adhiiisetiawan commented 12 months ago

Hi @MerrillLi, in my case I don't have any issues without torch.compile. Sorry for the late response.

tamanna-mostafa commented 9 months ago

I'm having this same issue (details here: https://github.com/huggingface/transformers/issues/28742). Could anyone please help?

cam59 commented 3 months ago

I've had exactly the same issue, but I did not use torch.compile. As soon as I call save_pretrained() I get the same problem.

My code is here:

import os
import argparse
from transformers import (
    LlamaForCausalLM,
    LlamaTokenizer,
    LlamaConfig,
    set_seed,
    default_data_collator,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)
from datasets import load_from_disk
import torch
import bitsandbytes as bnb
from huggingface_hub import login, HfFolder
import accelerate
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments

model_id = "psymon/KoLlama2-7b" # sharded weights
tokenizer = LlamaTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

model = LlamaForCausalLM.from_pretrained(
        model_id,
        use_cache=False,
        device_map="auto",
        quantization_config=bnb_config,
    )

def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)

def create_peft_model(model, gradient_checkpointing=True, bf16=True):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_kbit_training,
    )
    from peft.tuners.lora import LoraLayer

    # prepare int-4 model for training
    model = prepare_model_for_kbit_training(
        model, use_gradient_checkpointing=gradient_checkpointing
    )
    if gradient_checkpointing:
        model.gradient_checkpointing_enable()

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} modules to quantize: {modules}")

    peft_config = LoraConfig(
        r=64,
        lora_alpha=16,
        target_modules=modules,
        lora_dropout=0.1,
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )

    model = get_peft_model(model, peft_config)

    # pre-process the model by upcasting the layer norms to float32 for training stability
    for name, module in model.named_modules():
        if isinstance(module, LoraLayer):
            if bf16:
                module = module.to(torch.bfloat16)
        if "norm" in name:
            module = module.to(torch.float32)
        if "lm_head" in name or "embed_tokens" in name:
            if hasattr(module, "weight"):
                if bf16 and module.weight.dtype == torch.float32:
                    module = module.to(torch.bfloat16)

    model.print_trainable_parameters()
    return model

# create peft config
model = create_peft_model(model, gradient_checkpointing=True, bf16=True)

output_dir = XXXXX
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=8,
    bf16=True,  # Use BF16 if available
    learning_rate=5e-5,
    num_train_epochs=3,
    gradient_checkpointing=True,
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
)

# Create a data collator
data_collator = DataCollatorWithPadding(tokenizer)

# Initialize the custom Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
)

# Start training
trainer.train()

model.merge_and_unload()

model.save_pretrained('model_name')

BenjaminBossan commented 3 months ago

@cam59 Thanks for providing the reproducer. Unfortunately, I cannot reproduce the error. I made some small changes to your script, as you don't provide the data. Also, I used Llama2 7b. Here is the modified script:

import os
import argparse
from transformers import (
    LlamaForCausalLM,
    LlamaTokenizer,
    LlamaConfig,
    set_seed,
    default_data_collator,
    BitsAndBytesConfig,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset
import torch
import bitsandbytes as bnb
from huggingface_hub import login, HfFolder
import accelerate
from transformers import DataCollatorWithPadding, Trainer, TrainingArguments, DataCollatorForLanguageModeling

# model_id = "psymon/KoLlama2-7b" # sharded weights
model_id = "meta-llama/Llama-2-7b-hf" # BB
tokenizer = LlamaTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token

bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_use_double_quant=True,
        bnb_4bit_quant_type="nf4",
        bnb_4bit_compute_dtype=torch.bfloat16,
    )

model = LlamaForCausalLM.from_pretrained(
        model_id,
        use_cache=False,
        device_map="auto",
        quantization_config=bnb_config,
    )

def find_all_linear_names(model):
    lora_module_names = set()
    for name, module in model.named_modules():
        if isinstance(module, bnb.nn.Linear4bit):
            names = name.split(".")
            lora_module_names.add(names[0] if len(names) == 1 else names[-1])

    if "lm_head" in lora_module_names:  # needed for 16-bit
        lora_module_names.remove("lm_head")
    return list(lora_module_names)

def create_peft_model(model, gradient_checkpointing=True, bf16=True):
    from peft import (
        get_peft_model,
        LoraConfig,
        TaskType,
        prepare_model_for_kbit_training,
    )
    from peft.tuners.lora import LoraLayer

    # prepare int-4 model for training
    model = prepare_model_for_kbit_training(
        model, use_gradient_checkpointing=gradient_checkpointing
    )
    if gradient_checkpointing:
        model.gradient_checkpointing_enable()

    # get lora target modules
    modules = find_all_linear_names(model)
    print(f"Found {len(modules)} modules to quantize: {modules}")

    peft_config = LoraConfig(
        r=64,
        lora_alpha=16,
        target_modules=modules,
        lora_dropout=0.1,
        bias="none",
        task_type=TaskType.CAUSAL_LM,
    )

    model = get_peft_model(model, peft_config)

    # pre-process the model by upcasting the layer norms to float32 for training stability
    for name, module in model.named_modules():
        if isinstance(module, LoraLayer):
            if bf16:
                module = module.to(torch.bfloat16)
        if "norm" in name:
            module = module.to(torch.float32)
        if "lm_head" in name or "embed_tokens" in name:
            if hasattr(module, "weight"):
                if bf16 and module.weight.dtype == torch.float32:
                    module = module.to(torch.bfloat16)

    model.print_trainable_parameters()
    return model

# create peft config
model = create_peft_model(model, gradient_checkpointing=True, bf16=True)

output_dir = "/tmp/peft/transformers/27397"
training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=8,
    bf16=True,  # Use BF16 if available
    learning_rate=5e-5,
    #num_train_epochs=3,
    max_steps=2, # BB
    gradient_checkpointing=True,
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="no",
)

# Create a data collator
# data_collator = DataCollatorWithPadding(tokenizer)
data_collator = DataCollatorForLanguageModeling(tokenizer, mlm=False) # BB

# BB
data = load_dataset("ybelkada/english_quotes_copy")
data = data.map(lambda samples: tokenizer(samples["quote"]), batched=True)
train_dataset = data["train"]
test_dataset = data["train"]

# Initialize the custom Trainer
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
)

# Start training
trainer.train()

model.merge_and_unload()

model.save_pretrained(f"{output_dir}/final_model")

Could you check whether this passes successfully for you? If it does, do you have any idea what the crucial difference from your script might be?

Btw, find_all_linear_names should not be necessary anymore, you can pass target_modules="all-linear" to the LoraConfig.
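
For example, a minimal sketch of the simplified config (assuming a PEFT release recent enough to support the "all-linear" shortcut, which targets every linear layer except the output head):

from peft import LoraConfig, TaskType

# Replaces the manual find_all_linear_names() scan above.
peft_config = LoraConfig(
    r=64,
    lora_alpha=16,
    target_modules="all-linear",
    lora_dropout=0.1,
    bias="none",
    task_type=TaskType.CAUSAL_LM,
)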

tytcc commented 3 months ago

I have the same problem, but I don't use torch.compile either. I used SFTTrainer to train the model with LoRA and DeepSpeed ZeRO-3, but when I load a checkpoint saved with the epoch save strategy, I get the error. Here are my TrainingArguments:


training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=8,
    bf16=True,  # Use BF16 if available
    learning_rate=5e-5,
    num_train_epochs=3,
    optim = "adamw_torch",
    # logging strategies
    logging_dir=f"{output_dir}/logs",
    logging_strategy="steps",
    logging_steps=10,
    save_strategy="epoch",
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
    peft_config=peft_config,
)
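
A side note on that snippet: as far as I know, the plain transformers.Trainer does not accept a peft_config argument; that keyword belongs to TRL's SFTTrainer. A minimal sketch of that call, reusing the names constructed above:

from trl import SFTTrainer

# SFTTrainer forwards peft_config to PEFT itself; the other arguments are the
# same objects built earlier in the snippet.
trainer = SFTTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=test_dataset,
    data_collator=data_collator,
    peft_config=peft_config,
)
trainer.train()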