huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.
https://huggingface.co/docs/peft
Apache License 2.0

How to save full model weights and not just the adapters? #636

Closed azayz closed 1 year ago

azayz commented 1 year ago

System Info

peft==0.4.0.dev0

I'm not sure if this should be a bug report, so sorry if this is not convenient. According to the save_pretrained method docstring, this saves only the adapter model and not the full model weights. Is there an option to save the full model weights? The use case is that we want to upload the full model to the Hugging Face Hub so we can activate the Inference API, but right now we only save the adapter weights.

Who can help?

No response

Information

Tasks

Reproduction

save_pretrained saves only the adapters; maybe also add an option to save the full model.

Expected behavior

save_pretrained saves only the adapters; maybe also add an option to save the full model.

younesbelkada commented 1 year ago

Hi @azayz, thanks for raising this discussion. You should probably be able to partially achieve your goal by doing:

model.base_model.save_pretrained(xxx)

However, you might end up with adapter weights + model weights in some cases (e.g. LoRA). Can you try that and let us know how it goes?
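
For reference, the other common route (used later in this thread) is to merge the adapters into the base weights first and then save; a minimal sketch, assuming model is a LoRA PeftModel on a non-quantized base (merging into 4/8-bit weights may not work):

# Minimal sketch, not part of the original reply: fold the LoRA weights into the
# base model and save a standalone checkpoint.
merged = model.merge_and_unload()     # adapter deltas are merged into the base weights
merged.save_pretrained("full_model")  # plain transformers checkpoint, no adapter files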

azayz commented 1 year ago

@younesbelkada Thanks for your suggestion. I used the following code to try to save the full model:

from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch
from peft import PeftModel

base_model = "tiiuae/falcon-7b"
device_map = "auto"
load_in_4bit = False
load_in_8bit = True
lora_weights = "artifacts/lora_weight:v12/"

quant_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

model_config = AutoConfig.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=base_model,
    torch_dtype=torch.float16,
    device_map=device_map,
    config=model_config,
    quantization_config=quant_config,
    trust_remote_code=True,
)
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)
model.base_model.save_pretrained("full_model")

However, it doesn't seem to work; I'm getting the following error:

    shards, index = shard_checkpoint(state_dict, max_shard_size=max_shard_size, weights_name=weights_name)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 318, in shard_checkpoint
    storage_id = id_tensor_storage(weight)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/pytorch_utils.py", line 290, in id_tensor_storage
    return tensor.device, storage_ptr(tensor), storage_size(tensor)

bitsandbytes==0.39.1 peft==0.4.0.dev0 torch==2.0.1 transformers==4.30.2

younesbelkada commented 1 year ago

Thanks for double checking, can you share the full traceback? 🙏

azayz commented 1 year ago

Full traceback

Traceback (most recent call last):
  File "/home/ubuntu/jerboa/model_saving.py", line 38, in <module>
    model.base_model.save_pretrained("full_model")
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1820, in save_pretrained
    shards, index = shard_checkpoint(state_dict, max_shard_size=max_shard_size, weights_name=weights_name)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 318, in shard_checkpoint
    storage_id = id_tensor_storage(weight)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/pytorch_utils.py", line 290, in id_tensor_storage
    return tensor.device, storage_ptr(tensor), storage_size(tensor)
AttributeError: 'str' object has no attribute 'device'

azayz commented 1 year ago

It seems that the state dict contains an entry whose value is not a tensor but the string "row", stored under keys ending in weight_format, e.g. transformer.h.31.self_attention.query_key_value.weight_format.
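
A minimal sketch, not from the original comment, of how to list the offending non-tensor entries in the state dict (this is what shard_checkpoint trips over):

import torch

# Collect every state-dict entry that is not a tensor; with 8-bit bitsandbytes
# weights these are the string-valued "...weight_format" keys mentioned above.
non_tensor = {k: v for k, v in model.base_model.state_dict().items()
              if not isinstance(v, torch.Tensor)}
print(non_tensor)  # e.g. {'transformer.h.31.self_attention.query_key_value.weight_format': 'row'}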

azayz commented 1 year ago

Updating to the latest transformers (main) solves the issue.

younesbelkada commented 1 year ago

Indeed, this is related to the recent release of bitsandbytes, check: https://github.com/huggingface/transformers/pull/24416

chiyuzhang94 commented 12 months ago

Hi @azayz ,

I wonder how to save the whole backbone model at every saving step during training? I guess your example only saves it at the beginning or end of training. Any thoughts?

Best, Chiyu
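
A minimal sketch, not from the original thread, of one way to do this with a transformers TrainerCallback: it writes a merged full-weight copy at every checkpoint. It assumes the model passed to the Trainer is a LoRA PeftModel on a non-quantized base, and the callback name is made up:

import copy
import os

from transformers import TrainerCallback


class SaveMergedModelCallback(TrainerCallback):
    def on_save(self, args, state, control, model=None, **kwargs):
        # Merge the adapters into a throwaway deep copy so the original
        # adapter-carrying model keeps training unchanged.
        merged = copy.deepcopy(model).merge_and_unload()
        merged.save_pretrained(os.path.join(args.output_dir, f"full-model-{state.global_step}"))
        return control

Registered via Trainer(..., callbacks=[SaveMergedModelCallback()]); note that deep-copying a large model roughly doubles peak memory during the save.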

SuperBruceJia commented 10 months ago

Hi @azayz, thanks for raising this discussion. You should probably be able to partially achieve your goal by doing:

model.base_model.save_pretrained(xxx)

However, you might end up with adapter weights + model weights in some cases (e.g. LoRA). Can you try that and let us know how it goes?

In this way, only the base model will be saved, without saving the LoRA adapter. Please check the generated config.json file:

{
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaModel"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "quantization_config": {
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.34.0",
  "use_cache": true,
  "vocab_size": 32001
}
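
A minimal sketch, not from the original comment, of checking whether any LoRA tensors end up in the state dict that model.base_model.save_pretrained(...) would write (it assumes the PeftModel is still loaded in memory as model):

# Count state-dict keys that look like LoRA adapter parameters.
lora_keys = [k for k in model.base_model.state_dict() if "lora_" in k]
print(f"{len(lora_keys)} LoRA entries in the base_model state dict")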
SuperBruceJia commented 10 months ago

model.base_model.save_pretrained("full_model")

Are you sure? In your last line:

model.base_model.save_pretrained("full_model")

You saved the base model without the LoRA adapter. Please check your config.json file!

@younesbelkada What do you think of trainer.model.base_model? Are there any LoRA weights inside this model, or is it just the pre-trained base model without any LoRA weights? Thank you very much in advance!

@chiyuzhang94 @azayz @tmm1

bezir commented 5 months ago

The task is to merge the adapter weights into the base model. However, the code below does not save the merged model, i.e. the fine-tuned model.

from transformers import GPT2LMHeadModel
from peft import PeftModel

base_model_path = "openai-community/gpt2"
base_model = GPT2LMHeadModel.from_pretrained(base_model_path, device_map="auto")

# tokenizer and peft_model_path are defined earlier in the training script;
# the tokenizer was extended during fine-tuning, so resize the embeddings to match
base_model.resize_token_embeddings(len(tokenizer))

model = PeftModel.from_pretrained(base_model, peft_model_path, device_map="auto")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gpt2-turkish-300k-17_4_24", save_adapters=True, save_embedding_layers=True)

Changing the last line to the one below saves the right model:

model.base_model.save_pretrained("full_model")
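
A minimal sketch, not from the original comment, of reloading the saved directory with plain transformers (no peft) to confirm it contains standalone full-model weights, which is what the Hub Inference API needs:

from transformers import AutoModelForCausalLM

# If the full weights were saved, this loads without any peft adapter files.
reloaded = AutoModelForCausalLM.from_pretrained("full_model")
print(sum(p.numel() for p in reloaded.parameters()))  # total parameter count, not just adapters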