Hi @azayz, thanks for raising this discussion. You should probably be able to partially achieve your goal by doing:
model.base_model.save_pretrained(xxx)
However, you might end up with adapter weights + model weights in some cases (e.g. LoRA). Can you try that and let us know how it goes?
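For anyone trying this, a quick way to see what actually gets written is to save and then inspect the output directory. A minimal sketch (not from the thread); it assumes `model` is an already-loaded `PeftModel`:

```python
import os

# Save the wrapped base model, then list the files that were written.
model.base_model.save_pretrained("full_model")
print(sorted(os.listdir("full_model")))
# Full weights show up as pytorch_model*.bin / model*.safetensors shards,
# while adapter-only saves produce adapter_config.json / adapter_model.* files.
```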
@younesbelkada thanks for your suggestion, I used the following code to try to save the full model
from transformers import (
    AutoConfig,
    AutoModelForCausalLM,
    BitsAndBytesConfig,
)
import torch
from peft import PeftModel

base_model = "tiiuae/falcon-7b"
device_map = "auto"
load_in_4bit = False
load_in_8bit = True
lora_weights = "artifacts/lora_weight:v12/"

# 8-bit quantization config (the bnb_4bit_* options are ignored since load_in_4bit=False)
quant_config = BitsAndBytesConfig(
    load_in_4bit=load_in_4bit,
    load_in_8bit=load_in_8bit,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)

# Load the quantized base model
model_config = AutoConfig.from_pretrained(base_model, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    pretrained_model_name_or_path=base_model,
    torch_dtype=torch.float16,
    device_map=device_map,
    config=model_config,
    quantization_config=quant_config,
    trust_remote_code=True,
)

# Attach the LoRA adapter
model = PeftModel.from_pretrained(
    model,
    lora_weights,
    torch_dtype=torch.float16,
)

# Try to save the full model
model.base_model.save_pretrained("full_model")
However, it doesn't seem to work; I'm getting the following error:
    shards, index = shard_checkpoint(state_dict, max_shard_size=max_shard_size, weights_name=weights_name)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 318, in shard_checkpoint
    storage_id = id_tensor_storage(weight)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/pytorch_utils.py", line 290, in id_tensor_storage
    return tensor.device, storage_ptr(tensor), storage_size(tensor)
Versions: bitsandbytes==0.39.1, peft==0.4.0.dev0, torch==2.0.1, transformers==4.30.2
Thanks for double checking, can you share the full traceback? 🙏
Full traceback:
Traceback (most recent call last):
  File "/home/ubuntu/jerboa/model_saving.py", line 38, in <module>
    model.base_model.save_pretrained("full_model")
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1820, in save_pretrained
    shards, index = shard_checkpoint(state_dict, max_shard_size=max_shard_size, weights_name=weights_name)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/modeling_utils.py", line 318, in shard_checkpoint
    storage_id = id_tensor_storage(weight)
  File "/home/ubuntu/.cache/pypoetry/virtualenvs/jerboa-wA-oMqxs-py3.10/lib/python3.10/site-packages/transformers/pytorch_utils.py", line 290, in id_tensor_storage
    return tensor.device, storage_ptr(tensor), storage_size(tensor)
AttributeError: 'str' object has no attribute 'device'
It seems the state dict contains an entry whose value is the string "row" rather than a tensor; the offending keys end in weight_format, e.g. transformer.h.31.self_attention.query_key_value.weight_format.
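For anyone hitting the same error, a quick way to confirm this diagnosis is to scan the state dict for entries that are not tensors. A minimal sketch (not from the thread), reusing the `model` object from the snippet above:

```python
import torch

# Any entry that is not a tensor will break shard_checkpoint / id_tensor_storage.
for name, value in model.base_model.state_dict().items():
    if not isinstance(value, torch.Tensor):
        # With 8-bit bitsandbytes layers this prints keys ending in
        # "weight_format" whose value is the string "row".
        print(name, type(value), value)
```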
Updating to the latest transformers (main) solves the issue.
Indeed, this is related to the recent release of bitsandbytes, check: https://github.com/huggingface/transformers/pull/24416
Hi @azayz,
I wonder how to save the whole backbone model at every saving step during training? I guess your example only saves at the beginning or end of training. Any thoughts?
Best, Chiyu
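Not from the thread, but one way to approach this is a `TrainerCallback` whose `on_save` hook writes out the backbone at every checkpoint. The sketch below is untested and the callback name is made up; whether LoRA weights end up in the saved files depends on the caveat discussed above.

```python
import os
from transformers import TrainerCallback

class SaveBackboneCallback(TrainerCallback):
    """Hypothetical callback that saves the underlying transformers model
    every time the Trainer writes a checkpoint."""

    def on_save(self, args, state, control, **kwargs):
        model = kwargs["model"]  # the (Peft-wrapped) model the Trainer is training
        out_dir = os.path.join(args.output_dir, f"backbone-step-{state.global_step}")
        # For a PeftModel, base_model.model is the wrapped transformers model.
        model.base_model.model.save_pretrained(out_dir)
        return control

# Usage: trainer = Trainer(..., callbacks=[SaveBackboneCallback()])
```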
> Hi @azayz, thanks for raising this discussion. You should probably be able to partially achieve your goal by doing:
>
> model.base_model.save_pretrained(xxx)
>
> However, you might end up with adapter weights + model weights in some cases (e.g. LoRA). Can you try that and let us know how it goes?
In this way, only the base model will be saved, without the LoRA adapter. Please check the generated config.json file:
{
  "_name_or_path": "meta-llama/Llama-2-7b-hf",
  "architectures": [
    "LlamaModel"
  ],
  "attention_bias": false,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 4096,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 32,
  "pad_token_id": 2,
  "pretraining_tp": 1,
  "quantization_config": {
    "bnb_4bit_compute_dtype": "float16",
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": false,
    "llm_int8_enable_fp32_cpu_offload": false,
    "llm_int8_has_fp16_weight": false,
    "llm_int8_skip_modules": null,
    "llm_int8_threshold": 6.0,
    "load_in_4bit": true,
    "load_in_8bit": false,
    "quant_method": "bitsandbytes"
  },
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 10000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "float16",
  "transformers_version": "4.34.0",
  "use_cache": true,
  "vocab_size": 32001
}
> model.base_model.save_pretrained("full_model")

Are you sure? In your last line:

model.base_model.save_pretrained("full_model")

you saved the base model without the LoRA adapter. Please check your config.json file!
@younesbelkada What do you think of trainer.model.base_model? Are there any LoRA weights inside this model, or is it just the pre-trained base model without any LoRA weights? Thank you very much in advance!
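A quick empirical check (a minimal sketch, assuming `trainer` is your `Trainer` instance): PEFT names its injected LoRA parameters with a `lora_` prefix inside each adapted layer, so you can simply look for those names.

```python
# Count parameters whose names contain "lora_" (PEFT's LoRA naming).
lora_params = [
    name for name, _ in trainer.model.base_model.named_parameters() if "lora_" in name
]
print(f"found {len(lora_params)} LoRA parameters")
print(lora_params[:5])
```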
@chiyuzhang94 @azayz @tmm1
The task is to merge adapter weights into the base model. However, the code below does not save the merged model, i.e. the fine-tuned model.
base_model_path = "openai-community/gpt2"
base_model = GPT2LMHeadModel.from_pretrained(base_model_path, device_map="auto")
base_model.resize_token_embeddings(len(tokenizer))
model = PeftModel.from_pretrained(base_model, peft_model_path, device_map="auto")
merged_model = model.merge_and_unload()
merged_model.save_pretrained("gpt2-turkish-300k-17_4_24", save_adapters=True, save_embedding_layers=True)
Replacing the last line with the line below saves the right model:
model.base_model.save_pretrained("full_model")
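For reference, the flow usually shown in the PEFT documentation for producing a standalone fine-tuned checkpoint is to load an unquantized base model, attach the adapter, call `merge_and_unload()`, and save the result. A minimal, untested sketch with placeholder paths:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained(
    "openai-community/gpt2", torch_dtype=torch.float16
)
model = PeftModel.from_pretrained(base, "path/to/adapter")  # placeholder adapter path

merged = model.merge_and_unload()       # folds the LoRA weights into the base layers
merged.save_pretrained("merged-model")  # plain transformers checkpoint, no adapter files

# Save the tokenizer alongside so the folder can be pushed to the Hub as-is.
AutoTokenizer.from_pretrained("openai-community/gpt2").save_pretrained("merged-model")
```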
System Info
peft==0.4.0.dev0
I'm not sure if this should be a bug report, so sorry if this is not convenient. According to the save_pretrained method docstring, this saves the adapter model only and not the full model weights. Is there an option to save the full model weights? The use case is that we want to upload the full model to the HF Hub to be able to activate the Inference API, however right now we only save adapter weights.

Who can help?
No response
Information

Tasks
examples folder

Reproduction
save_pretrained saves only adapters, maybe also add the option to save the full model
Expected behavior
save_pretrained saves only adapters, maybe also add the option to save the full model