Hi @tamanna-mostafa, thanks for raising this issue!
Could you list the files saved under /data/DPO_output_mistral_32k?
@amyeroberts
Hi, thanks for your comment. Below is what I see when I run ls in the DPO_output_mistral_32k folder:
ubuntu@ip-172-31-8-218:/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k$ ls
README.md adapter_model.safetensors checkpoint-100 checkpoint-300 checkpoint-500 checkpoint-700 global_step736 special_tokens_map.json tokenizer.model training_args.bin
adapter_config.json added_tokens.json checkpoint-200 checkpoint-400 checkpoint-600 final_checkpoint latest tokenizer.json tokenizer_config.json zero_to_fp32.py
Could you share how you're loading the model? If you're using adapters, then I'd expect this pattern:
from transformers import AutoModelForCausalLM
model_id = "{MISTRAL_CHECKPOINT}"
dpo_model_id = "/data/DPO_output_mistral_32k"
model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(dpo_model_id)
Do you mean how I'm loading the DPO model in Docker? Here are the steps:
model=/data/DPO_output_mistral_32k
volume=/mnt/efs/data/tammosta/files_t:/data
num_shard=8
docker run --gpus all --shm-size 1g -p 172.31.8.218:80:80 -v $volume ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id $model --num-shard $num_shard --max-input-length 4095 --max-total-tokens 12000
Just in case, this is the command I ran for the DPO training:
accelerate launch --config_file ./accelerate_configs/ds_zero3.yaml rlhf_dpo.py \
--model_name_or_path="/mnt/efs/data/tammosta/files_t/output_sft_32k" \
--output_dir="/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k" \
--data_path="/mnt/efs/data/tammosta/files_t/DPO_data_rbs_clean_AIF.json" \
--use_lamma2_peft_config False \
--beta 0.1 \
--optimizer_type adamw_hf \
--learning_rate 1e-6 \
--warmup_steps 50 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--lora_r 8 \
--max_prompt_length 2048 \
--max_length 4096 \
--num_train_epochs 4 \
--logging_steps 20 \
--save_steps 100 \
--save_total_limit 8 \
--eval_steps 50 \
--gradient_checkpointing True \
--report_to "wandb"
It uses adapters in DPO training.
Is there anything wrong with the way I'm loading the model in Docker?
@tamanna-mostafa No, I don't think so. From the current error and the files in the model repo, I currently think there are two possible causes:
- How the model is being loaded in the text_generation_server package
- The rlhf_dpo.py script
For a model with adapter weights, I'd expect the adapter weights repo to look something like this: https://huggingface.co/ybelkada/opt-350m-lora/tree/main
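For comparison, a quick way to check the local folder against that layout (a minimal sketch; the expected-file set is my assumption based on the linked repo, which is essentially just adapter_config.json, the adapter weights file, and a README):

import os

adapter_dir = "/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k"
# compare against adapter_config.json + adapter_model.safetensors (or adapter_model.bin) + README.md
print(sorted(os.listdir(adapter_dir)))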
Could you share the contents of the adapter_config.json?
@amyeroberts
How the model is being loaded in the text_generation_server package
To my understanding, these are the steps I used to load the model (prior to running Docker):
model=/data/DPO_output_mistral_32k
volume=/mnt/efs/data/tammosta/files_t:/data
Here are the contents of the adapter_config.json:
{
"alpha_pattern": {},
"auto_mapping": null,
"base_model_name_or_path": "/mnt/efs/data/tammosta/files_t/output_sft_32k",
"bias": "none",
"fan_in_fan_out": false,
"inference_mode": true,
"init_lora_weights": true,
"layers_pattern": null,
"layers_to_transform": null,
"loftq_config": {},
"lora_alpha": 16.0,
"lora_dropout": 0.05,
"megatron_config": null,
"megatron_core": "megatron.core",
"modules_to_save": null,
"peft_type": "LORA",
"r": 8,
"rank_pattern": {},
"revision": null,
"target_modules": [
"v_proj",
"q_proj",
"up_proj",
"down_proj",
"gate_proj",
"o_proj",
"k_proj"
],
"task_type": "CAUSAL_LM"
}
I'm also pasting the last 3 sections from the rlhf_dpo.py script.
# 5. initialize the DPO trainer
dpo_trainer = DPOTrainer(
model,
model_ref,
args=training_args,
beta=script_args.beta,
train_dataset=train_dataset,
eval_dataset=eval_dataset,
tokenizer=tokenizer,
peft_config=peft_config,
max_prompt_length=script_args.max_prompt_length,
max_length=script_args.max_length,
)
# 6. train
dpo_trainer.train()
dpo_trainer.save_model(script_args.output_dir)
# 7. save
output_dir = os.path.join(script_args.output_dir, "final_checkpoint")
dpo_trainer.model.save_pretrained(output_dir)
@amyeroberts
Hi, did you have a chance to take a look? Thanks!
I'm going to cc @younesbelkada here, who knows more about the DPO trainer and expected values in the configs :)
Hi @tamanna-mostafa Thanks for the issue! In order to run the trained adapter with TGI using Docker, you first need to merge the adapter weights into the base model, and then push or save the merged weights somewhere, either on the Hub or locally.
Merging the adapter weights converts the trained model into a standalone transformers model, which makes it compatible with TGI. Please see https://huggingface.co/docs/peft/main/en/conceptual_guides/lora#merge-lora-weights-into-the-base-model to understand what merging means.
To merge the model, run:
from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained(model_id)  # model_id: path to the trained adapter checkpoint
model = model.merge_and_unload()
# at this point the model is a standalone transformers model
model.push_to_hub(xxx)  # or model.save_pretrained(...) to save the merged weights locally
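If you'd rather save the merged weights locally for the Docker setup above, something along these lines should work (a sketch; the output folder name is only an example, and I'm assuming the tokenizer should also be saved next to the merged weights so TGI can find it):

from transformers import AutoTokenizer

merged_dir = "/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged"  # example path under the mounted volume
model.save_pretrained(merged_dir)
tokenizer = AutoTokenizer.from_pretrained(model_id)  # the adapter folder above also contains the tokenizer files
tokenizer.save_pretrained(merged_dir)
# the docker run command can then use --model-id /data/DPO_output_mistral_32k_merged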
Hi @younesbelkada
Thanks a lot for your comment. As the base model, I used a Mistral 7B that I fine-tuned with my own preference data. I ran the following code to merge:
from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch
#base_model = "/mnt/efs/data/tammosta/files_t/output_sft_32k"
base_model = AutoModelForCausalLM.from_pretrained(
"/mnt/efs/data/tammosta/files_t/output_sft_32k",
return_dict=True,
torch_dtype=torch.float16,
trust_remote_code=True,
#**device_arg
)
peft_model_id = "/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint"
model = PeftModel.from_pretrained(base_model, peft_model_id)
merged_model = model.merge_and_unload()
merged_model.save_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged")
However, I'm getting the following error:
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.77it/s]
Traceback (most recent call last):
File "/mnt/efs/data/tammosta/files_t/merge_peft_tammosta.py", line 14, in <module>
model = PeftModel.from_pretrained(base_model, peft_model_id)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 354, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 698, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 241, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
size mismatch for base_model.model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
size mismatch for base_model.model.model.layers.0.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([16, 14336]).
size mismatch for base_model.model.model.layers.1.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
. . .
It looks like the weights in the base model don't match those in the PEFT model. Could you please suggest a possible way to debug this?
@tamanna-mostafa have you used DeepSpeed to train your adapters by any chance?
Can you also try:
from transformers import AutoModelForCausalLM
from peft import AutoPeftModelForCausalLM
import torch
peft_model_id = "/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint"
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16,)
model = model.merge_and_unload()
model.save_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged")
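If that still fails, one quick way to confirm which adapter tensors were saved empty (a minimal sketch, assuming the safetensors package is installed and the path is adjusted to your checkpoint's adapter file):

from safetensors import safe_open

adapter_file = "/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint/adapter_model.safetensors"
with safe_open(adapter_file, framework="pt") as f:
    for key in f.keys():
        shape = tuple(f.get_tensor(key).shape)
        if 0 in shape:
            print(f"empty tensor in checkpoint: {key} -> {shape}")

If any shapes come back as 0, the adapter file itself was written with empty tensors, so the problem is on the saving side rather than in the merge code.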
Hi @younesbelkada,
I used DeepSpeed to fine-tune the Mistral 7B (the base model).
I used accelerate launch to train the DPO model (the PEFT model).
When I run the suggested code, I get:
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.64it/s]
Traceback (most recent call last):
File "/mnt/efs/data/tammosta/files_t/merge_peft_tammosta_2.py", line 6, in <module>
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16,)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/auto.py", line 115, in from_pretrained
tokenizer_exists = file_exists(
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
validate_repo_id(arg_value)
File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint'. Use `repo_type` argument if needed.
@tamanna-mostafa that behavior seems to be a duplicate of https://github.com/huggingface/peft/issues/1430 - can you try passing a relative path instead and running the script from the final checkpoint folder? I'll submit a fix in PEFT.
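For example, one way to pass a relative path (a sketch; this mirrors running the script from the files_t directory so the checkpoint path has no leading slash):

# run from /mnt/efs/data/tammosta/files_t so the checkpoint path below is relative
import torch
from peft import AutoPeftModelForCausalLM

peft_model_id = "DPO_output_32k_Test/final_checkpoint"
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16)
model = model.merge_and_unload()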
Using the relative path, I still get the size mismatch issue:
(ml_v2) ubuntu@ip-172-31-32-104:/mnt/efs/data/tammosta/files_t$ python hf_test_2.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00, 2.18it/s]
Traceback (most recent call last):
File "/mnt/efs/data/tammosta/files_t/hf_test_2.py", line 6, in <module>
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16,)
File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/auto.py", line 127, in from_pretrained
return cls._target_peft_class.from_pretrained(
File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/peft_model.py", line 354, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/peft_model.py", line 698, in load_adapter
load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 241, in set_peft_model_state_dict
load_result = model.load_state_dict(peft_model_state_dict, strict=False)
File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
size mismatch for base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
size mismatch for base_model.model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
I suspect it might be the PEFT config I'm using during DPO training. In the DPO training command for the Mistral 7B SFT model, if I use the same PEFT config as the one used for DPO-training the LLaMA 2 7B SFT model, then I don't have this size mismatch issue in the adapter merge.
Here's the PEFT config used for DPO-training the LLaMA 2 7B SFT model:
peft_config = LoraConfig(
r=script_args.lora_r,
lora_alpha=script_args.lora_alpha,
lora_dropout=script_args.lora_dropout,
target_modules=[
"q_proj",
"v_proj",
"k_proj",
"out_proj",
"fc_in",
"fc_out",
"wte",
],
bias="none",
task_type="CAUSAL_LM",
)
print(f"peft_config: {peft_config}")
What peft_config should I use to DPO-train a Mistral 7B SFT model? Can I use the same PEFT config as the one used for DPO-training a LLaMA 2 7B model (as pasted above)?
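For reference, the target_modules recorded in my adapter_config.json above map onto a LoraConfig along these lines (a sketch, using the same r/alpha/dropout values I passed on the command line):

from peft import LoraConfig

peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)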
@younesbelkada It would be very helpful if you could kindly share your thoughts on this.
Hi @tamanna-mostafa, I am going to cc @pacman100 here, as he is more familiar than I am with the interactions between DeepSpeed and PEFT.
Gentle ping @pacman100
I am facing a similar error. I am training an OpenHermes 7B model using PEFT LoRA. Below is the code I am using.
It would be great if someone could tell me what is wrong with this.
@pacman100
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import argparse
def merge_lora_to_base(model_name_or_path, path_to_lora_checkpoint, save_merge_model_path):
    model_name = model_name_or_path
    adapters_name = path_to_lora_checkpoint
    print(f"Starting to load the model {model_name} into memory")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        trust_remote_code=True,
    )
    print("Loading Tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(model_name,
                                              trust_remote_code=True,
                                              use_fast=True)
    print("Successfully loaded model and tokenizer into memory")
    print("Resizing model...")
    model.resize_token_embeddings(len(tokenizer))
    print("Loading Adaptors...")
    model = PeftModel.from_pretrained(model, adapters_name)
    print("Merging model...")
    model = model.merge_and_unload()
    print(f"Saving merged model to {save_merge_model_path}...")
    model.save_pretrained(save_merge_model_path)
    print(f"Saving tokenizer to {save_merge_model_path}...")
    tokenizer.save_pretrained(save_merge_model_path)
    print("Merging Done")
    return model, tokenizer

if __name__ == "__main__":
    """
    Cli usage-
    python merger_lora_non_quantize.py --model_name_or_path teknium/OpenHermes-2.5-Mistral-7B \
        --path_to_lora_checkpoint training_results/OpenHermes/checkpoint-100 \
        --save_merge_model_path OpenHermes-2.5-Mistral-7B-merged-lora-non-quantized
    """
    parser = argparse.ArgumentParser(description='Merge LoRA adaptors to base model')
    parser.add_argument('--model_name_or_path', type=str, default='teknium/OpenHermes-2.5-Mistral-7B',
                        help='Path to the base model or model name')
    parser.add_argument('--path_to_lora_checkpoint', type=str,
                        default='training_results/OpenHermes-2.5-Mistral-7B/checkpoint-100',
                        help='Path to the LoRA checkpoint')
    parser.add_argument('--save_merge_model_path', type=str,
                        default='merged_models/OpenHermes-2.5-Mistral-7B-merged-lora-non-quantized',
                        help='Path to save the merged model')
    args = parser.parse_args()
    save_merge_model_path = args.save_merge_model_path
    if "/" not in save_merge_model_path:
        save_merge_model_path = f"merged_models/{save_merge_model_path}/"
    model, tokenizer = merge_lora_to_base(args.model_name_or_path,
                                          args.path_to_lora_checkpoint,
                                          save_merge_model_path,
                                          )
Error:
Starting to load the model teknium/OpenHermes-2.5-Mistral-7B into memory
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00, 1.05it/s]
Loading Tokenizer...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Successfully loaded model and tokenizer into memory
Resizing model...
Loading Adaptors...
Traceback (most recent call last):
File "/home/user/deployment/chatbot/scripts/merger_lora_non_quantize.py", line 63, in <module>
model, tokenizer = merge_lora_to_base(args.model_name_or_path,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/deployment/chatbot/scripts/merger_lora_non_quantize.py", line 27, in merge_lora_to_base
model = PeftModel.from_pretrained(model, adapters_name)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/deployment/venv/lib/python3.11/site-packages/peft/peft_model.py", line 388, in from_pretrained
model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
File "/home/user/deployment/venv/lib/python3.11/site-packages/peft/peft_model.py", line 839, in load_adapter
adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/deployment/venv/lib/python3.11/site-packages/peft/utils/save_and_load.py", line 326, in load_peft_weights
adapters_weights = safe_load_file(filename, device=device)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/deployment/venv/lib/python3.11/site-packages/safetensors/torch.py", line 308, in load_file
with safe_open(filename, framework="pt", device=device) as f:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
Hey @dipeshpaulsystango could you open a separate issue? Thanks
It's resolved here: https://github.com/huggingface/peft/issues/1599#issuecomment-2025306718
Thanks! Had not seen it there
System Info
transformers version: 4.35.2

Who can help?
@SunMarc @muellerzr

Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I ran the docker run command on the DPO output to host the model in Docker so I can run inferences.

Expected behavior
Expected behavior was that Docker would start running. However, I got this error instead: