tamanna-mostafa commented 10 months ago

System Info

transformers version: 4.35.2
Platform: Linux-5.15.0-1050-aws-x86_64-with-glibc2.31
Python version: 3.10.12
Huggingface_hub version: 0.20.2
Safetensors version: 0.4.1
Accelerate version: 0.26.1
Accelerate config: not found
PyTorch version (GPU?): 2.1.2+cu121 (True)
Tensorflow version (GPU?): not installed (NA)
Flax version (CPU?/GPU?/TPU?): not installed (NA)
Jax version: not installed
JaxLib version: not installed

Who can help?

@gante @Rocketknight1 @muellerzr and @pacman100

Information

[ ] The official example scripts
[ ] My own modified scripts

Tasks

[ ] An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
[X] My own task or dataset (give details below)

Reproduction

I ran supervised fine-tuning (SFT) of the Mistral 7B model (with 32k preference data).
I ran DPO on the output of SFT.
I ran the following commands to serve the DPO model with the text-generation-inference Docker image:

model=/data/DPO_output_mistral_32k
volume=/mnt/efs/data/tammosta/files_t:/data
num_shard=8
docker run --gpus all --shm-size 1g -p 172.31.8.218:80:80 -v $volume ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id $model --num-shard $num_shard --max-input-length 4095 --max-total-tokens 12000

However, the docker run failed with the following error:

OSError: /data/DPO_output_mistral_32k does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/DPO_output_mistral_32k/None' for available files.
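The error above says the directory has no config.json. The folder listing posted later in this thread shows adapter_config.json and adapter_model.safetensors in that directory, which suggests it holds only a LoRA adapter rather than a full model. A minimal sketch for telling the two apart before serving follows; the function name `describe_model_dir` is hypothetical, and it only checks for the presence of the two config files.

```python
from pathlib import Path


def describe_model_dir(path):
    """Return a rough label for what a local model directory contains.

    Assumption: a full Transformers model directory has config.json,
    while a PEFT adapter directory has adapter_config.json instead.
    """
    p = Path(path)
    if (p / "config.json").is_file():
        return "full model"
    if (p / "adapter_config.json").is_file():
        return "adapter only"
    return "unknown"
```

Given the folder listing in this thread, calling it on the DPO output directory would be expected to report "adapter only", which is consistent with the missing config.json.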

Assuming I need to merge the LoRA adapters into the base model before loading it, I ran the following command (the content of the script is also given below):

python merge_peft_adaptors_gpu.py --base_model_name_or_path /mnt/efs/data/tammosta/files_t/output_sft_32k --peft_model_path /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k --output_dir /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged --safe_serialization

Here is the content of merge_peft_adaptors_gpu.py:

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

import os
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model_name_or_path", type=str)
    parser.add_argument("--peft_model_path", type=str)
    parser.add_argument("--output_dir", type=str)
    parser.add_argument("--device", type=str, default="auto")
    parser.add_argument("--safe_serialization", action="store_true")

    return parser.parse_args()
####
def main():
    args = get_args()

    if args.device == 'auto':
        device_arg = { 'device_map': 'auto' }
    else:
        device_arg = { 'device_map': { "": args.device} }

    print(f"Loading base model: {args.base_model_name_or_path}")
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model_name_or_path,
        return_dict=True,
        torch_dtype=torch.float16,
        trust_remote_code=True,
       **device_arg
    )
    #device = torch.device('cpu')
    #base_model.to(device)

    print(f"Loading PEFT: {args.peft_model_path}")
    model = PeftModel.from_pretrained(base_model, args.peft_model_path)
    print("Peft Model : ", model.device)
    print(f"Running merge_and_unload")
    model = model.merge_and_unload()

    tokenizer = AutoTokenizer.from_pretrained(args.base_model_name_or_path)

    model.save_pretrained(args.output_dir, max_shard_size='9GB', safe_serialization=args.safe_serialization)
    tokenizer.save_pretrained(args.output_dir)
    print(f"Model saved to {args.output_dir}")
####
if __name__ == "__main__" :
    main()

However, I'm getting this error:

Loading base model: /mnt/efs/data/tammosta/files_t/output_sft_32k
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.40s/it]
Loading PEFT: /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k
Traceback (most recent call last):
  File "/mnt/efs/data/tammosta/scripts_hb/merge_peft_adaptors_gpu.py", line 51, in <module>
    main()
  File "/mnt/efs/data/tammosta/scripts_hb/merge_peft_adaptors_gpu.py", line 38, in main
    model = PeftModel.from_pretrained(base_model, args.peft_model_path)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 352, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 689, in load_adapter
    adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 270, in load_peft_weights
    adapters_weights = safe_load_file(filename, device=device)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/safetensors/torch.py", line 308, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

Any idea why I'm getting this error?
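One way to narrow this down is to inspect the adapter file directly. A safetensors file starts with an 8-byte little-endian unsigned integer giving the header length, followed by that many bytes of JSON. The sketch below checks only that much, using the standard library; the function name `check_safetensors_header` is hypothetical, and the real library validates more than this. A truncated or placeholder file is one possible cause of this error, but that is an assumption here, not a diagnosis.

```python
import json
import struct
from pathlib import Path


def check_safetensors_header(path):
    """Return (ok, message) for a basic sanity check of a safetensors header."""
    size = Path(path).stat().st_size
    if size < 8:
        return False, f"file is only {size} bytes; too small to hold a header length"
    with open(path, "rb") as f:
        # First 8 bytes: header length as an unsigned little-endian 64-bit integer.
        (header_len,) = struct.unpack("<Q", f.read(8))
        if header_len > size - 8:
            return False, f"declared header length {header_len} exceeds file size {size}"
        raw = f.read(header_len)
    try:
        header = json.loads(raw)
    except ValueError as exc:  # covers invalid UTF-8 and invalid JSON
        return False, f"header is not valid JSON: {exc}"
    if not isinstance(header, dict):
        return False, "header JSON is not an object"
    # "__metadata__" is an optional free-form entry, not a tensor.
    tensor_names = [k for k in header if k != "__metadata__"]
    return True, f"header OK, {len(tensor_names)} tensor entries"
```

Running it on adapter_model.safetensors in the DPO output directory would show whether the failure is already visible at the file level, independent of PEFT.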

Expected behavior

The LoRA adapters should merge into the base model, and the merged model should be saved to the output directory without errors.

gante commented 10 months ago

Hi @tamanna-mostafa 👋

You'll have to help us figure out what's wrong: can you share a short, reproducible script that showcases the issue on the transformers side? I see two exceptions in your pasted code, one about text-generation-inference and another about safetensors.

tamanna-mostafa commented 10 months ago

@gante Thanks for your comments. Here is the code I ran (please let me know if you need any further details):

#Config for SFT
mistral-7b-sft-MM-RLAIF:
  dtype: bf16
  log_dir: "mistral-7b-sft-MM-PS"
  learning_rate: 2e-5
  model_name: /mnt/efs/workspace/sakhaki/models/Mistral-7B-v0.1
  deepspeed_config: configs/zero_config_sft_65b.json #configs/zero_config_pretrain.json
  output_dir: /mnt/efs/data/tammosta/files_t/output_sft_32k
  weight_decay: 0.01
  max_length: 4096
  warmup_steps: 100
  gradient_checkpointing: true
  gradient_accumulation_steps: 8
  per_device_train_batch_size: 1
  per_device_eval_batch_size: 1
  eval_steps: 500000
  save_steps: 100
  num_train_epochs: 2
  save_total_limit: 4
  use_flash_attention: false
  residual_dropout: 0.0
  residual_dropout_lima: true
  save_strategy: steps
  peft_model: false
  only_last_turn_loss: false
  use_custom_sampler: true
  datasets:
    - sft-custom:
        data_files: /mnt/efs/data/tammosta/files_t/SFT_inp_26787_RBS_plus_Optima.json
        #fraction : 0.75
        max_val_set: 300
        val_split: 0.0001
    - oasst_export:
        lang: "bg,ca,cs,da,de,en,es,fr,hr,hu,it,nl,pl,pt,ro,ru,sl,sr,sv,uk" # sft-8.0
        hf_dataset_name: OpenAssistant/oasst1
        fraction : 0.5
        val_split: 0.0001
        max_val_set: 300
        top_k: 1
#run SFT on mistral 7b model
deepspeed trainer_sft_d.py --configs mistral-7b-sft-MM-RLAIF  --wandb-entity tammosta  --show_dataset_stats --deepspeed

#Run DPO on the SFT model
accelerate launch --config_file ./accelerate_configs/ds_zero3.yaml rlhf_dpo.py \
--model_name_or_path="/mnt/efs/data/tammosta/files_t/output_sft_32k" \
--output_dir="/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k" \
--data_path="/mnt/efs/data/tammosta/files_t/DPO_data_rbs_clean_AIF.json" \
--use_lamma2_peft_config False \
--beta 0.1 \
--optimizer_type adamw_hf \
--learning_rate 1e-6 \
--warmup_steps 50 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--lora_r 8 \
--max_prompt_length 2048 \
--max_length 4096 \
--num_train_epochs 4 \
--logging_steps 20 \
--save_steps 100 \
--save_total_limit 8 \
--eval_steps 50 \
--gradient_checkpointing True \
--report_to "wandb"
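For reference, the LoRA flags above matter at merge time. In standard LoRA, merging folds the adapter into the base weight as W + (lora_alpha / r) * B @ A, so with lora_alpha 16 and lora_r 8 the scale would be 2.0. The sketch below illustrates this with plain nested lists and toy shapes; it assumes rlhf_dpo.py passes these flags to a standard LoRA configuration, which the thread does not confirm, and the function names are hypothetical.

```python
def lora_scale(lora_alpha, r):
    """Scaling factor applied to the low-rank update in standard LoRA."""
    return lora_alpha / r


def merge_lora(W, A, B, lora_alpha, r):
    """Return W + (lora_alpha / r) * (B @ A) using plain nested lists.

    W is (out x in), B is (out x r), A is (r x in).
    """
    scale = lora_scale(lora_alpha, r)
    merged = [row[:] for row in W]  # copy so the input is left untouched
    for i in range(len(W)):
        for j in range(len(W[0])):
            delta = sum(B[i][k] * A[k][j] for k in range(r))
            merged[i][j] += scale * delta
    return merged


# Toy example with rank 1; real adapters use much larger matrices.
W = [[1.0, 0.0], [0.0, 1.0]]
B = [[1.0], [2.0]]
A = [[0.5, 0.0]]
merged = merge_lora(W, A, B, lora_alpha=2, r=1)
```

This is only an illustration of the arithmetic; the merge script later in this comment relies on `merge_and_unload()` to do the real work.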

Contents of the DPO output folder

ubuntu@ip-172-31-8-218:/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k$ ls 
README.md            adapter_model.safetensors  checkpoint-100  checkpoint-300  checkpoint-500  checkpoint-700    global_step736  special_tokens_map.json  tokenizer.model        training_args.bin
adapter_config.json  added_tokens.json          checkpoint-200  checkpoint-400  checkpoint-600  final_checkpoint  latest          tokenizer.json           tokenizer_config.json  zero_to_fp32.py
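The listing above shows a top-level adapter_model.safetensors next to checkpoint-* folders, final_checkpoint, and DeepSpeed artifacts (global_step736, latest, zero_to_fp32.py). A minimal sketch for comparing adapter weight files across these folders by size is given below. Whether the checkpoint folders contain their own adapter files is an assumption, and the function name `find_adapter_weights` is hypothetical. An unusually small file would be a hint that the weights were not written out fully.

```python
from pathlib import Path


def find_adapter_weights(root):
    """Return sorted (relative_path, size_in_bytes) pairs for adapter weight files under root."""
    root = Path(root)
    names = {"adapter_model.safetensors", "adapter_model.bin"}
    found = []
    for p in root.rglob("adapter_model.*"):
        if p.is_file() and p.name in names:
            found.append((p.relative_to(root).as_posix(), p.stat().st_size))
    return sorted(found)
```

For example, `for rel, size in find_adapter_weights("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k"): print(rel, size)` would print one line per adapter file found.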

#Merge the LoRA adapters
python merge_peft_adaptors_gpu.py --base_model_name_or_path /mnt/efs/data/tammosta/files_t/output_sft_32k --peft_model_path /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k --output_dir /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged --safe_serialization

#Content of merge_peft_adaptors_gpu.py

from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel
import torch

import os
import argparse

def get_args():
    parser = argparse.ArgumentParser()
    parser.add_argument("--base_model_name_or_path", type=str)
    parser.add_argument("--peft_model_path", type=str)
    parser.add_argument("--output_dir", type=str)
    parser.add_argument("--device", type=str, default="auto")
    parser.add_argument("--safe_serialization", action="store_true")

    return parser.parse_args()
####
def main():
    args = get_args()

    if args.device == 'auto':
        device_arg = { 'device_map': 'auto' }
    else:
        device_arg = { 'device_map': { "": args.device} }

    print(f"Loading base model: {args.base_model_name_or_path}")
    base_model = AutoModelForCausalLM.from_pretrained(
        args.base_model_name_or_path,
        return_dict=True,
        torch_dtype=torch.float16,
        trust_remote_code=True,
       **device_arg
    )
    #device = torch.device('cpu')
    #base_model.to(device)

    print(f"Loading PEFT: {args.peft_model_path}")
    model = PeftModel.from_pretrained(base_model, args.peft_model_path)
    print("Peft Model : ", model.device)
    print(f"Running merge_and_unload")
    model = model.merge_and_unload()

    tokenizer = AutoTokenizer.from_pretrained(args.base_model_name_or_path)

    model.save_pretrained(args.output_dir, max_shard_size='9GB', safe_serialization=args.safe_serialization)
    tokenizer.save_pretrained(args.output_dir)
    print(f"Model saved to {args.output_dir}")
####
if __name__ == "__main__" :
    main()

#The error I get while running the code above 

Loading base model: /mnt/efs/data/tammosta/files_t/output_sft_32k
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:04<00:00,  1.40s/it]
Loading PEFT: /mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k
Traceback (most recent call last):
  File "/mnt/efs/data/tammosta/scripts_hb/merge_peft_adaptors_gpu.py", line 51, in <module>
    main()
  File "/mnt/efs/data/tammosta/scripts_hb/merge_peft_adaptors_gpu.py", line 38, in main
    model = PeftModel.from_pretrained(base_model, args.peft_model_path)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 352, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 689, in load_adapter
    adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 270, in load_peft_weights
    adapters_weights = safe_load_file(filename, device=device)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/safetensors/torch.py", line 308, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

gante commented 9 months ago

Hi @tamanna-mostafa 👋 Looking at your stack trace, this looks like a PEFT error. You should open an issue there :)

tamanna-mostafa commented 9 months ago

@gante Issue opened here: https://github.com/huggingface/peft/issues/1443