huggingface / transformers

🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.
https://huggingface.co/transformers
Apache License 2.0

OSError: /data/DPO_output_mistral_32k does not appear to have a file named config.json. #28688

Closed tamanna-mostafa closed 7 months ago

tamanna-mostafa commented 9 months ago

System Info

Who can help?

@SunMarc @muellerzr

Information

Tasks

Reproduction

  1. Fine-tuned the Mistral 7B model with 32k preference data.
  2. Ran DPO on the SFT output.
  3. Ran the docker run command on the DPO output to host the model in Docker so I can run inference.

Expected behavior

The expected behavior was that Docker would start running. However, I got this error instead:

2024-01-24T20:31:06.334853Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/opt/conda/bin/text-generation-server", line 8, in <module>
    sys.exit(app())
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 311, in __call__
    return get_command(self)(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 778, in main
    return _main(
  File "/opt/conda/lib/python3.9/site-packages/typer/core.py", line 216, in _main
    rv = self.invoke(ctx)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/opt/conda/lib/python3.9/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/typer/main.py", line 683, in wrapper
    return callback(**use_params)  # type: ignore
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/cli.py", line 83, in serve
    server.serve(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 207, in serve
    asyncio.run(
  File "/opt/conda/lib/python3.9/asyncio/runners.py", line 44, in run
    return loop.run_until_complete(main)
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 634, in run_until_complete
    self.run_forever()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 601, in run_forever
    self._run_once()
  File "/opt/conda/lib/python3.9/asyncio/base_events.py", line 1905, in _run_once
    handle._run()
  File "/opt/conda/lib/python3.9/asyncio/events.py", line 80, in _run
    self._context.run(self._callback, *self._args)
> File "/opt/conda/lib/python3.9/site-packages/text_generation_server/server.py", line 159, in serve_inner
    model = get_model(
  File "/opt/conda/lib/python3.9/site-packages/text_generation_server/models/__init__.py", line 129, in get_model
    config_dict, _ = PretrainedConfig.get_config_dict(
  File "/opt/conda/lib/python3.9/site-packages/transformers/configuration_utils.py", line 620, in get_config_dict
    config_dict, kwargs = cls._get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/opt/conda/lib/python3.9/site-packages/transformers/configuration_utils.py", line 675, in _get_config_dict
    resolved_config_file = cached_file(
  File "/opt/conda/lib/python3.9/site-packages/transformers/utils/hub.py", line 400, in cached_file
    raise EnvironmentError(
OSError: /data/DPO_output_mistral_32k does not appear to have a file named config.json. Checkout 'https://huggingface.co//data/DPO_output_mistral_32k/None' for available files.
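For reference, a quick way to reproduce the check TGI is making is to inspect the host directory that gets mounted into the container as /data/DPO_output_mistral_32k (a hypothetical sanity check, not part of the original report):

import os

# Host-side path that is mounted into the container as /data/DPO_output_mistral_32k.
model_dir = "/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k"

# TGI loads the target as a standalone transformers model, so it expects a config.json here.
print(sorted(os.listdir(model_dir)))
print("config.json present:", os.path.isfile(os.path.join(model_dir, "config.json")))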
amyeroberts commented 9 months ago

Hi @tamanna-mostafa, thanks for raising this issue!

Could you list the files saved under /data/DPO_output_mistral_32k?

tamanna-mostafa commented 9 months ago

@amyeroberts Hi, thanks for your comment. Below is what I see when I run ls in the folder DPO_output_mistral_32k:

ubuntu@ip-172-31-8-218:/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k$ ls 
README.md            adapter_model.safetensors  checkpoint-100  checkpoint-300  checkpoint-500  checkpoint-700    global_step736  special_tokens_map.json  tokenizer.model        training_args.bin
adapter_config.json  added_tokens.json          checkpoint-200  checkpoint-400  checkpoint-600  final_checkpoint  latest          tokenizer.json           tokenizer_config.json  zero_to_fp32.py
amyeroberts commented 9 months ago

Could you share how you're loading the model? If you're using adapters, then I'd expect this pattern:

from transformers import AutoModelForCausalLM

model_id = "{MISTRAL_CHECKPOINT}"
dpo_model_id = "/data/DPO_output_mistral_32k"
model = AutoModelForCausalLM.from_pretrained(model_id)
model.load_adapter(dpo_model_id)
tamanna-mostafa commented 9 months ago

You mean how I'm loading the DPO model in Docker? Here are the steps:

model=/data/DPO_output_mistral_32k
volume=/mnt/efs/data/tammosta/files_t:/data
num_shard=8
docker run --gpus all --shm-size 1g -p 172.31.8.218:80:80 -v $volume ghcr.io/huggingface/text-generation-inference:1.1.0 --model-id $model --num-shard $num_shard --max-input-length 4095 --max-total-tokens 12000

Just in case, this is the command I ran for the DPO training:

accelerate launch --config_file ./accelerate_configs/ds_zero3.yaml rlhf_dpo.py \
--model_name_or_path="/mnt/efs/data/tammosta/files_t/output_sft_32k" \
--output_dir="/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k" \
--data_path="/mnt/efs/data/tammosta/files_t/DPO_data_rbs_clean_AIF.json" \
--use_lamma2_peft_config False \
--beta 0.1 \
--optimizer_type adamw_hf \
--learning_rate 1e-6 \
--warmup_steps 50 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--lora_alpha 16 \
--lora_dropout 0.05 \
--lora_r 8 \
--max_prompt_length 2048 \
--max_length 4096 \
--num_train_epochs 4 \
--logging_steps 20 \
--save_steps 100 \
--save_total_limit 8 \
--eval_steps 50 \
--gradient_checkpointing True \
--report_to "wandb"
tamanna-mostafa commented 9 months ago

It uses adapters in DPO training.

tamanna-mostafa commented 9 months ago

Is there anything wrong in the way I'm loading the model on docker?

amyeroberts commented 9 months ago

@tamanna-mostafa No, I don't think so. From the current error and the files in the model repo, I think there are two possible causes:

For a model with adapter weights, I'd expect the adapter weights repo to look something like this: https://huggingface.co/ybelkada/opt-350m-lora/tree/main
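For comparison, one way to list that reference repo's files (a small illustrative snippet, not part of the original comment) is:

from huggingface_hub import list_repo_files

# An adapter repo typically holds adapter_config.json plus the adapter weights,
# but no full config.json -- that comes from the base model.
print(list_repo_files("ybelkada/opt-350m-lora"))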

Could you share the contents of the adapter_config.json?

tamanna-mostafa commented 9 months ago

@amyeroberts

How the model is being loaded in the text_generation_server package

To my understanding, these are the steps I used to load the model (prior to running Docker):

model=/data/DPO_output_mistral_32k
volume=/mnt/efs/data/tammosta/files_t:/data

Here are the contents of the adapter_config.json:

{
  "alpha_pattern": {},
  "auto_mapping": null,
  "base_model_name_or_path": "/mnt/efs/data/tammosta/files_t/output_sft_32k",
  "bias": "none",
  "fan_in_fan_out": false,
  "inference_mode": true,
  "init_lora_weights": true,
  "layers_pattern": null,
  "layers_to_transform": null,
  "loftq_config": {},
  "lora_alpha": 16.0,
  "lora_dropout": 0.05,
  "megatron_config": null,
  "megatron_core": "megatron.core",
  "modules_to_save": null,
  "peft_type": "LORA",
  "r": 8,
  "rank_pattern": {},
  "revision": null,
  "target_modules": [
    "v_proj",
    "q_proj",
    "up_proj",
    "down_proj",
    "gate_proj",
    "o_proj",
    "k_proj"
  ],
  "task_type": "CAUSAL_LM"
}
tamanna-mostafa commented 9 months ago

I'm also pasting the last 3 sections from the rlhf_dpo.py script.

 # 5. initialize the DPO trainer
    dpo_trainer = DPOTrainer(
        model,
        model_ref,
        args=training_args,
        beta=script_args.beta,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,
        peft_config=peft_config,
        max_prompt_length=script_args.max_prompt_length,
        max_length=script_args.max_length,
    )

    # 6. train
    dpo_trainer.train()
    dpo_trainer.save_model(script_args.output_dir)

    # 7. save
    output_dir = os.path.join(script_args.output_dir, "final_checkpoint")
    dpo_trainer.model.save_pretrained(output_dir)
tamanna-mostafa commented 9 months ago

@amyeroberts
Hi, did you have a chance to take a look? Thanks!

amyeroberts commented 9 months ago

I'm going to cc @younesbelkada here, who knows more about the DPO trainer and the expected values in the configs :)

younesbelkada commented 9 months ago

Hi @tamanna-mostafa, thanks for the issue! In order to run the trained adapter with TGI using Docker, you first need to merge the adapter weights into the base model, then push or save the merged weights somewhere, either on the Hub or locally.

Merging the adapter weights converts the trained model into a standalone transformers model, which makes it compatible with TGI. Please see https://huggingface.co/docs/peft/main/en/conceptual_guides/lora#merge-lora-weights-into-the-base-model to understand what merging means.

To merge the model, run:

from peft import AutoPeftModelForCausalLM

model = AutoPeftModelForCausalLM.from_pretrained(model_id)
model = model.merge_and_unload()
# at this point the model is a standalone transformers model
model.push_to_hub(xxx)
tamanna-mostafa commented 9 months ago

Hi @younesbelkada

Thanks a lot for your comment. As the base model, I used a Mistral 7B that I fine-tuned with my own preference data. I ran the following code to merge:

from transformers import AutoModelForCausalLM
from peft import PeftModel
import torch

#base_model = "/mnt/efs/data/tammosta/files_t/output_sft_32k"
base_model = AutoModelForCausalLM.from_pretrained(
        "/mnt/efs/data/tammosta/files_t/output_sft_32k",
        return_dict=True,
        torch_dtype=torch.float16,
        trust_remote_code=True,
       #**device_arg
    )
peft_model_id = "/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint"
model = PeftModel.from_pretrained(base_model, peft_model_id)
merged_model = model.merge_and_unload()
merged_model.save_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged")

However, I'm getting the following error:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.77it/s]
Traceback (most recent call last):
  File "/mnt/efs/data/tammosta/files_t/merge_peft_tammosta.py", line 14, in <module>
    model = PeftModel.from_pretrained(base_model, peft_model_id)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 354, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/peft_model.py", line 698, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 241, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2152, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
        size mismatch for base_model.model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
        size mismatch for base_model.model.model.layers.0.mlp.down_proj.lora_A.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([16, 14336]).
        size mismatch for base_model.model.model.layers.1.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
    . . .

It looks like the weights in the base model don't match those in the PEFT model. Could you please suggest a possible way of debugging this?

younesbelkada commented 9 months ago

@tamanna-mostafa have you used DeepSpeed to train your adapters by any chance?
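If so, one hypothetical way to narrow this down is to inspect the saved adapter file directly; zero-sized tensors in it would suggest the checkpoint was written while the parameters were still sharded (e.g. under ZeRO-3). A sketch, with the file name assumed:

from safetensors.torch import load_file

# Path taken from the merge script earlier in the thread; the adapter file name is assumed.
sd = load_file("/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint/adapter_model.safetensors")
for name, tensor in sd.items():
    if tensor.numel() == 0:
        print("empty tensor:", name)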

younesbelkada commented 9 months ago

Can you also try:

from transformers import AutoModelForCausalLM
from peft import AutoPeftModelForCausalLM
import torch

peft_model_id = "/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint"
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16,)
model = model.merge_and_unload()
model.save_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged")
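If that succeeds, TGI will also want the tokenizer files next to the merged weights. A small follow-up sketch, assuming the tokenizer files saved with the DPO output (they are visible in the ls listing above):

from transformers import AutoTokenizer

# Re-save the tokenizer next to the merged model so TGI finds everything in one place.
tokenizer = AutoTokenizer.from_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k")
tokenizer.save_pretrained("/mnt/efs/data/tammosta/files_t/DPO_output_mistral_32k_merged")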
tamanna-mostafa commented 9 months ago

Hi @younesbelkada, I used DeepSpeed to fine-tune the Mistral 7B (the base model) and accelerate launch to train the DPO model (the PEFT model). When I run the suggested code, I get:

Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.64it/s]
Traceback (most recent call last):
  File "/mnt/efs/data/tammosta/files_t/merge_peft_tammosta_2.py", line 6, in <module>
    model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16,)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/peft/auto.py", line 115, in from_pretrained
    tokenizer_exists = file_exists(
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 110, in _inner_fn
    validate_repo_id(arg_value)
  File "/opt/conda/envs/ml_v4/lib/python3.10/site-packages/huggingface_hub/utils/_validators.py", line 158, in validate_repo_id
    raise HFValidationError(
huggingface_hub.utils._validators.HFValidationError: Repo id must be in the form 'repo_name' or 'namespace/repo_name': '/mnt/efs/data/tammosta/files_t/DPO_output_32k_Test/final_checkpoint'. Use `repo_type` argument if needed.
younesbelkada commented 9 months ago

@tamanna-mostafa it seems that behavior is a duplicate of https://github.com/huggingface/peft/issues/1430 - can you try to pass a relative path instead and run the script from the final checkpoint folder? I'll submit a fix on PEFT.
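Concretely, the workaround could look like this (a sketch only; it assumes the script is run from /mnt/efs/data/tammosta/files_t so the checkpoint can be addressed with a relative path that passes the repo-id validation):

from peft import AutoPeftModelForCausalLM
import torch

# Relative path to the adapter checkpoint, resolved from the current working directory.
peft_model_id = "DPO_output_32k_Test/final_checkpoint"
model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16)
model = model.merge_and_unload()
model.save_pretrained("DPO_output_mistral_32k_merged")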

tamanna-mostafa commented 9 months ago

Using the relative path, I still get the size mismatch issue:

(ml_v2) ubuntu@ip-172-31-32-104:/mnt/efs/data/tammosta/files_t$ python hf_test_2.py
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:01<00:00,  2.18it/s]
Traceback (most recent call last):
  File "/mnt/efs/data/tammosta/files_t/hf_test_2.py", line 6, in <module>
    model = AutoPeftModelForCausalLM.from_pretrained(peft_model_id, torch_dtype=torch.float16,)
  File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/auto.py", line 127, in from_pretrained
    return cls._target_peft_class.from_pretrained(
  File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/peft_model.py", line 354, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/peft_model.py", line 698, in load_adapter
    load_result = set_peft_model_state_dict(self, adapters_weights, adapter_name=adapter_name)
  File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/peft/utils/save_and_load.py", line 241, in set_peft_model_state_dict
    load_result = model.load_state_dict(peft_model_state_dict, strict=False)
  File "/opt/conda/envs/ml_v2/lib/python3.10/site-packages/torch/nn/modules/module.py", line 2153, in load_state_dict
    raise RuntimeError('Error(s) in loading state_dict for {}:\n\t{}'.format(
RuntimeError: Error(s) in loading state_dict for PeftModelForCausalLM:
        size mismatch for base_model.model.model.layers.0.mlp.gate_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).
        size mismatch for base_model.model.model.layers.0.mlp.up_proj.lora_B.default.weight: copying a param with shape torch.Size([0]) from checkpoint, the shape in current model is torch.Size([14336, 16]).

I suspect it might be the PEFT config I'm using during DPO training. If, in the DPO training command for the Mistral 7B SFT model, I use the same PEFT config as the one used for DPO-training the LLAMA2 7B SFT model, then I don't hit this size mismatch issue when merging the adapter.

Here's the PEFT config used for DPO-training the LLAMA2 7B SFT model:

peft_config = LoraConfig(
            r=script_args.lora_r,
            lora_alpha=script_args.lora_alpha,
            lora_dropout=script_args.lora_dropout,
            target_modules=[
                "q_proj",
                "v_proj",
                "k_proj",
                "out_proj",
                "fc_in",
                "fc_out",
                "wte",
            ],
            bias="none",
            task_type="CAUSAL_LM",
        )
        print(f"peft_config: {peft_config}")

What peft_config should I use to DPO-train a Mistral 7B SFT model? Can I use the same PEFT config as the one used for DPO-training a LLAMA2 7B model (pasted above)?
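For reference, the adapter_config.json pasted earlier already targets Mistral's projection modules, so the Mistral-side counterpart of the config above would look roughly like this (a sketch assembled from values already in this thread, not a confirmed fix for the size mismatch):

from peft import LoraConfig

# Mistral-style LoRA config; target_modules mirror the adapter_config.json above.
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=[
        "q_proj", "k_proj", "v_proj", "o_proj",
        "gate_proj", "up_proj", "down_proj",
    ],
    bias="none",
    task_type="CAUSAL_LM",
)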

tamanna-mostafa commented 9 months ago

@younesbelkada It would be very helpful if you kindly share your thoughts on this.

younesbelkada commented 8 months ago

Hi @tamanna-mostafa, I am going to cc @pacman100, as he is more familiar than I am with the interactions between DeepSpeed and PEFT.

amyeroberts commented 7 months ago

Gentle ping @pacman100

dipeshpaulsystango commented 7 months ago

I am facing a similar error. I am training the OpenHermes 7B model using PEFT LoRA. Below is the code I am using:

It would be great if someone could tell me what is wrong with this.

@pacman100

import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer
import argparse

def merge_lora_to_base(model_name_or_path, path_to_lora_checkpoint, save_merge_model_path):
    model_name = model_name_or_path
    adapters_name = path_to_lora_checkpoint

    print(f"Starting to load the model {model_name} into memory")
    model = AutoModelForCausalLM.from_pretrained(
        model_name,
        torch_dtype=torch.float16,
        trust_remote_code=True,
    )
    print("Loading Tokenizer...")
    tokenizer = AutoTokenizer.from_pretrained(model_name,
                                              trust_remote_code=True,
                                              use_fast=True)
    print("Successfully loaded model and tokenizer into memory")

    print("Resizing model...")
    model.resize_token_embeddings(len(tokenizer))

    print("Loading Adaptors...")
    model = PeftModel.from_pretrained(model, adapters_name)

    print("Merging model...")
    model = model.merge_and_unload()

    print(f"Saving merged model to {save_merge_model_path}...")
    model.save_pretrained(save_merge_model_path)
    print(f"Saving tokenizer to {save_merge_model_path}...")
    tokenizer.save_pretrained(save_merge_model_path)
    print("Merging Done")
    return model, tokenizer

if __name__ == "__main__":
    """
    Cli usage- 
    python merger_lora_non_quantize.py --model_name_or_path teknium/OpenHermes-2.5-Mistral-7B \
    --path_to_lora_checkpoint training_results/OpenHermes/checkpoint-100 \
    --save_merge_model_path OpenHermes-2.5-Mistral-7B-merged-lora-non-quantized
    """
    parser = argparse.ArgumentParser(description='Merge LoRA adaptors to base model')
    parser.add_argument('--model_name_or_path', type=str, default='teknium/OpenHermes-2.5-Mistral-7B',
                        help='Path to the base model or model name')
    parser.add_argument('--path_to_lora_checkpoint', type=str,
                        default='training_results/OpenHermes-2.5-Mistral-7B/checkpoint-100',
                        help='Path to the LoRA checkpoint')
    parser.add_argument('--save_merge_model_path', type=str,
                        default='merged_models/OpenHermes-2.5-Mistral-7B-merged-lora-non-quantized',
                        help='Path to save the merged model')

    args = parser.parse_args()
    save_merge_model_path = args.save_merge_model_path
    # Prepend a default output directory when only a bare name is given.
    if "/" not in save_merge_model_path:
        save_merge_model_path = f"merged_models/{save_merge_model_path}/"
    model, tokenizer = merge_lora_to_base(args.model_name_or_path,
                                          args.path_to_lora_checkpoint,
                                          save_merge_model_path,
                                          )

Error:

Starting to load the model teknium/OpenHermes-2.5-Mistral-7B into memory
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:01<00:00,  1.05it/s]
Loading Tokenizer...
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Successfully loaded model and tokenizer into memory
Resizing model...
Loading Adaptors...
Traceback (most recent call last):
  File "/home/user/deployment/chatbot/scripts/merger_lora_non_quantize.py", line 63, in <module>
    model, tokenizer = merge_lora_to_base(args.model_name_or_path,
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/deployment/chatbot/scripts/merger_lora_non_quantize.py", line 27, in merge_lora_to_base
    model = PeftModel.from_pretrained(model, adapters_name)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/deployment/venv/lib/python3.11/site-packages/peft/peft_model.py", line 388, in from_pretrained
    model.load_adapter(model_id, adapter_name, is_trainable=is_trainable, **kwargs)
  File "/home/user/deployment/venv/lib/python3.11/site-packages/peft/peft_model.py", line 839, in load_adapter
    adapters_weights = load_peft_weights(model_id, device=torch_device, **hf_hub_download_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/deployment/venv/lib/python3.11/site-packages/peft/utils/save_and_load.py", line 326, in load_peft_weights
    adapters_weights = safe_load_file(filename, device=device)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/deployment/venv/lib/python3.11/site-packages/safetensors/torch.py", line 308, in load_file
    with safe_open(filename, framework="pt", device=device) as f:
         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
safetensors_rust.SafetensorError: Error while deserializing header: InvalidHeaderDeserialization
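One hypothetical first check for an InvalidHeaderDeserialization error is whether the adapter file itself is empty or truncated (the path below is assumed from the CLI example above):

import os

# A zero or suspiciously small size usually points to an interrupted or placeholder save.
adapter_file = "training_results/OpenHermes/checkpoint-100/adapter_model.safetensors"
print(os.path.getsize(adapter_file), "bytes")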
ArthurZucker commented 7 months ago

Hey @dipeshpaulsystango could you open a separate issue? Thanks

Dipeshpal commented 7 months ago

Hey @dipeshpaulsystango could you open a separate issue? Thanks

It's resolved here: https://github.com/huggingface/peft/issues/1599#issuecomment-2025306718

ArthurZucker commented 7 months ago

Thanks! I had not seen it there.