hiyouga / LLaMA-Factory

Unified Efficient Fine-Tuning of 100+ LLMs (ACL 2024)
https://arxiv.org/abs/2403.13372
Apache License 2.0

After fine-tuning Yi-6B-200K, merging the model hangs #2188

Closed Tendo33 closed 8 months ago

Tendo33 commented 9 months ago

Reminder

Reproduction

Training script:

CUDA_VISIBLE_DEVICES=2 python src/train_bash.py \
    --stage sft \
    --model_name_or_path /workspace/share_data/base_llms/Yi-6B-200K \
    --do_train \
    --dataset gw_train_zhengwen2 \
    --template yi \
    --finetuning_type lora \
    --lora_alpha 512 \
    --lora_rank 256 \
    --lora_target all \
    --output_dir /workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline \
    --overwrite_output_dir \
    --overwrite_cache \
    --per_device_train_batch_size 2 \
    --gradient_accumulation_steps 1 \
    --max_length 200000 \
    --lr_scheduler_type cosine \
    --logging_steps 5 \
    --save_steps 1000 \
    --learning_rate 1e-4 \
    --num_train_epochs 2 \
    --warmup_ratio 0.05 \
    --weight_decay 0.01 \
    --plot_loss \
    --report_to wandb \
    --export_legacy_format True \
    --bf16 True
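
For scale: with --lora_rank 256 and --lora_target all, the adapter itself is very large, which matters for the later merge. A rough back-of-envelope estimate (a sketch only, using the Yi-6B-200K shapes from the config dump in the merge log below, and assuming lora_target all covers the seven Llama projection layers):

# Back-of-envelope size of the LoRA adapter produced by the script above
# (a sketch; shapes come from the Yi-6B-200K config printed in the merge
# log below, and lora_target=all is assumed to mean q/k/v/o and
# gate/up/down projections).
hidden, inter, layers = 4096, 11008, 32
kv_dim = 4 * (hidden // 32)          # num_key_value_heads * head_dim = 512
r = 256

# Each adapted Linear of shape (out, in) adds r * (in + out) parameters.
per_layer = (
    r * (hidden + hidden) * 2        # q_proj, o_proj
    + r * (hidden + kv_dim) * 2      # k_proj, v_proj
    + r * (hidden + inter) * 3       # gate_proj, up_proj, down_proj
)
total = per_layer * layers
print(f"{total / 1e6:.0f}M LoRA parameters, ~{total * 2 / 1e9:.1f} GB in bf16")
# -> roughly 581M parameters, i.e. close to 10% of the 6B base model.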

Model merge script:

python src/export_model.py \
    --model_name_or_path /workspace/share_data/base_llms/Yi-6B-200K \
    --finetuning_type lora \
    --template yi \
    --adapter_name_or_path /workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline \
    --export_dir /workspace/share_data/ft_llms/yi-200k-baseline \
    --export_size 2 \
    --export_legacy_format False
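
One way to narrow the hang down is to bypass src/export_model.py and do the merge directly with PEFT. A minimal sketch (not a LLaMA-Factory command; the paths are the ones from the scripts above) that can tell whether the hang is inside peft/transformers or in the export script:

# Standalone LoRA merge with PEFT (diagnostic sketch, paths from above).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = "/workspace/share_data/base_llms/Yi-6B-200K"
adapter = "/workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline"
out = "/workspace/share_data/ft_llms/yi-200k-baseline"

model = AutoModelForCausalLM.from_pretrained(
    base, torch_dtype=torch.bfloat16, low_cpu_mem_usage=True
)
model = PeftModel.from_pretrained(model, adapter)
model = model.merge_and_unload()   # fold the LoRA weights into the base weights
model.save_pretrained(out, max_shard_size="2GB")
AutoTokenizer.from_pretrained(base).save_pretrained(out)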

Expected behavior

(llm) ➜  LLaMA-Factory git:(main) ✗ bash merge_model.sh
[2024-01-15 06:45:56,490] [INFO] [real_accelerator.py:158:get_accelerator] Setting ds_accelerator to cuda (auto detect)
[INFO|tokenization_utils_base.py:2024] 2024-01-15 06:45:58,216 >> loading file tokenizer.model
[INFO|tokenization_utils_base.py:2024] 2024-01-15 06:45:58,216 >> loading file added_tokens.json
[INFO|tokenization_utils_base.py:2024] 2024-01-15 06:45:58,216 >> loading file special_tokens_map.json
[INFO|tokenization_utils_base.py:2024] 2024-01-15 06:45:58,216 >> loading file tokenizer_config.json
[INFO|tokenization_utils_base.py:2024] 2024-01-15 06:45:58,216 >> loading file tokenizer.json
[INFO|configuration_utils.py:737] 2024-01-15 06:45:58,362 >> loading configuration file /workspace/share_data/base_llms/Yi-6B-200K/config.json
[INFO|configuration_utils.py:802] 2024-01-15 06:45:58,364 >> Model config LlamaConfig {
  "_name_or_path": "/workspace/share_data/base_llms/Yi-6B-200K",
  "architectures": [
    "LlamaForCausalLM"
  ],
  "attention_bias": false,
  "attention_dropout": 0.0,
  "bos_token_id": 1,
  "eos_token_id": 2,
  "hidden_act": "silu",
  "hidden_size": 4096,
  "initializer_range": 0.02,
  "intermediate_size": 11008,
  "max_position_embeddings": 200000,
  "model_type": "llama",
  "num_attention_heads": 32,
  "num_hidden_layers": 32,
  "num_key_value_heads": 4,
  "pad_token_id": 0,
  "pretraining_tp": 1,
  "rms_norm_eps": 1e-05,
  "rope_scaling": null,
  "rope_theta": 5000000.0,
  "tie_word_embeddings": false,
  "torch_dtype": "bfloat16",
  "transformers_version": "4.36.2",
  "use_cache": true,
  "vocab_size": 64000
}

[INFO|modeling_utils.py:3341] 2024-01-15 06:45:58,384 >> loading weights file /workspace/share_data/base_llms/Yi-6B-200K/pytorch_model.bin.index.json
[INFO|modeling_utils.py:1341] 2024-01-15 06:45:58,384 >> Instantiating LlamaForCausalLM model under default dtype torch.bfloat16.
[INFO|configuration_utils.py:826] 2024-01-15 06:45:58,385 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0
}

Loading checkpoint shards: 100%|██████| 2/2 [00:05<00:00,  2.78s/it]
[INFO|modeling_utils.py:4185] 2024-01-15 06:46:09,875 >> All model checkpoint weights were used when initializing LlamaForCausalLM.

[INFO|modeling_utils.py:4193] 2024-01-15 06:46:09,875 >> All the weights of LlamaForCausalLM were initialized from the model checkpoint at /workspace/share_data/base_llms/Yi-6B-200K.
If your task is similar to the task the model of the checkpoint was trained on, you can already use LlamaForCausalLM for predictions without further training.
[INFO|configuration_utils.py:779] 2024-01-15 06:46:09,878 >> loading configuration file /workspace/share_data/base_llms/Yi-6B-200K/generation_config.json
[INFO|configuration_utils.py:826] 2024-01-15 06:46:09,878 >> Generate config GenerationConfig {
  "bos_token_id": 1,
  "eos_token_id": 2,
  "pad_token_id": 0
}

01/15/2024 06:46:09 - INFO - llmtuner.model.adapter - Fine-tuning method: LoRA

System Info

(llm) ➜  LLaMA-Factory git:(main) ✗ transformers-cli env      

Copy-and-paste the text below in your GitHub issue and FILL OUT the two last points.

- `transformers` version: 4.36.2
- Platform: Linux-4.19.72-300.el7.x86_64-x86_64-with-glibc2.31
- Python version: 3.9.18
- Huggingface_hub version: 0.19.4
- Safetensors version: 0.4.1
- Accelerate version: 0.24.1
- Accelerate config:    not found
- PyTorch version (GPU?): 2.1.0+cu121 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

Others

The merge is stuck at this point: no error is reported, and it simply does not proceed any further.
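
Since the process hangs without any error, a stack dump of the stuck process would show which call it is blocked in. One simple option (a diagnostic sketch; this snippet is not part of LLaMA-Factory) is to register a signal handler near the top of src/export_model.py and then signal the stuck process from another shell:

# Diagnostic only: paste near the top of src/export_model.py.
# While the script is stuck, run `kill -USR1 <pid>` from another shell;
# faulthandler then prints the Python stack of every thread to stderr,
# showing exactly which call the merge is blocked in.
import faulthandler
import signal

faulthandler.register(signal.SIGUSR1, all_threads=True)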

Tendo33 commented 9 months ago

Skipping the merge and running inference with the adapter loaded separately also hangs:

CUDA_VISIBLE_DEVICES=1 python src/api_demo.py \
    --model_name_or_path /workspace/share_data/base_llms/Yi-6B-200K \
    --adapter_name_or_path /workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline \
    --template yi \
    --finetuning_type lora \
    --max_new_tokens 200000 \
    --temperature 0.95 \
    --top_k 50 \
    --top_p 0.95 \
    --repetition_penalty 1.2

Tendo33 commented 9 months ago

Merging the internlm2-base-7b model is also extremely slow.

chazzhou commented 8 months ago

I am experiencing a similar problem with a fine-tuned LLaMA2-7B model. Model loading gets stuck during inference, or when running the merge script, right after the llmtuner.model.adapter - Fine-tuning method: LoRA log line. The issue appears to be related to a high LoRA rank. I will try out different settings and report back.
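
Before concluding it is the rank, it may be worth confirming what configuration actually ended up in the saved adapter. A quick check (a sketch, using the adapter path from earlier in this thread as an example):

# Inspect the saved adapter: rank, alpha, targeted modules and on-disk size.
# (Sketch; the path below is the adapter directory used earlier in this thread.)
import json
import os

adapter_dir = "/workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline"
with open(os.path.join(adapter_dir, "adapter_config.json")) as f:
    cfg = json.load(f)
print("r =", cfg.get("r"), "| lora_alpha =", cfg.get("lora_alpha"))
print("target_modules =", cfg.get("target_modules"))

weight_bytes = sum(
    os.path.getsize(os.path.join(adapter_dir, name))
    for name in os.listdir(adapter_dir)
    if name.endswith((".bin", ".safetensors"))
)
print(f"adapter weights on disk: {weight_bytes / 1e9:.2f} GB")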