taishan1994 / Llama3.1-Finetuning

Full-parameter, LoRA, and QLoRA fine-tuning for Llama 3.
Apache License 2.0

LLaMA-3.1-8B-Instruct reports an error when using "--bf16 True" #11

Closed MichaelCaohn closed 3 days ago

MichaelCaohn commented 4 days ago

Hi, I tried using the provided finetune_qlora_llama3_8B_chat.sh to finetune the official 3.1-8B-Instruct model downloaded from Hugging Face.

The model loads fine, but during training every rank fails with the following error:

  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    loss = self.module(*inputs, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
    result = forward_call(*args, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
    causal_mask = self._update_causal_mask(
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1036, in _update_causal_mask
    causal_mask = self._update_causal_mask(
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1036, in _update_causal_mask
    result = forward_call(*args, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/peft/peft_model.py", line 918, in forward
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
    return self.base_model(
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/peft/tuners/tuners_utils.py", line 94, in forward
    return self.model.forward(*args, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1139, in forward
    outputs = self.model(
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1538, in _call_impl
    result = forward_call(*args, **kwargs)
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 912, in forward
    causal_mask = self._update_causal_mask(
  File "/home/miniconda3/envs/llama3/lib/python3.8/site-packages/transformers/models/llama/modeling_llama.py", line 1036, in _update_causal_mask
    causal_mask = torch.triu(causal_mask, diagonal=1)
RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'

My installed environment is as follows (same versions as listed in the README):

torch==2.0.1
transformers==4.43.1
deepspeed==0.9.4
accelerate==0.33.0
peft==0.5.0
numpy==1.20.0
jinja2==3.1.3
flash-attn==2.5.6
datasets==2.18.0
modelscope==1.13.3
pydantic==1.10.6
bitsandbytes==0.43.0

Searching online, this seems to be because torch==2.0.1 does not implement this op (torch.triu on CUDA) for BFloat16: https://github.com/meta-llama/llama3/issues/80; https://github.com/meta-llama/llama3/issues/110
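
For reference, the failure reproduces outside the training loop with just a few lines (a minimal sketch, run on a CUDA machine; on newer torch releases that ship the BFloat16 kernel it prints the mask instead of raising):

import torch

# Roughly the call that modeling_llama.py makes inside _update_causal_mask.
# On torch 2.0.x this raises:
#   RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'
mask = torch.full((4, 4), float("-inf"), dtype=torch.bfloat16, device="cuda")
print(torch.triu(mask, diagonal=1))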

This is the training script:

NCCL_P2P_DISABLE=1 \
NCCL_IB_DISABLE=1 \
CUDA_VISIBLE_DEVICES=0,1,2,3 \
torchrun \
--nproc_per_node 4 \
--nnodes 1 \
--node_rank 0 \
--master_addr localhost \
--master_port 6601 \
../finetune_llama3.py \
--model_name_or_path "/home/llm/models/llama3.1/Meta-Llama-3.1-8B-Instruct" \
--data_path "../data/Belle_sampled_qwen.json" \
--output_dir "../output/llama3_8B_lora" \
--num_train_epochs 100 \
--per_device_train_batch_size 1 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 5 \
--save_total_limit 1 \
--learning_rate 1e-5 \
--weight_decay 0.1 \
--adam_beta2 0.95 \
--warmup_ratio 0.01 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--report_to "none" \
--model_max_length 512 \
--gradient_checkpointing True \
--lazy_preprocess True \
--deepspeed "../config/ds_config_zero3_72B.json" \
--use_lora \
--bf16 True

I tried upgrading PyTorch to 2.1.0, but then flash-attn no longer seems to be compatible.

Removing --bf16 True from finetune_qlora_llama3_8B_chat.sh makes the problem go away.

My question: if I still want to use --bf16 True, is there any way to get around this RuntimeError: "triu_tril_cuda_template" not implemented for 'BFloat16'?
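
One direction I might try if bf16 is really needed (just a sketch, not something I have verified) is to monkeypatch torch.triu before training starts, so BFloat16 CUDA tensors are upcast around the missing kernel:

import torch

# Untested sketch: run triu in float32 for BFloat16 CUDA tensors and cast back,
# since torch 2.0.x lacks the BFloat16 CUDA kernel for this op.
_orig_triu = torch.triu

def _triu_bf16_safe(input, diagonal=0, *, out=None):
    if input.is_cuda and input.dtype == torch.bfloat16:
        result = _orig_triu(input.float(), diagonal=diagonal).to(torch.bfloat16)
        if out is not None:
            out.copy_(result)
            return out
        return result
    return _orig_triu(input, diagonal=diagonal, out=out)

torch.triu = _triu_bf16_safe  # apply in finetune_llama3.py before the first forward pass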

taishan1994 commented 3 days ago

No good solution for now; you can use fp16 instead.
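
In the launch script above that means passing --fp16 True in place of --bf16 True (and, if the DeepSpeed JSON pins the precision, adjusting its fp16/bf16 section to match).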

MichaelCaohn commented 3 days ago

Thank you very much!