Closed: AEProgrammer closed this issue 9 months ago
The error message is as follows:
Loading checkpoint shards: 0%| | 0/29 [00:00<?, ?it/s]
Traceback (most recent call last):
File "/code/liuhui67/LLM_finetune/src/cli_demo.py", line 47, in
Same error here.
Pass --export_legacy_format when merging.
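For reference, the suggestion above would amount to adding that flag to the export command from the reproduction below. This is a sketch only: whether `--export_legacy_format` is accepted, and whether it takes a bare flag or an explicit value, depends on your LLaMA-Factory version.

```shell
python src/export_model.py \
    --model_name_or_path /root/.cache/modelscope/hub/qwen/Qwen-14B-Chat \
    --adapter_name_or_path /code/liuhui67/LLM_finetune/lora_model_dir/lora_qwen_14b_v1/tmp-checkpoint-100 \
    --template default \
    --finetuning_type lora \
    --export_dir /code/liuhui67/LLM_finetune/merged_model/merged_qwen_14b_v1 \
    --export_legacy_format
```

This writes the merged weights as legacy PyTorch `.bin` files instead of safetensors, which is the point of the workaround.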
Reminder
Reproduction
Fine-tuning parameters:

CUDA_VISIBLE_DEVICES=0 python src/train_bash.py \
    --stage sft \
    --do_train \
    --model_name_or_path qwen/Qwen-14B-Chat \
    --dataset mofang_qa \
    --template default \
    --finetuning_type lora \
    --lora_target c_attn \
    --output_dir /code/liuhui67/LLM_finetune/lora_model_dir/lora_qwen_14b_v1 \
    --overwrite_cache \
    --per_device_train_batch_size 10 \
    --gradient_accumulation_steps 4 \
    --lr_scheduler_type cosine \
    --logging_steps 10 \
    --save_steps 100 \
    --learning_rate 5e-5 \
    --num_train_epochs 3.0 \
    --plot_loss \
    --flash_attn \
    --save_safetensors False
Merge parameters (note: the original command had a typo, --template defalut, corrected to default here, and was missing a space before the line-continuation backslash after the adapter path):

python src/export_model.py \
    --model_name_or_path /root/.cache/modelscope/hub/qwen/Qwen-14B-Chat \
    --adapter_name_or_path /code/liuhui67/LLM_finetune/lora_model_dir/lora_qwen_14b_v1/tmp-checkpoint-100 \
    --template default \
    --finetuning_type lora \
    --export_dir /code/liuhui67/LLM_finetune/merged_model/merged_qwen_14b_v1
Inference parameters:

python src/cli_demo.py \
    --model_name_or_path /code/liuhui67/LLM_finetune/merged_model/merged_qwen_14b_v1 \
    --template default \
    --finetuning_type lora
Expected behavior
When fine-tuning Qwen-14B with LoRA, enabling safetensors raises OSError: No such device (os error 19). With safetensors disabled, merging the weights and then running inference on the merged model raises the same error. Inference works normally only when the original base model and the LoRA adapter are supplied together.
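Since inference reportedly works when the base model and the LoRA adapter are passed together, the working configuration would look roughly like this (a sketch assembled from the paths in the reproduction above; the `--adapter_name_or_path` value here points at the adapter output dir rather than a specific checkpoint, which is an assumption):

```shell
python src/cli_demo.py \
    --model_name_or_path /root/.cache/modelscope/hub/qwen/Qwen-14B-Chat \
    --adapter_name_or_path /code/liuhui67/LLM_finetune/lora_model_dir/lora_qwen_14b_v1 \
    --template default \
    --finetuning_type lora
```

Here the adapter is applied on top of the original checkpoint at load time instead of loading pre-merged weights, which sidesteps reading the merged safetensors files.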
System Info
transformers version: 4.36.2

Others
No response