Even without merging, inference with the adapter loaded separately also hangs:
CUDA_VISIBLE_DEVICES=1 python src/api_demo.py \
--model_name_or_path /workspace/share_data/base_llms/Yi-6B-200K \
--adapter_name_or_path /workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline \
--template yi \
--finetuning_type lora \
--max_new_tokens 200000 \
--temperature 0.95 \
--top_k 50 \
--top_p 0.95 \
--repetition_penalty 1.2
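
To narrow down whether the hang is in the LoRA merge itself rather than in LLaMA-Factory's loading path, a minimal PEFT-only merge can be tried. This is a diagnostic sketch, not the project's own merge code; it reuses the base model and adapter paths from the command above, and the output directory ./merged_yi is made up for illustration.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base_path = "/workspace/share_data/base_llms/Yi-6B-200K"
adapter_path = "/workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline"

# Load the base model directly; trust_remote_code may be needed for Yi checkpoints.
tokenizer = AutoTokenizer.from_pretrained(base_path, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    base_path,
    torch_dtype=torch.float16,
    device_map="auto",
    trust_remote_code=True,
)

# Attach the LoRA adapter and fold its weights into the base model.
model = PeftModel.from_pretrained(model, adapter_path)
merged = model.merge_and_unload()  # if this call hangs, the problem is in the merge step itself

merged.save_pretrained("./merged_yi", safe_serialization=True)
tokenizer.save_pretrained("./merged_yi")

If this script finishes quickly, the slowdown is more likely in how the framework loads or saves the model rather than in the merge operation itself.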
Merging the internlm2-base-7b model is also extremely slow.
I am experiencing a similar problem with a fine-tuned LLaMA2-7B model. Model loading gets stuck during inference, or when running the merge script, at the log line llmtuner.model.adapter - Fine-tuning method: LoRA. The issue appears to be related to a high LoRA rank. I will try different settings and report back.
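
If a high LoRA rank is suspected, the rank that was actually saved can be read back from the adapter's adapter_config.json, which PEFT writes next to the adapter weights. A small check script (the path shown is the adapter directory from the command above; substitute your own):

import json, os

adapter_path = "/workspace/sunjinfeng/github_projet/LLaMA-Factory/yi_baseline"
with open(os.path.join(adapter_path, "adapter_config.json")) as f:
    cfg = json.load(f)

# r is the LoRA rank; a very large r means much larger delta matrices to merge.
print("LoRA rank (r):", cfg.get("r"))
print("lora_alpha:", cfg.get("lora_alpha"))
print("target_modules:", cfg.get("target_modules"))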
Reminder
Reproduction
Training script:
Model merge script:
Expected behavior
System Info
Others
Model merging gets stuck at this point; there is no error and it does not proceed any further.
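
Since the process stops without any error, it can help to see where the Python interpreter is actually stuck. One option, assuming you can edit the merge/export script locally (a generic debugging sketch, not part of LLaMA-Factory), is to enable a periodic traceback dump before starting the merge:

import faulthandler, sys

# Print the stack of every thread to stderr every 60 seconds until cancelled,
# so a silent hang shows which call it is sitting in (e.g. weight loading vs. saving).
faulthandler.dump_traceback_later(60, repeat=True, file=sys.stderr)

# ... then run the merge / export code as usual ...

Alternatively, an external sampler such as py-spy can attach to the running process and dump stacks without modifying the script.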