Closed SoYuCry closed 10 months ago
Describe the bug
The command I used for SFT: set CUDA_VISIBLE_DEVICES=1 && /lustre/home/acct-phyyjl/phyyjl-xzhr/.conda/envs/LLM/bin/python supervised_finetuning.py --model_type llama --model_name_or_path /lustre/home/acct-phyyjl/phyyjl-xzhr/Desktop/models_hf_LLAMA/7B-chat --train_file_dir ./data/finetune/train --validation_file_dir ./data/finetune/test --per_device_train_batch_size 4 --per_device_eval_batch_size 4 --do_train --do_eval --use_peft True --fp16 --max_train_samples -1 --max_eval_samples -1 --num_train_epochs 3 --learning_rate 2e-5 --warmup_ratio 0.05 --weight_decay 0.05 --logging_strategy steps --logging_steps 10 --eval_steps 500 --evaluation_strategy steps --save_steps 500 --save_strategy steps --save_total_limit 300 --gradient_accumulation_steps 1 --preprocessing_num_workers 4 --output_dir /lustre/home/acct-phyyjl/phyyjl-xzhr/Desktop/models_hf_LLAMA/7B-chat-SFT --overwrite_output_dir --ddp_timeout 30000 --logging_first_step True --target_modules all --lora_rank 8 --lora_alpha 16 --lora_dropout 0.05 --torch_dtype bfloat16 --device_map auto --report_to tensorboard --ddp_find_unused_parameters False --gradient_checkpointing True --cache_dir ./cache
After merging, running inference with inference.py raises an error.
(PS: when I ran SFT on the model produced by the Pretrain stage, there was no error and the loss was normal.)
At the same time, the loss drops to 0 during training:
Train with bf16.
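Note that the original command mixes `--fp16` with `--torch_dtype bfloat16`; fp16 mixed precision is known to underflow/overflow on LLaMA and can produce a 0 loss. Assuming `supervised_finetuning.py` parses the standard Hugging Face `TrainingArguments` flags (so `--bf16` is accepted), a minimal sketch of the corrected invocation, with only the key flags shown, would be:

```shell
# Sketch, not verified against this repo: swap --fp16 for --bf16 so the
# mixed-precision compute dtype matches --torch_dtype bfloat16.
# All remaining flags stay exactly as in the original command above.
set CUDA_VISIBLE_DEVICES=1 && /lustre/home/acct-phyyjl/phyyjl-xzhr/.conda/envs/LLM/bin/python supervised_finetuning.py \
    --model_type llama \
    --model_name_or_path /lustre/home/acct-phyyjl/phyyjl-xzhr/Desktop/models_hf_LLAMA/7B-chat \
    --bf16 \
    --torch_dtype bfloat16 \
    --use_peft True \
    --output_dir /lustre/home/acct-phyyjl/phyyjl-xzhr/Desktop/models_hf_LLAMA/7B-chat-SFT
    # ...remaining flags unchanged from the original command
```

Training and merging in a single consistent dtype (bfloat16 end to end) should also avoid dtype-mismatch errors when loading the merged model in inference.py.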