hiyouga / LLaMA-Factory

Unify Efficient Fine-Tuning of 100+ LLMs
Apache License 2.0

Qwen2: after LoRA fine-tuning, the model merged with llamafactory-cli export produces inference results with an "assistant: " prefix #4639

Closed jfzleo closed 2 days ago

jfzleo commented 2 days ago

Reminder

System Info

Reproduction

CUDA_VISIBLE_DEVICES=0 llamafactory-cli export \
    --model_name_or_path $MODEL_PATH \
    --adapter_name_or_path $ADAPTER_PATH \
    --template qwen \
    --finetuning_type lora \
    --export_dir $EXPORT_PATH \
    --export_size 2 \
    --export_legacy_format False

Loading the model:

from transformers import AutoModelForCausalLM

model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        device_map="auto"
    )

Expected behavior

The expectation is that, after being loaded via AutoModelForCausalLM.from_pretrained(), the merged model performs binary classification on the input and outputs only "清晰" (clear) or "模糊" (blurry).

Others

The output randomly carries an "assistant:" prefix, separated from the result by a space or a newline "\n" at random, for example:

assistant: 清晰
assistant:
模糊

With the same data and the same code, loading the pretrained Qwen2-7B-Instruct model produces output in the correct format. Where might the problem be? Thanks!

hiyouga commented 2 days ago

The training and inference templates are inconsistent.
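One way to verify this (a minimal sketch; the two paths below are placeholders for the merged export and the original Qwen2-7B-Instruct checkpoint) is to compare the prompts the two tokenizers actually render:

from transformers import AutoTokenizer

# Hypothetical paths: the tokenizer saved with the merged export vs. the pretrained one.
tok_merged = AutoTokenizer.from_pretrained("/path/to/merged_model")
tok_pretrained = AutoTokenizer.from_pretrained("Qwen/Qwen2-7B-Instruct")

messages = [{"role": "user", "content": "hello"}]
for tok in (tok_merged, tok_pretrained):
    # Print the exact prompt string each tokenizer would feed to the model.
    print(repr(tok.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)))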

jfzleo commented 1 day ago

The training and inference templates are inconsistent.

Where should the inference template be configured in the Python script? Training command:

torchrun $DISTRIBUTED_ARGS src/train.py \
    --deepspeed $DS_CONFIG_PATH \
    --stage sft \
    --do_train \
    --use_fast_tokenizer \
    --flash_attn "auto" \
    --model_name_or_path $MODEL_PATH \
    --dataset $DATASET_NAME \
    --template qwen \
    --finetuning_type lora \
    --lora_target all \
    --output_dir $OUTPUT_PATH \
    --overwrite_cache \
    --overwrite_output_dir \
    --warmup_steps 100 \
    --weight_decay 0.1 \
    --per_device_train_batch_size 4 \
    --gradient_accumulation_steps 4 \
    --ddp_timeout 9000 \
    --learning_rate 5e-5 \
    --lr_scheduler_type cosine \
    --logging_steps 1 \
    --cutoff_len 32768 \
    --save_steps 1000 \
    --plot_loss \
    --num_train_epochs 3 \
    --bf16

Loading the model:

from transformers import AutoModelForCausalLM, AutoTokenizer

model = AutoModelForCausalLM.from_pretrained(
        model_path,
        torch_dtype="auto",
        device_map="auto"
    )
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)

Inference Python script:

text1 = tokenizer.apply_chat_template(
            message1,
            tokenize=False,
            add_generation_prompt=True
        )
model_inputs = tokenizer([text1], return_tensors="pt").to(device)
generated_ids = model.generate(
          model_inputs.input_ids,
          max_new_tokens=max_gen
      )
generated_ids = [
          output_ids[len(input_ids):] for input_ids, output_ids in zip(model_inputs.input_ids, generated_ids)
      ]
response = tokenizer.batch_decode(generated_ids, skip_special_tokens=True)[0]

hiyouga commented 1 day ago

For Python inference you need to set the eos token.
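A minimal sketch of what that could look like in the script above (assuming the Qwen template, where <|im_end|> ends a turn; tokenizer, model, model_inputs and max_gen are the variables from the snippet):

# Make <|im_end|> an explicit stop token so generation ends right after the answer.
im_end_id = tokenizer.convert_tokens_to_ids("<|im_end|>")
generated_ids = model.generate(
    model_inputs.input_ids,
    max_new_tokens=max_gen,
    eos_token_id=[tokenizer.eos_token_id, im_end_id],
    pad_token_id=tokenizer.eos_token_id,
)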

jfzleo commented 1 day ago

I found that the cause is that the tokenizer_config.json saved during training modifies the chat_template.

Pretrained tokenizer_config.json:

{"chat_template": "{% for message in messages %}{% if loop.first and messages[0]['role'] != 'system' %}{{ '<|im_start|>system\nYou are a helpful assistant.<|im_end|>\n' }}{% endif %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"}

sft tokenizer_config.json:

{"chat_template": "{% set system_message = 'You are a helpful assistant.' %}{% if messages[0]['role'] == 'system' %}{% set system_message = messages[0]['content'] %}{% endif %}{% if system_message is defined %}{{ '<|im_start|>system\n' + system_message + '<|im_end|>\n' }}{% endif %}{% for message in messages %}{% set content = message['content'] %}{% if message['role'] == 'user' %}{{ '<|im_start|>user\n' + content + '<|im_end|>\n<|im_start|>assistant\n' }}{% elif message['role'] == 'assistant' %}{{ content + '<|im_end|>' + '\n' }}{% endif %}{% endfor %}"}

If I load the pretrained tokenizer for inference, the results are normal:

tokenizer = AutoTokenizer.from_pretrained(pretrained_tokenizer_path)

After loading the merged model, inference with get_template_and_fix_tokenizer still has the problem:

from transformers import AutoModelForCausalLM, AutoTokenizer
from llamafactory.data import get_template_and_fix_tokenizer  # import path may differ across LLaMA-Factory versions

model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype="auto",
    device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(tokenizer_path)
get_template_and_fix_tokenizer(tokenizer, "qwen")
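For reference, a quick sanity check (a sketch, reusing the variables above) is to inspect the eos and the prompt the fixed tokenizer actually renders:

# Inspect the eos token and a rendered prompt after the template fix.
print(tokenizer.eos_token, tokenizer.eos_token_id)
print(tokenizer.apply_chat_template(
    [{"role": "user", "content": "test"}],
    tokenize=False,
    add_generation_prompt=True,
))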

Is this a bug?

hiyouga commented 1 day ago

Are the token ids different after encoding?

jfzleo commented 1 day ago

Both the text after apply_chat_template and the token ids after encoding differ. Code:

from transformers import AutoTokenizer

tokenizer_sft = AutoTokenizer.from_pretrained(sft_path)
tokenizer_pretrained = AutoTokenizer.from_pretrained(pretrained_path)
prompt = "你是对话判断助手"
messages = [
    {"role": "system", "content": prompt}
]
text_sft = tokenizer_sft.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(text_sft)
model_inputs = tokenizer([text_sft], return_tensors="pt").to(device)
print(model_inputs)

text_pretrained = tokenizer_pretrained.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)
print(text_pretrained)
model_inputs = tokenizer_pretrained([text_pretrained], return_tensors="pt").to(device)
print(model_inputs)

Output:

<|im_start|>system
你是对话判断助手<|im_end|>

{'input_ids': tensor([[151644,   8948,    198, 105043, 105051, 104317, 110498, 151645,    198]],
       device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

<|im_start|>system
你是对话判断助手<|im_end|>
<|im_start|>assistant

{'input_ids': tensor([[151644,   8948,    198, 105043, 105051, 104317, 110498, 151645,    198,
         151644,  77091,    198]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]], device='cuda:0')}

hiyouga commented 1 day ago

It looks like you did not provide a user message, so the encoding is wrong.

jfzleo commented 1 day ago

After providing the user message the problem is solved, thank you very much!

prompt = "请帮我判断对话是否清晰"
messages = [
    {"role": "system", "content": "你是对话判断助手"},
    {"role": "user", "content": prompt}
]
···

Output:

<|im_start|>system
你是对话判断助手<|im_end|>
<|im_start|>user
请帮我判断对话是否清晰<|im_end|>
<|im_start|>assistant

{'input_ids': tensor([[151644,   8948,    198, 105043, 105051, 104317, 110498, 151645,    198,
         151644,    872,    198,  14880, 108965, 104317, 105051,  64471, 104542,
         151645,    198, 151644,  77091,    198]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='cuda:0')}

<|im_start|>system
你是对话判断助手<|im_end|>
<|im_start|>user
请帮我判断对话是否清晰<|im_end|>
<|im_start|>assistant

{'input_ids': tensor([[151644,   8948,    198, 105043, 105051, 104317, 110498, 151645,    198,
         151644,    872,    198,  14880, 108965, 104317, 105051,  64471, 104542,
         151645,    198, 151644,  77091,    198]], device='cuda:0'), 'attention_mask': tensor([[1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1]],
       device='cuda:0')}