mlc-ai / mlc-llm

Universal LLM Deployment Engine with ML Compilation
https://llm.mlc.ai/
Apache License 2.0

[Bug] When I enable "<|im_end|>" as stop_str in qwen2 configuration, the final output seems to be truncated. #2868

Closed: Moxoo closed this issue 4 weeks ago

Moxoo commented 2 months ago

🐛 Bug

To Reproduce

Steps to reproduce the behavior:

1. Do not set <|im_end|> as a stop string. My fine-tuned qwen2 model will, of course, output <|im_end|>. The problem is that it is not emitted as a separate token (id 151645 is <|im_end|>); it arrives fused into one token together with the last part of my expected output (the closing } of the JSON). In my case the fused piece is }<|im.

2. Now set <|im_end|> as a stop string. The last part of my expected output is cut off together with <|im_end|>, so the final output is missing the closing } (see the sketch below).
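
To illustrate the difference, here is a minimal hypothetical sketch of string-level stop trimming (not MLC's actual implementation; the function name and sample text are made up), which would preserve the }:

```python
# Hypothetical sketch of string-level stop trimming; not MLC's actual code.
def trim_at_stop(text: str, stop_strs: list[str]) -> str:
    """Cut generated text at the earliest occurrence of any stop string."""
    cut = len(text)
    for s in stop_strs:
        idx = text.find(s)
        if idx != -1:
            cut = min(cut, idx)
    return text[:cut]

print(trim_at_stop('{"answer": 42}<|im_end|>', ["<|im_end|>"]))
# -> {"answer": 42}   (the closing } survives)
```

If stopping instead happens at token granularity, and the } is fused into the same token as the start of <|im_end|>, then dropping the whole matching token also drops the }, which matches the behavior described above.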

Expected behavior

Environment

Additional context

Moxoo commented 2 months ago

By the way, in the file https://github.com/mlc-ai/mlc-llm/blob/23094e76e33684e19380d77afd1fe521df47a8fb/python/mlc_llm/conversation_template/qwen2.py#L17C42-L17C43, line 17 should probably be stop_str=["<|endoftext|>", "<|im_end|>"] instead of stop_str=["<|endoftext|>, <|im_end|>"].
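
For reference, the suggested one-line change:

```python
# python/mlc_llm/conversation_template/qwen2.py, line 17
# Before: one string containing both markers, so neither matches on its own
stop_str=["<|endoftext|>, <|im_end|>"],
# After: two separate stop strings
stop_str=["<|endoftext|>", "<|im_end|>"],
```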

MasterJH5574 commented 2 months ago

@Moxoo Thank you for reporting! For the issue of the missing }, are you running with the JSON response_format, or with the normal text response format? (If you didn't manually specify the JSON format, it's the text format.) I just want to get more context here on how we can reproduce the issue.
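
(For anyone following along: a hedged sketch of how the JSON response format can be requested through the OpenAI-compatible chat completions endpoint that `mlc_llm serve` exposes; the port, model name, and prompt below are placeholders.)

```python
import requests

# Placeholder endpoint and model name for a locally served MLC-LLM instance.
resp = requests.post(
    "http://127.0.0.1:8000/v1/chat/completions",
    json={
        "model": "qwen2-finetuned",  # hypothetical model name
        "messages": [{"role": "user", "content": "Reply in JSON."}],
        # Omit this field entirely to get the normal text response format.
        "response_format": {"type": "json_object"},
    },
)
print(resp.json()["choices"][0]["message"]["content"])
```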

> By the way, in the file https://github.com/mlc-ai/mlc-llm/blob/23094e76e33684e19380d77afd1fe521df47a8fb/python/mlc_llm/conversation_template/qwen2.py#L17C42-L17C43, line 17 should probably be stop_str=["<|endoftext|>", "<|im_end|>"] instead of stop_str=["<|endoftext|>, <|im_end|>"].

Thank you so much for catching this!

Moxoo commented 2 months ago

Thanks for your reply. I didn't set response_format. I also found that this problem does not appear with the native qwen2 model, even after quantization, so fine-tuning probably changed the model's prediction behavior. I will keep looking for the cause. Thank you.
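
One quick way to check the tokenization at the boundary is a sketch like the following (uses HuggingFace transformers; the checkpoint path is a placeholder):

```python
from transformers import AutoTokenizer

# Placeholder path to the fine-tuned Qwen2 checkpoint.
tok = AutoTokenizer.from_pretrained("path/to/finetuned-qwen2")

# Id 151645 should decode to the dedicated special token.
print(tok.decode([151645]))  # expected: <|im_end|>

# See how the end of a typical completion splits into tokens.
ids = tok.encode('{"answer": 42}<|im_end|>')
print([tok.decode([i]) for i in ids])
# If "}" and the start of "<|im_end|>" ever share one token, stop handling
# that drops whole tokens would also drop the "}".
```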

MasterJH5574 commented 2 months ago

@Moxoo Thanks for getting back. Definitely let us know if you see further issues.