This PR fixes a bug in the streamer handling for UTF-8 characters. Prior to this PR, the streamer has an assumption that a replacement character (�) always correspond to an entire token. However, for the Qwen2 model tokenizer, some token can be � if decoded directly, which breaks the assumption and leads to incorrect result generated by the streamer.
This PR fixes this issue with a safer behavior that does not rely on such an assumption.
This PR fixes a bug in the streamer handling for UTF-8 characters. Prior to this PR, the streamer has an assumption that a replacement character (
�
) always correspond to an entire token. However, for the Qwen2 model tokenizer, some token can be�
if decoded directly, which breaks the assumption and leads to incorrect result generated by the streamer.This PR fixes this issue with a safer behavior that does not rely on such an assumption.