masahi opened this issue 9 months ago
Looks like this token is actually a "prefix_space" token (SPIECE_UNDERLINE) with index 29871 in the llama tokenizer vocabulary. There was some discussion in the transformers repository about the tokenizer's behavior with this token (link), but it seems the model itself can generate it.
I have an idea for a workaround:
1. Greedy case: for the prefill output, if the top-1 token is 29871, replace it with the top-2 token. We observed that the top-2 token is the correct next token (but this should be double-checked).
2. Random-sampling case: for the prefill output, if token 29871 appears in the top-token set, skip it and sample from the remaining candidates instead.
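The workaround above could be sketched roughly like this. This is a hypothetical helper, not the engine's actual sampling code; the function name and `PREFIX_SPACE_ID` constant are made up for illustration:

```python
import math
import random

PREFIX_SPACE_ID = 29871  # SPIECE_UNDERLINE in the llama vocabulary


def pick_prefill_token(logits, greedy=True, rng=None):
    """Pick the first generated token while avoiding the prefix-space token.

    Greedy case: if the argmax is 29871, fall back to the second-best token.
    Sampling case: zero out 29871's probability mass before sampling.
    """
    if greedy:
        # Token ids sorted by logit, best first.
        order = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)
        return order[1] if order[0] == PREFIX_SPACE_ID else order[0]
    # Numerically stable softmax over the logits.
    m = max(logits)
    probs = [math.exp(x - m) for x in logits]
    probs[PREFIX_SPACE_ID] = 0.0  # remove the prefix-space token from the pool
    rng = rng or random.Random()
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

Note this only touches the prefill step; later decode steps would keep the usual sampling path, since a mid-sentence 29871 is legitimate.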
Oh, could this simply be a matter of setting skip_special_tokens=True here?
https://github.com/octoml/mlc-llm/blob/batch-serving/serve/mlc_serve/engine/engine_common.py#L79
@sunggg Any reason we are using skip_special_tokens=False in detokenize_incrementally?
I thought about it briefly and decided to follow vllm's default setting, since I do not know what other impacts changing it might have. https://github.com/vllm-project/vllm/blob/main/vllm/transformers_utils/tokenizer.py#L191
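For context, the vllm-style incremental detokenization that mlc-serve adopted works roughly like this. This is a simplified sketch, with a plain `decode` callable standing in for the real tokenizer call (where the skip_special_tokens flag would be passed); the two-offset scheme and the replacement-character check mirror vllm's approach:

```python
REPLACEMENT_CHAR = "\ufffd"  # produced while a multi-token character is incomplete


def detokenize_incrementally(decode, all_token_ids, prefix_offset, read_offset):
    """Return (new_text, new_prefix_offset, new_read_offset).

    Decodes two overlapping windows of the token stream and emits only the
    text that became final since the last call; if the tail still decodes
    to a replacement character, nothing is emitted yet.
    """
    prefix_text = decode(all_token_ids[prefix_offset:read_offset])
    full_text = decode(all_token_ids[prefix_offset:])
    if len(full_text) > len(prefix_text) and not full_text.endswith(REPLACEMENT_CHAR):
        # The newly finalized text is whatever extends the previous window.
        return full_text[len(prefix_text):], read_offset, len(all_token_ids)
    # Nothing final yet: keep the offsets and wait for more tokens.
    return "", prefix_offset, read_offset
```

Because the whole window is re-decoded each step, whatever the tokenizer emits for token 29871 (a bare space) flows straight into the first chunk of output.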
It seems that, as of https://github.com/octoml/mlc-llm/pull/107, which introduced detokenize_incrementally from vllm, we very often (or always?) get a blank token at the beginning of each generation. Apparently, vllm has the same problem. Although this is a minor issue, such a token still counts as one token in the output, so we should fix this behavior.