Bug in counting output tokens

In token_benchmark_ray.py (https://github.com/ray-project/llmperf/blob/main/token_benchmark_ray.py#L62), the token counter is defined as:

```python
get_token_length = lambda text: len(tokenizer.encode(text))
```

It is then used to count the tokens in the generated text:

```python
num_output_tokens = get_token_length(gen_text)
```

However, the tokenizer adds special tokens during encoding. For example:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

text = "hello world"

# encode() prepends the BOS token <s> (id 1) by default
enc = tokenizer.encode(text)
print(enc)  # [1, 22172, 3186]

dec = tokenizer.decode(enc)
print(dec)  # <s> hello world
```

Here, token 1 (the BOS token `<s>`) was not generated by the model and should not be counted as an output token.
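A minimal sketch of one possible fix, assuming a Hugging Face tokenizer: its `encode` method accepts `add_special_tokens=False`, which skips inserting markers such as BOS. To keep the snippet runnable without downloading a model, `ToyTokenizer` is a hypothetical stand-in that mimics the Llama-2 behavior above (prepending id 1); its vocabulary is made up.

```python
class ToyTokenizer:
    """Hypothetical stand-in for a Hugging Face tokenizer (not a real class).

    Mimics the Llama-2 behavior shown above: prepends the BOS id 1
    unless add_special_tokens=False is passed.
    """

    bos_token_id = 1

    def __init__(self):
        # Made-up vocabulary: each new whitespace-separated word gets an id.
        self.vocab = {}

    def encode(self, text, add_special_tokens=True):
        ids = [self.vocab.setdefault(w, 100 + len(self.vocab)) for w in text.split()]
        if add_special_tokens:
            ids = [self.bos_token_id] + ids
        return ids


tokenizer = ToyTokenizer()

# Current counter: the BOS token is included in the count.
buggy_token_length = lambda text: len(tokenizer.encode(text))


def get_token_length(text):
    # Sketch of a fix: ask the tokenizer not to add special tokens.
    return len(tokenizer.encode(text, add_special_tokens=False))


gen_text = "hello world"
print(buggy_token_length(gen_text))  # 3
print(get_token_length(gen_text))    # 2
```

With a real Hugging Face tokenizer the call has the same shape, `tokenizer.encode(text, add_special_tokens=False)`; exactly which special tokens it suppresses depends on each model's tokenizer configuration.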

BTW, different models apply different special-token rules (some prepend a BOS token, others append an EOS token, some add neither), so there is no single easy way to recover the true number of generated tokens from the text alone.
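A rough, model-agnostic sketch, assuming the tokenizer exposes `encode(text)` and `all_special_ids` (Hugging Face tokenizers provide both): encode the text, then drop every id the tokenizer reports as special. The two toy tokenizers and the helper `count_content_tokens` are hypothetical, made up to illustrate a BOS-prepending and an EOS-appending scheme. Re-encoding decoded text does not always reproduce the exact sequence the model generated, so this remains an approximation.

```python
class ToyBosTokenizer:
    """Hypothetical tokenizer that prepends a BOS id (Llama-2 style)."""

    all_special_ids = [0, 1, 2]  # made-up pad/bos/eos ids

    def encode(self, text):
        return [1] + [10 + i for i, _ in enumerate(text.split())]


class ToyEosTokenizer:
    """Hypothetical tokenizer that appends an EOS id instead."""

    all_special_ids = [0, 1, 2]  # made-up pad/bos/eos ids

    def encode(self, text):
        return [10 + i for i, _ in enumerate(text.split())] + [2]


def count_content_tokens(tokenizer, text):
    """Count tokens in `text`, ignoring ids the tokenizer marks as special."""
    special = set(tokenizer.all_special_ids)
    return sum(1 for tok in tokenizer.encode(text) if tok not in special)


for tok in (ToyBosTokenizer(), ToyEosTokenizer()):
    print(count_content_tokens(tok, "hello world"))  # 2 for both
```

Filtering by id keeps the counting logic independent of where a given model places its special tokens.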

ray-project/llmperf, issue #35