and then use it to count the tokens in the generated text, like this:
num_output_tokens = get_token_length(gen_text)
However, the tokenizer adds special tokens during the encoding phase. Here is an example:
from transformers import AutoTokenizer
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")
text = "hello world"
enc = tokenizer.encode(text)
print(enc) # [1, 22172, 3186]
dec = tokenizer.decode(enc)
print(dec) # <s> hello world
Token 1 is the BOS token (<s>) that the tokenizer prepends during encoding, so it should not be counted as a generated token here.
BTW, different models may have different special-token rules, so there is no easy, universal way to get the real number of tokens in the generated text.
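One possible workaround, as a rough sketch rather than a definitive fix: Hugging Face tokenizers accept `add_special_tokens=False` in `encode`, which skips tokens such as the BOS token seen above. The helper name `count_text_tokens` and the `ToyTokenizer` class are hypothetical; the toy class only mimics the BOS behaviour from the example so the snippet runs without downloading a model. With a real tokenizer, you would pass the object returned by `AutoTokenizer.from_pretrained`.

```python
from typing import List, Protocol


class SupportsEncode(Protocol):
    """Anything with a Hugging Face-style encode() method."""

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]: ...


def count_text_tokens(tokenizer: SupportsEncode, text: str) -> int:
    """Count the tokens in `text`, excluding tokenizer-inserted special tokens."""
    return len(tokenizer.encode(text, add_special_tokens=False))


class ToyTokenizer:
    """Hypothetical stand-in: prepends a BOS id (1) unless told not to."""

    bos_token_id = 1

    def encode(self, text: str, add_special_tokens: bool = True) -> List[int]:
        # Fake ids: one per whitespace-separated word, offset past the BOS id.
        ids = [100 + i for i, _ in enumerate(text.split())]
        return ([self.bos_token_id] + ids) if add_special_tokens else ids


toy = ToyTokenizer()
print(len(toy.encode("hello world")))         # 3, BOS included
print(count_text_tokens(toy, "hello world"))  # 2, BOS excluded
```

Even without special tokens, re-tokenizing the generated text is not guaranteed to reproduce the exact token sequence the model emitted, so the count is still an approximation.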
The code referred to above is at https://github.com/ray-project/llmperf/blob/main/token_benchmark_ray.py#L62.