Closed flexwang closed 7 months ago
tiktoken.encode('') returns an empty list. However, with a Hugging Face tokenizer, encoding an empty string yields eos_token. Is there a way to make tiktoken behave the same way?
I added a condition so that if the input text is empty, it returns the eos_token.
Use tiktoken.encode('') or [eos_token]
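The `or` fallback works because an empty list is falsy in Python, so the right-hand side is used only when the encoder returns no tokens. Here is a minimal sketch of that idea as a small wrapper. The names `encode_or_eos`, `fake_encode`, and `FAKE_EOS` are hypothetical and exist only for illustration; the stand-in encoder is not tiktoken, so the snippet runs without any downloads.

```python
from typing import Callable, List


def encode_or_eos(
    encode: Callable[[str], List[int]], text: str, eos_token: int
) -> List[int]:
    """Encode text, substituting [eos_token] when the encoder returns no tokens."""
    tokens = encode(text)
    # An empty list is falsy, so `or` picks the fallback only for empty output.
    return tokens or [eos_token]


# Stand-in encoder for illustration only (hypothetical, not tiktoken):
# maps each character to its Unicode code point.
def fake_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]


FAKE_EOS = 0  # hypothetical end-of-sequence id

print(encode_or_eos(fake_encode, "", FAKE_EOS))    # [0]
print(encode_or_eos(fake_encode, "hi", FAKE_EOS))  # [104, 105]
```

With tiktoken itself, encoding is done on an encoding object rather than on the module, so, assuming something like `enc = tiktoken.get_encoding("gpt2")`, the call would presumably be `encode_or_eos(enc.encode, text, enc.eot_token)`. Whether the end-of-text id matches the eos_token your Hugging Face tokenizer emits depends on the model, so check that before relying on it.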