openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Encoding an empty string gives empty tokens #276

Closed: flexwang closed this issue 7 months ago

flexwang commented 7 months ago

tiktoken.encode('') returns an empty list. However, with a Hugging Face tokenizer, encoding an empty string yields the eos_token. Is there a way to make tiktoken behave the same way?

pratyakshagarwal commented 7 months ago

I added a condition so that if the input text is empty, it returns the eos_token.
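A minimal sketch of that kind of condition, assuming the tokenizer is exposed as a plain `encode(text) -> list[int]` callable. The names `encode_with_eos_fallback` and `toy_encode` and the EOS id `0` are hypothetical and exist only for illustration. With tiktoken you would pass `enc.encode` from an `Encoding` object and could use its `eot_token` attribute as the end-of-text id.

```python
from typing import Callable, List

# Hypothetical stand-in for a real tokenizer's encode method
# (with tiktoken: enc = tiktoken.get_encoding("cl100k_base"); pass enc.encode).
def toy_encode(text: str) -> List[int]:
    return [ord(ch) for ch in text]

def encode_with_eos_fallback(
    encode: Callable[[str], List[int]], text: str, eos_token: int
) -> List[int]:
    """Encode text, but return [eos_token] when the input is empty."""
    if text == "":
        # Explicit empty-input check, mimicking the behaviour described
        # for the Hugging Face tokenizer in the question.
        return [eos_token]
    return encode(text)

EOS = 0  # hypothetical EOS id; with tiktoken, enc.eot_token is one option

print(encode_with_eos_fallback(toy_encode, "", EOS))    # [0]
print(encode_with_eos_fallback(toy_encode, "hi", EOS))  # [104, 105]
```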

hauntsaninja commented 7 months ago

Use tiktoken.encode('') or [eos_token]
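This one-liner relies on Python's `or` returning its first truthy operand. An empty list is falsy, so `[] or [eos_token]` evaluates to `[eos_token]`, while any non-empty token list is returned unchanged. A minimal sketch, with a hypothetical `toy_encode` and EOS id `0` standing in for a tiktoken `Encoding`'s `encode` method and end-of-text token:

```python
from typing import List

def toy_encode(text: str) -> List[int]:
    # Hypothetical stand-in for enc.encode from a tiktoken Encoding.
    return [ord(ch) for ch in text]

eos_token = 0  # hypothetical EOS id for this sketch

def encode_or_eos(text: str) -> List[int]:
    # `x or y` yields x when x is truthy, otherwise y; an empty list is falsy.
    return toy_encode(text) or [eos_token]

print(encode_or_eos(""))    # [0]
print(encode_or_eos("ok"))  # [111, 107]
```

Unlike an explicit `if text == ""` check, the `or` form falls back whenever the encoder returns an empty list. For a byte-level BPE encoder such as tiktoken, that only happens for the empty string, because every non-empty input produces at least one token.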