openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License
11.06k stars 749 forks

Encoding an empty string gives empty tokens #276

Closed flexwang closed 2 months ago

flexwang commented 3 months ago

tiktoken.encode('') returns an empty list. However, with a Hugging Face tokenizer, encoding an empty string yields the eos_token. Is there a way to make tiktoken behave the same way?

pratyakshagarwal commented 3 months ago

I added a condition so that if the text input is empty, it returns the eos_token.

hauntsaninja commented 2 months ago

Use tiktoken.encode('') or [eos_token]
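
The suggestion above works because an empty list is falsy in Python, so `or` substitutes the right-hand side. Here is a minimal sketch of that idiom wrapped in a helper. The name encode_with_eos_fallback and its parameters are hypothetical and are not part of tiktoken; the helper accepts any encode callable, so it does not depend on a particular tokenizer.

```python
from typing import Callable, List


def encode_with_eos_fallback(
    encode: Callable[[str], List[int]],
    text: str,
    eos_token: int,
) -> List[int]:
    """Encode `text`, returning [eos_token] when the encoder yields no tokens.

    Hypothetical helper for illustration; not part of tiktoken.
    """
    # An empty list is falsy, so `or` falls through to the fallback list.
    return encode(text) or [eos_token]


if __name__ == "__main__":
    # Stand-in encoder for demonstration only: maps characters to code points.
    def toy_encode(s: str) -> List[int]:
        return [ord(c) for c in s]

    print(encode_with_eos_fallback(toy_encode, "", eos_token=0))
    print(encode_with_eos_fallback(toy_encode, "hi", eos_token=0))
```

With tiktoken itself, one would presumably pass the encode method of an Encoding object (for example, the one returned by tiktoken.get_encoding("cl100k_base")) and its eot_token attribute as the fallback. That assumes the end-of-text token is the right analogue of the Hugging Face eos_token for the model in question.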