openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Replace `<|endoftext|>` with `ENDOFTEXT` #186

Closed. alvarobartt closed this 11 months ago

alvarobartt commented 11 months ago

Hi to whoever is reading this! 🤗

What's in this PR?

In the GPT2 configuration, `special_tokens` hard-codes the string `<|endoftext|>`, even though the constant `ENDOFTEXT` is already defined a few lines above and should be used instead. This PR simply replaces the literal `<|endoftext|>` with the constant `ENDOFTEXT`.
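For readers skimming the thread, the change can be sketched roughly as follows. This is a simplified illustration, not the actual tiktoken source: the two function names are hypothetical, and only the `special_tokens` mapping is shown. The ID 50256 is the GPT-2 end-of-text token ID.

```python
# Simplified sketch of the change; the function names are hypothetical
# and the real GPT2 configuration contains more fields than shown here.

ENDOFTEXT = "<|endoftext|>"  # module-level constant, defined once


def gpt2_special_tokens_before() -> dict:
    # Before: the literal string is repeated inside the configuration.
    return {"<|endoftext|>": 50256}


def gpt2_special_tokens_after() -> dict:
    # After: the shared constant is used as the dictionary key.
    return {ENDOFTEXT: 50256}


# Both versions build the same mapping, so behaviour is unchanged.
assert gpt2_special_tokens_before() == gpt2_special_tokens_after()
```

The change is purely cosmetic: it removes a duplicated string literal so the token text lives in one place.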

hauntsaninja commented 11 months ago

Thanks!