JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4. Port of OpenAI's tiktoken with additional features.
gpt-tokenizer is a highly optimized Token Byte Pair Encoder/Decoder for all of OpenAI's models (including those used by GPT-2, GPT-3, GPT-3.5 and GPT-4). It's written in TypeScript and is fully compatible with all modern JavaScript environments.
As of 2023, it is the most feature-complete, open-source GPT tokenizer on NPM.
No global cache (no accidental memory leaks, as with the original GPT-3-Encoder implementation)
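To illustrate what a BPE encoder does at its core, here is a minimal, self-contained sketch of the merge loop: adjacent token pairs are repeatedly fused, lowest merge rank first, until no known pair remains. The tiny `mergeRanks` table below is hypothetical and purely illustrative; the real library uses OpenAI's published merge tables and a far more optimized implementation.

```typescript
// Illustrative sketch of a BPE merge loop (not the library's actual code).
type Rank = number;

// Hypothetical merge table: the pair 'a b' merges before 'ab c'.
const mergeRanks = new Map<string, Rank>([
  ['a b', 0],
  ['ab c', 1],
]);

function bpeMerge(tokens: string[], ranks: Map<string, Rank>): string[] {
  const result = [...tokens];
  for (;;) {
    // Find the adjacent pair with the lowest (highest-priority) merge rank.
    let bestIdx = -1;
    let bestRank = Infinity;
    for (let i = 0; i < result.length - 1; i++) {
      const rank = ranks.get(`${result[i]} ${result[i + 1]}`);
      if (rank !== undefined && rank < bestRank) {
        bestRank = rank;
        bestIdx = i;
      }
    }
    if (bestIdx === -1) break; // no applicable merges left
    // Fuse the winning pair into a single token.
    result.splice(bestIdx, 2, result[bestIdx] + result[bestIdx + 1]);
  }
  return result;
}

// ['a', 'b', 'c'] -> ['ab', 'c'] -> ['abc']
```

In the real tokenizer the merged tokens map to integer IDs, and decoding is the reverse lookup from IDs back to byte sequences.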
Historical note: This package started off as a fork of latitudegames/GPT-3-Encoder, but version 2.0 was rewritten from scratch.
Currently, gpt-3-encoder is used, but it might make more sense to use a library that also supports GPT-4, for example:
Currently, gpt-3-encoder is referenced in a few places:

See Also