niieani / gpt-tokenizer

The fastest JavaScript BPE Tokenizer Encoder Decoder for OpenAI's GPT-2 / GPT-3 / GPT-4 / GPT-4o. Port of OpenAI's tiktoken with additional features.
https://gpt-tokenizer.dev
MIT License

isWithinTokenLimit fails due to lack of model being provided #26

Closed · emfastic closed this 2 months ago

emfastic commented 1 year ago

I'm unsure whether this is just a documentation issue; however, after checking the source code, there appears to be no default and no clear instruction on how to provide the model type for tokenization in the isWithinTokenLimit function.

I'll happily open a PR for this, but I'd like to know whether there are any contribution guidelines.

To reproduce, call isWithinTokenLimit(messages, MAX_CHAT_LENGTH), where messages is an iterable of ChatMessage objects and MAX_CHAT_LENGTH is the maximum number of tokens the chat may contain. A minimal sketch of that call follows (the root import, message contents, and limit are illustrative):
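
```js
// Sketch of the failing call, assuming the root import of gpt-tokenizer;
// the messages and the limit are illustrative.
const { isWithinTokenLimit } = require('gpt-tokenizer');

const MAX_CHAT_LENGTH = 4096; // illustrative token budget
const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello there!' },
];

// There is no parameter here to say which model's encoding to use,
// and no documented default — which is the gap this issue reports.
const result = isWithinTokenLimit(messages, MAX_CHAT_LENGTH);
```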

JesusMF23 commented 1 year ago

I am having the same issue: I get an error when trying to use isWithinTokenLimit without passing a model name. What's more, when I try to use just encodeChat, passing the messages and the model, I get the following error:

    TypeError: Cannot read properties of undefined (reading 'get')
        at Object.encodeChatGenerator (/workspace/node_modules/gpt-tokenizer/cjs/GptEncoding.js:99:57)
        at encodeChatGenerator.next ()
        at Object.encodeChat (/workspace/node_modules/gpt-tokenizer/cjs/GptEncoding.js:141:25)
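
For reference, a call shaped roughly like this triggers the error above (the model name and messages are illustrative):

```js
const { encodeChat } = require('gpt-tokenizer');

const chat = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'How long is this conversation in tokens?' },
];

// Reported to throw:
//   TypeError: Cannot read properties of undefined (reading 'get')
const tokens = encodeChat(chat, 'gpt-3.5-turbo');
```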

royibernthal commented 1 year ago

Joining the issue

JesusMF23 commented 1 year ago

Just FYI, as an alternative to this package I have been using gpt-tokens: https://www.npmjs.com/package/gpt-tokens (https://github.com/Cainier/gpt-tokens)

It works fine and offers similar functionality.

royibernthal commented 1 year ago

> Just FYI, as an alternative to this package I have been using gpt-tokens: https://www.npmjs.com/package/gpt-tokens (https://github.com/Cainier/gpt-tokens)
>
> It works fine and offers similar functionality.

Works well, thanks!

jmelovich commented 1 year ago

I had this issue and fixed it by specifying the model in the import path:

const { isWithinTokenLimit } = require('gpt-tokenizer/model/gpt-4-0314');

Hope this helps
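
Building on that import, here is a fuller usage sketch (the limit and messages are illustrative; per the package docs, isWithinTokenLimit returns the token count when within the limit and false otherwise):

```js
// The encoding is baked into this entry point, so no model argument is needed.
const { isWithinTokenLimit } = require('gpt-tokenizer/model/gpt-4-0314');

const MAX_CHAT_LENGTH = 8192; // illustrative token budget
const messages = [
  { role: 'system', content: 'You are a helpful assistant.' },
  { role: 'user', content: 'Hello!' },
];

const tokenCount = isWithinTokenLimit(messages, MAX_CHAT_LENGTH);
if (tokenCount === false) {
  console.log('Chat is over the token budget');
} else {
  console.log(`Chat uses ${tokenCount} tokens`);
}
```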

niieani commented 2 months ago

To select the model for isWithinTokenLimit, you need to import the file that represents the correct model / encoding.
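
For example (the paths below follow the package's documented model and encoding entry-point pattern):

```js
// Pick the model-specific entry point...
const { isWithinTokenLimit } = require('gpt-tokenizer/model/gpt-4');

// ...or an encoding-specific one when you only care about the encoding:
// const { isWithinTokenLimit } = require('gpt-tokenizer/encoding/cl100k_base');
```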

github-actions[bot] commented 2 months ago

:tada: This issue has been resolved in version 2.2.1 :tada:

The release is available on:

Your semantic-release bot :package::rocket: