waylaidwanderer / node-chatgpt-api

A client implementation for ChatGPT and Bing AI. Available as a Node.js module, REST API server, and CLI app.
https://www.npmjs.com/package/@waylaidwanderer/chatgpt-api
MIT License

Invalid value for allowed_special #259

Open qinhongwei123 opened 1 year ago

qinhongwei123 commented 1 year ago

Describe the bug

Error: Invalid value for allowed_special
    at module.exports.__wbindgen_error_new (/api/node_modules/@dqbd/tiktoken/dist/node/_tiktoken.js:414:17)
    at wasm://wasm/00fdfe62:wasm-function[29]:0x2acbd
    at wasm://wasm/00fdfe62:wasm-function[171]:0x5403e
    at Tiktoken.encode (/api/node_modules/@dqbd/tiktoken/dist/node/_tiktoken.js:268:18)
    at ChatGPTClient.getTokenCount (file:///api/node_modules/@waylaidwanderer/chatgpt-api/src/ChatGPTClient.js:423:32)
    at file:///api/node_modules/@waylaidwanderer/chatgpt-api/src/ChatGPTClient.js:438:36
    at Array.map (<anonymous>)
    at ChatGPTClient.getTokenCountForMessage (file:///api/node_modules/@waylaidwanderer/chatgpt-api/src/ChatGPTClient.js:436:61)
    at ChatGPTClient.buildPrompt (file:///api/node_modules/@waylaidwanderer/chatgpt-api/src/ChatGPTClient.js:357:38)
    at ChatGPTClient.sendMessage (file:///api/node_modules/@waylaidwanderer/chatgpt-api/src/ChatGPTClient.js:251:34)

To Reproduce

Steps to reproduce the behavior: the exception occurs when many people send prompt requests at the same time.

Expected behavior

The answer is returned successfully.

Node.js version: not provided

Package version: not provided

waylaidwanderer commented 1 year ago

@qinhongwei123 Can you see if the issue still occurs on the latest version?

danny-avila commented 1 year ago

Hi @waylaidwanderer, this is still happening as of 1.36.0. I've also only had it reported when multiple people are using it at once, with regular text inputs.

I think I have a simple fix, as highlighted here: https://github.com/dqbd/tiktoken/issues/35

I just have to write a test script to make sure I can reproduce it with repeated use, and then apply the change.

danny-avila commented 1 year ago

So I couldn't reproduce the exact issue, but I noticed that keeping the tokenizer cached is much more resource intensive: it pins my CPU at 100% for 5,000 calls, which complete in about 3 seconds. Memory usage fluctuates a lot before garbage collection and stays on the higher end until memory is freed.

If the tokenizer is freed after each use, as many of the tiktoken examples suggest, tokenizing is much slower because the encoder is re-initialized every time, but CPU usage stays steady at ~33%. 5,000 encodings take 4 min 41 s, roughly 18 encodings per second.
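
For reference, the "free after each use" pattern from the tiktoken examples looks roughly like this (a sketch assuming cl100k_base; our actual encoder setup may differ):

```js
// "Free after each use" pattern: create the encoder, encode once, then
// release the WASM-side memory right away.
const { get_encoding } = require('@dqbd/tiktoken');

function getTokenCountUncached(text) {
    const encoder = get_encoding('cl100k_base');
    try {
        return encoder.encode(text).length;
    } finally {
        encoder.free();
    }
}
```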

Using tiktoken lite with the original methods only nets a small decrease in memory usage during the test; everything else remains the same as in my first paragraph.

I think some middle ground is necessary to scale use of the ChatGPT client if we are going to tokenize multiple times per message, or we need a way to bypass tokenizing altogether (although it is pretty tightly coupled with how we're handling prompts). I suspect not handling this may be the cause of the issue, since users report that restarting the server resolves it (until it happens again). Then again, the trigger may be many users hitting the tokenizer all at once, rather than it simply needing to be freed every so often.

I still need to test the behavior of freeing the tokenizer (removing it from cache) every so often, or every X requests.

I'm still thinking about what would be a good solution here.

waylaidwanderer commented 1 year ago

One way to reduce the number of times we need to tokenize text would be to store the token count for each message as well, maybe...
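
As a rough sketch of what I mean (the shapes and helper names below are illustrative, not the actual client code):

```js
// Sketch: compute the token count once when a message is added and reuse
// the cached value afterwards. The message/conversation shapes and the
// getTokenCount parameter are assumptions for illustration only.
function addMessageWithTokenCount(conversation, message, getTokenCount) {
    conversation.messages.push({
        ...message,
        tokenCount: getTokenCount(message.message),
    });
}

function getPromptTokenTotal(conversation) {
    // Sum cached counts instead of re-encoding the whole history each time.
    return conversation.messages.reduce((sum, m) => sum + m.tokenCount, 0);
}
```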

waylaidwanderer commented 1 year ago

Regarding the original issue, maybe we can just call free() when the error happens?
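
Something along these lines, as a sketch (illustrative names; the client caches its encoder a bit differently):

```js
// Sketch of "free on error": if encode throws (e.g. "Invalid value for
// allowed_special"), drop the current encoder, rebuild it, and retry once.
const { get_encoding } = require('@dqbd/tiktoken');

let encoder = get_encoding('cl100k_base');

function getTokenCount(text) {
    try {
        return encoder.encode(text, 'all').length;
    } catch (error) {
        try {
            encoder.free();
        } catch (ignored) {
            // The instance may already be unusable; rebuild regardless.
        }
        encoder = get_encoding('cl100k_base');
        return encoder.encode(text, 'all').length;
    }
}
```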

danny-avila commented 1 year ago

I think the issue may be twofold. It's not exactly memory related, but maybe something wonky with the tokenizer never being freed. The other issue may be that tokenizing a lot of text at once uses too much CPU (100% on my machine, an 11th Gen Intel(R) Core(TM) i7-11800H @ 2.30GHz). The impact on memory is actually not huge (spikes stay under 20 MB, and under 10 MB on average).

I've halved the CPU consumption by freeing the tokenizer and resetting the tokenizer cache every 25 encodings. The slowdown still leaves plenty of headroom, tbh: my machine can encode 16,000,000 tokens in 1 minute (roughly 267 getTokenCount calls per second for a 1,000-token text), which is far above the rate limits anyway (60 RPM / 60,000 TPM).
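
A sketch of that periodic reset (illustrative names; not the exact test code):

```js
// Free and rebuild the encoder every N encodings. The threshold of 25 is
// just the value from my test, not a tuned recommendation.
const { get_encoding } = require('@dqbd/tiktoken');

const RESET_EVERY = 25;
let encoder = get_encoding('cl100k_base');
let encodeCalls = 0;

function getTokenCount(text) {
    if (++encodeCalls % RESET_EVERY === 0) {
        encoder.free();
        encoder = get_encoding('cl100k_base');
    }
    return encoder.encode(text, 'all').length;
}
```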

That approach might work well!

danny-avila commented 1 year ago

I can run some tests on a free-tier EC2 instance to check a more constrained environment, since my machine can't reproduce the exact issue anyway.

danny-avila commented 1 year ago

> Regarding the original issue, maybe we can just call free() when the error happens?

After sleeping on it, I think we should keep it simple and try this. And if the issue comes up again, we could try freeing more frequently.