xiaoxx970 / chatgpt-in-terminal

Use ChatGPT in terminal
MIT License

Error count token when using 16k model #57

Closed NextAlone closed 1 year ago

NextAlone commented 1 year ago
Hi, welcome to chat with GPT. Type `/help` to display available commands.
Traceback (most recent call last):
  File "/Users/xxx/.local/bin/chat", line 7, in <module>
    main()
  File "/Users/xxx/Repos/chatgpt-in-terminal/gpt_term/main.py", line 1125, in main
    chat_gpt = ChatGPT(api_key, api_timeout)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Repos/chatgpt-in-terminal/gpt_term/main.py", line 117, in __init__
    self.current_tokens = count_token(self.messages)
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Repos/chatgpt-in-terminal/gpt_term/main.py", line 616, in count_token
    encoding = tiktoken.get_encoding("cl100k_base")
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Repos/chatgpt-in-terminal/.venv/lib/python3.11/site-packages/tiktoken/registry.py", line 63, in get_encoding
    enc = Encoding(**constructor())
                     ^^^^^^^^^^^^^
  File "/Users/xxx/Repos/chatgpt-in-terminal/.venv/lib/python3.11/site-packages/tiktoken_ext/openai_public.py", line 64, in cl100k_base
    mergeable_ranks = load_tiktoken_bpe(
                      ^^^^^^^^^^^^^^^^^^
  File "/Users/xxx/Repos/chatgpt-in-terminal/.venv/lib/python3.11/site-packages/tiktoken/load.py", line 117, in load_tiktoken_bpe
    return {
           ^
  File "/Users/xxx/Repos/chatgpt-in-terminal/.venv/lib/python3.11/site-packages/tiktoken/load.py", line 119, in <dictcomp>
    for token, rank in (line.split() for line in contents.splitlines() if line)
        ^^^^^^^^^^^
ValueError: not enough values to unpack (expected 2, got 1)
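
For reference, the unpack failure at the bottom of the traceback comes from tiktoken's BPE-file parser: each line of the cached `cl100k_base.tiktoken` file is expected to hold a base64-encoded token and a rank, so a truncated line with only one field breaks the two-value unpack. A minimal standalone sketch of that parse step (mirroring the dict comprehension shown in the traceback, not the real tiktoken API):

```python
import base64

def parse_tiktoken_bpe(contents: bytes) -> dict:
    # Mirrors the dict comprehension in tiktoken/load.py: each line must
    # be "<base64-encoded token> <rank>", exactly two fields.
    return {
        base64.b64decode(token): int(rank)
        for token, rank in (line.split() for line in contents.splitlines() if line)
    }

print(parse_tiktoken_bpe(b"aGVsbG8= 0\n"))  # well-formed line: {b'hello': 0}

try:
    # A truncated cache line has only one field, so the unpack fails
    # exactly like the traceback above.
    parse_tiktoken_bpe(b"aGVsbG8=\n")
except ValueError as e:
    print(e)  # not enough values to unpack (expected 2, got 1)
```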
NextAlone commented 1 year ago

Solved by https://github.com/openai/tiktoken/issues/114
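
For completeness: the fix linked above boils down to deleting the corrupted cached BPE file so that tiktoken re-downloads it. A minimal sketch, assuming tiktoken's cache layout at the time (the cached filename is the SHA-1 hex digest of the blob URL, stored under TIKTOKEN_CACHE_DIR or a data-gym-cache folder inside the system temp directory):

```python
import hashlib
import os
import tempfile

# Where tiktoken caches downloaded BPE files (per tiktoken/load.py):
# TIKTOKEN_CACHE_DIR if set, otherwise <tempdir>/data-gym-cache.
cache_dir = os.environ.get("TIKTOKEN_CACHE_DIR") or os.path.join(
    tempfile.gettempdir(), "data-gym-cache"
)

# The blob URL used for cl100k_base in tiktoken_ext/openai_public.py.
blob_url = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"
cache_path = os.path.join(cache_dir, hashlib.sha1(blob_url.encode()).hexdigest())

if os.path.exists(cache_path):
    os.remove(cache_path)  # next get_encoding() re-downloads a clean copy
    print(f"removed possibly corrupted cache file: {cache_path}")
```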

Ace-Radom commented 1 year ago

I cannot reproduce this error. Could you give a brief description of how it occurred (e.g. what you did, your config settings, logs, etc.)? By the way, my local tiktoken module version is also 0.3.3, but the error reported in this issue and in tiktoken#114 didn't occur for me (using the gpt-3.5-turbo-16k model). It's also possible that this error doesn't appear every time, idk. The easiest way to solve this would be to set a minimum version in requirements.txt, but idk if it's really necessary.
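
For illustration, the minimum-version idea mentioned above would be a single constraint line in requirements.txt (though later comments in this thread suggest the pin alone would not fix the cache corruption):

```
tiktoken>=0.4.0
```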

NextAlone commented 1 year ago
> pip list
Package            Version
------------------ ---------
aiohttp            3.8.4
aiosignal          1.3.1
async-timeout      4.0.2
attrs              22.2.0
certifi            2022.12.7
charset-normalizer 3.1.0
frozenlist         1.3.3
idna               3.4
markdown-it-py     2.2.0
mdurl              0.1.2
multidict          6.0.4
openai             0.27.2
packaging          23.1
pip                22.3.1
prompt-toolkit     3.0.38
Pygments           2.14.0
pyperclip          1.8.2
python-dotenv      1.0.0
python-i18n        0.3.9
PyYAML             6.0
regex              2023.3.23
requests           2.31.0
rich               13.4.2
setuptools         65.6.3
sseclient-py       1.7.2
tiktoken           0.4.0
tqdm               4.65.0
urllib3            1.26.15
wcwidth            0.2.6
yarl               1.8.2
[DEFAULT]
openai_api_key = sk-xxx
openai_api_timeout = 30
auto_generate_title = True
chat_save_perfix = ./chat_history_
log_level = INFO

Reproduced, with TIKTOKEN_CACHE_DIR not set, no matter which model is used.

The tiktoken version was 0.3.x before; after this error occurred, I upgraded it to 0.4.0.

NextAlone commented 1 year ago

Though, it's still reproduced with 0.3.3, and 0.4.0 behaves the same. (screenshot attached)

Ace-Radom commented 1 year ago

Weird. I tested on Win11 and Linux (Debian 11.6) and still cannot reproduce this error. If the suggestion given in the issue opened under the tiktoken repo, which is:

> If a bad file has gotten cached, gaojing8500's suggestion works as does setting the environment variable TIKTOKEN_CACHE_DIR=''

does work, there's maybe no need to set a minimum version requirement like tiktoken>=0.4.0 in requirements.txt. Anyway, I will mention this possible error in the README.
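
For anyone who does hit this, a minimal sketch of the quoted workaround (an empty TIKTOKEN_CACHE_DIR makes tiktoken skip its on-disk cache and re-download the BPE file, so a corrupted cached copy can never be picked up; the variable must be set before the encoding is loaded):

```python
import os

# Workaround from tiktoken#114: an empty TIKTOKEN_CACHE_DIR disables
# tiktoken's on-disk cache entirely.
os.environ["TIKTOKEN_CACHE_DIR"] = ""

import tiktoken

# Now loads the BPE file without touching the (possibly corrupted) cache.
encoding = tiktoken.get_encoding("cl100k_base")
print(len(encoding.encode("hello world")))
```

The same thing works from the shell, e.g. `TIKTOKEN_CACHE_DIR='' chat`.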