openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Is there a way for tiktoken to interoperate better with offline AI software? #232

Open ParetoOptimalDev opened 9 months ago

ParetoOptimalDev commented 9 months ago

For instance, there are bug reports from users trying to run software in offline-only mode. Because those libraries use tiktoken, which goes out to the network to download vocab files, those users get an error.

In one such issue, for example, the traceback was:

  File "/home/tony/installs/privateGPT/.venv/lib/python3.11/site-packages/tiktoken_ext/openai_public.py", line 11, in gpt2
    mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/installs/privateGPT/.venv/lib/python3.11/site-packages/tiktoken/load.py", line 82, in data_gym_to_mergeable_bpe_ranks
    vocab_bpe_contents = read_file_cached(vocab_bpe_file).decode()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Perhaps tiktoken could respect an environment variable such as OFFLINE, similar to TERM=dumb for terminals, and when it is set raise an error along the lines of "vocab file.xyz not present, not downloading because the OFFLINE=1 environment variable is set"?
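To make the proposal concrete, here is a minimal sketch of the requested behaviour. It is not tiktoken code: `OfflineError`, `read_vocab_file`, and the `download` callback are hypothetical names, and `OFFLINE` is only the variable suggested in this issue, not something tiktoken is known to honour.

```python
import os
from pathlib import Path


class OfflineError(RuntimeError):
    """Hypothetical error raised when a download is refused in offline mode."""


def read_vocab_file(path, download, env=None):
    """Return the bytes of a local vocab file, downloading only when allowed.

    `download` is a hypothetical zero-argument callable that fetches the file
    contents; `env` defaults to os.environ and can be overridden for testing.
    """
    env = os.environ if env is None else env
    local = Path(path)
    if local.exists():
        # A local copy is present, so no network access is needed.
        return local.read_bytes()
    if env.get("OFFLINE") == "1":
        # Fail fast with a clear message instead of attempting a download.
        raise OfflineError(
            f"vocab file {local} not present, "
            "not downloading because OFFLINE=1 is set"
        )
    return download()
```

With a guard like this, an offline user would see a single explicit message rather than a network traceback from deep inside the loader.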

Thanks!

jinmingyi1998 commented 9 months ago

Same question: how can tiktoken be used offline?

jinmingyi1998 commented 9 months ago

https://stackoverflow.com/questions/76106366/how-to-use-tiktoken-in-offline-mode-computer

I found this

ForkInABlender commented 6 months ago

That solution works. Tested it myself.

Thank you for finding it.
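The linked answer is not reproduced in this thread. As a rough sketch of the approach it describes, and assuming tiktoken reads its cache directory from the `TIKTOKEN_CACHE_DIR` environment variable and names each cached file after the SHA-1 hex digest of the URL it would otherwise download, the cache can be populated by hand on a machine with network access and then copied to the offline one. The URL below is an assumption about what the gpt2 encoding requests, and `cache_path` and `use_offline_cache` are hypothetical helper names; check `tiktoken_ext/openai_public.py` and `tiktoken/load.py` in your installed version before relying on either.

```python
import hashlib
import os
from pathlib import Path

# Assumption: one of the files the gpt2 encoding downloads. Verify against
# tiktoken_ext/openai_public.py in the installed tiktoken version.
VOCAB_URL = (
    "https://openaipublic.blob.core.windows.net/gpt-2/encodings/main/vocab.bpe"
)


def cache_path(cache_dir, blob_url):
    """Return the path where a cached copy of blob_url is assumed to live."""
    # Assumption: the cache key is the SHA-1 hex digest of the URL string.
    key = hashlib.sha1(blob_url.encode()).hexdigest()
    return Path(cache_dir) / key


def use_offline_cache(cache_dir):
    """Point tiktoken at a pre-populated cache directory (set before loading)."""
    os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_dir)


if __name__ == "__main__":
    # Save the downloaded vocab file under this name inside the cache directory.
    print(cache_path("tiktoken_cache", VOCAB_URL))
```

Each file an encoding needs would have to be saved under its own computed name, and `use_offline_cache` called before the encoding is first loaded.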