openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License

Is there a way for tiktoken to interoperate better with offline AI software? #232

Open ParetoOptimalDev opened 11 months ago

ParetoOptimalDev commented 11 months ago

For instance, there are bug reports from users trying to run software in offline-only mode. Because those libraries use tiktoken, which goes out to download vocab files, those users get an error.

In one such issue, for example, the traceback was:

  File "/home/tony/installs/privateGPT/.venv/lib/python3.11/site-packages/tiktoken_ext/openai_public.py", line 11, in gpt2
    mergeable_ranks = data_gym_to_mergeable_bpe_ranks(
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/tony/installs/privateGPT/.venv/lib/python3.11/site-packages/tiktoken/load.py", line 82, in data_gym_to_mergeable_bpe_ranks
    vocab_bpe_contents = read_file_cached(vocab_bpe_file).decode()
                         ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Perhaps tiktoken could respect an environment variable like OFFLINE, similar to TERM=dumb for terminals, and throw an error such as "vocab file.xyz not present, not downloading because the OFFLINE=1 environment variable is set"?
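To make the proposal concrete, here is a minimal sketch of the behaviour I have in mind. It is not tiktoken's code: OFFLINE is the hypothetical variable suggested above, and `read_vocab`, `OfflineError`, and the `download` callback are names made up for illustration.

```python
import os
from pathlib import Path
from typing import Callable


class OfflineError(RuntimeError):
    """Raised when a download would be needed but offline mode is requested."""


def offline_requested() -> bool:
    # OFFLINE is the hypothetical variable from this proposal;
    # tiktoken itself does not read it.
    return os.environ.get("OFFLINE", "") == "1"


def read_vocab(cache_file: Path, url: str, download: Callable[[str], bytes]) -> bytes:
    """Return the vocab bytes from cache_file, downloading only when allowed."""
    if cache_file.exists():
        # A local copy is always fine, online or offline.
        return cache_file.read_bytes()
    if offline_requested():
        # Fail fast with a clear message instead of attempting a network call.
        raise OfflineError(
            f"vocab file {cache_file} not present, "
            "not downloading because OFFLINE=1 is set"
        )
    data = download(url)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_bytes(data)
    return data
```

With something like this, offline users would see an explicit message naming the missing file rather than a network traceback from deep inside the loader.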

Thanks!

jinmingyi1998 commented 10 months ago

Same question: how do I use it offline?

jinmingyi1998 commented 10 months ago

https://stackoverflow.com/questions/76106366/how-to-use-tiktoken-in-offline-mode-computer

I found this

ForkInABlender commented 8 months ago

> https://stackoverflow.com/questions/76106366/how-to-use-tiktoken-in-offline-mode-computer
>
> I found this

That solution works. Tested it myself.

Thank you for finding it.
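For anyone who lands here, a rough sketch of that approach as I understand it: copy the vocab file (fetched on a machine with network access) into a cache directory under the name tiktoken looks up, then point tiktoken at that directory. The assumptions are that tiktoken names cache entries by the SHA-1 hex digest of the download URL and reads the TIKTOKEN_CACHE_DIR environment variable. Check tiktoken/load.py in your installed version to confirm both. `cache_name`, `seed_cache`, and `point_tiktoken_at` are helper names invented for this example.

```python
import hashlib
import os
import shutil
from pathlib import Path


def cache_name(blob_url: str) -> str:
    # Assumption: the cache entry is named by the SHA-1 hex digest of the URL.
    return hashlib.sha1(blob_url.encode()).hexdigest()


def seed_cache(local_file: Path, blob_url: str, cache_dir: Path) -> Path:
    """Copy an already-downloaded vocab file into cache_dir under the hashed name."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / cache_name(blob_url)
    shutil.copyfile(local_file, target)
    return target


def point_tiktoken_at(cache_dir: Path) -> None:
    # Must run before tiktoken loads an encoding in this process.
    os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_dir)
```

The URL passed to `seed_cache` has to match, character for character, the one your tiktoken version requests (they are listed in tiktoken_ext/openai_public.py), otherwise the hashed name will not line up and tiktoken will try to download again.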