Added tiktoken cache support, added script to load encoding file

aerjenn commented 5 months ago

This pull request enables the simulated API to run in environments with restricted network access. It adds a caching mechanism for the TikToken encoding file, which was previously always retrieved from a public blob storage account. It also adds a Python script that sets up the TikToken cache, and updates the README with instructions for using the simulator with restricted network access.

Key changes include:

README Updates:

README.md: A new section titled "Using the simulator with restricted network access" has been added. This section explains how the simulator can be used in an environment with restricted network access by caching the TikToken encoding file. It also provides setup instructions for this process.
README.md: A new environment variable USE_TIKTOKEN_CACHE has been added to the list of environment variables. This variable, when set to True, allows the simulator to use a cached TikToken encoding file instead of retrieving it through the public internet.
README.md: A new entry has been added to the table of contents to reflect the new section on using the simulator with restricted network access.

Codebase Updates:

scripts/setup_tiktoken.py: A new Python script has been added. This script creates a tiktoken_cache folder and downloads the TikToken encoding file from a public URL, storing it in the cache folder.
src/aoai-simulated-api/src/aoai_simulated_api/config.py: Now supports the USE_TIKTOKEN_CACHE environment variable and the setup of the TikToken cache. If USE_TIKTOKEN_CACHE is set to True, the setup_tiktoken_cache function is called. This function checks that the TikToken encoding file exists in the cache directory and sets the TIKTOKEN_CACHE_DIR environment variable.
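
For readers without the diff open, the config.py behaviour described above could look roughly like the sketch below. This is an illustration, not the code in the PR: `cache_file_name`, `configure_from_env`, and the choice of the `cl100k_base` encoding URL are assumptions. Only `USE_TIKTOKEN_CACHE`, `setup_tiktoken_cache`, and `TIKTOKEN_CACHE_DIR` come from the description. It relies on tiktoken naming each cached file after the SHA-1 hex digest of its source URL.

```python
import hashlib
import os
from pathlib import Path

# Assumption: the simulator uses the cl100k_base encoding; the PR text does not say which.
ENCODING_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def cache_file_name(url: str) -> str:
    """Return the file name tiktoken looks up in its cache: SHA-1 hex digest of the URL."""
    return hashlib.sha1(url.encode()).hexdigest()


def setup_tiktoken_cache(cache_dir: Path, url: str = ENCODING_URL) -> None:
    """Point tiktoken at cache_dir, failing early if the encoding file is missing."""
    cached_file = cache_dir / cache_file_name(url)
    if not cached_file.is_file():
        raise FileNotFoundError(
            f"TikToken encoding file not found at {cached_file}; run the setup script first"
        )
    os.environ["TIKTOKEN_CACHE_DIR"] = str(cache_dir)


def configure_from_env(cache_dir: Path) -> bool:
    """Hypothetical helper: enable the cache only when USE_TIKTOKEN_CACHE is 'True'."""
    if os.getenv("USE_TIKTOKEN_CACHE", "").strip().lower() == "true":
        setup_tiktoken_cache(cache_dir)
        return True
    return False
```

Failing at startup when the file is missing gives a clearer error than letting tiktoken attempt a download that the network will block.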

stuartleeks commented 5 months ago

Thanks for this @aerjenn - this would be a great addition!

The flow in the PR as I understand it is:

run setup_tiktoken.py to download the content into the src/aoai-simulated-api/src/aoai_simulated_api folder
set the USE_TIKTOKEN_CACHE env var when running the simulator to pick up the downloaded file

I'm not keen on downloading the content into the source folder (although I'm not really sure why!), so I was thinking maybe we could tweak the flow:

run setup_tiktoken.py to download the content into a cache folder
set TIKTOKEN_CACHE_DIR env var to point to the cache location when running the simulator. I think we then don't need the config.py changes?
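
A minimal sketch of what the download step in this tweaked flow might look like, assuming the script takes the cache folder as a parameter. `populate_cache`, `download`, and the `cl100k_base` URL are hypothetical names and choices for illustration, not the contents of setup_tiktoken.py. The file is saved under the SHA-1 hex digest of its URL, which is the name tiktoken looks for inside `TIKTOKEN_CACHE_DIR`.

```python
import hashlib
import urllib.request
from pathlib import Path
from typing import Callable

# Assumption: cl100k_base is the encoding the simulator needs.
ENCODING_URL = "https://openaipublic.blob.core.windows.net/encodings/cl100k_base.tiktoken"


def download(url: str) -> bytes:
    """Fetch the encoding file over HTTPS (needs network access, so run this ahead of time)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def populate_cache(
    cache_dir: Path,
    url: str = ENCODING_URL,
    fetch: Callable[[str], bytes] = download,
) -> Path:
    """Store the encoding file in cache_dir under the name tiktoken expects."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / hashlib.sha1(url.encode()).hexdigest()
    if not target.exists():  # skip the download when the file is already cached
        target.write_bytes(fetch(url))
    return target
```

With something like this, `populate_cache(Path("/some/cache/folder"))` would run once on a machine with internet access, and the simulator would then be started with `TIKTOKEN_CACHE_DIR=/some/cache/folder`. Nothing in the source tree changes.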

For running via Docker, we could consider adding an additional Dockerfile that runs setup_tiktoken.py to save the content into the image and also sets TIKTOKEN_CACHE_DIR, so that the resulting image has the cached content and is ready to run.

What do you think?

stuartleeks commented 5 months ago

(Either way, ensuring that we .gitignore the downloaded content would be good 😄 )

stuartleeks / aoai-simulated-api

Added tiktoken cache support, added script to load encoding file #12