openai / tiktoken

tiktoken is a fast BPE tokeniser for use with OpenAI's models.
MIT License
11.16k stars, 751 forks

Add enterprise network support for self-hosted encodings (UPDATED v0.6.0) #238

Open blaney83 opened 6 months ago

blaney83 commented 6 months ago

Enterprises with restricted internal networks may want to limit which outside resources their applications can reach. Currently, tiktoken requires network access to openaipublic.blob.core.windows.net to populate the plugin modules. That is adequate for most public or individual applications, but it can cause problems at the organizational level.

To address this, tiktoken should support an environment-based override for users who want to host their own encoding files internally.

This PR adds the required environment-based parameterization to the tiktoken_ext script and updates README.md to explain its usage.
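For illustration only, the kind of override described here could look roughly like the sketch below. It is not the code in this PR: the variable name `TIKTOKEN_ENCODINGS_HOST`, the helper `encoding_url`, and the `encodings/` path segment are all assumptions made up for the example. Only the default host comes from the description above.

```python
import os

# Default public host named in the PR description.
DEFAULT_HOST = "https://openaipublic.blob.core.windows.net"

# Hypothetical variable name, chosen for this example only.
ENV_VAR = "TIKTOKEN_ENCODINGS_HOST"


def encoding_url(filename: str, env=None) -> str:
    """Build the URL for an encoding file, honouring the override if it is set."""
    env = os.environ if env is None else env
    # An unset or blank variable falls back to the public host.
    host = env.get(ENV_VAR, "").strip() or DEFAULT_HOST
    # The "encodings/" path segment is an assumption about the file layout.
    return f"{host.rstrip('/')}/encodings/{filename}"


if __name__ == "__main__":
    print(encoding_url("cl100k_base.tiktoken"))
```

With the variable unset, the helper resolves to the public host. An organization would point the variable at an internal mirror that serves the same files, so no request leaves the internal network.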

blaney83 commented 4 months ago

@hauntsaninja can you shed some light on where the contribution process stands for tiktoken? We currently rely on some workarounds at the enterprise level, and our stakeholders would like to assign timelines to them. Thanks.