Open Emasoft opened 11 months ago
Any update on this? Some devs at OpenAI may be underestimating the importance of Tiktoken in the OpenAI ecosystem. Every small tool that accesses GPT has to use it. It is a key element that should run on EVERY platform, including in-browser Python interpreters and headless VMs/Docker containers with severe restrictions on compiled binaries. Pure Python is perfect for such universal portability, but the mandatory Rust binary means Tiktoken is no longer cross-platform, as a true Python program should be, and it has become a stumbling block for many devs. Please consider this issue. Thanks. 🙏
The Rust dependency, which is needed only for this one (minor) library, greatly increases the image size. It's almost ridiculous how large it is; quite unusual for a Python library.
This should have a pure Python implementation by default and provide a tiktoken[fast] or tiktoken[rust] extra that adds the Rust variant.
Currently Tiktoken (and with it all the OpenAI-related Python libraries that use it) cannot be installed on systems and platforms that cannot (or are forbidden to) install Rust. This is a big issue, and it has been raised here many times.
See:
#36
#57
#94
#134
josephrocca/gpt-2-3-tokenizer#2
pyodide/pyodide#3875
pyodide/pyodide#3663
pyodide/pyodide#3543
emscripten-forge/recipes#660
psymbio/tiktoken_rust_wasm
https://github.com/openai/tiktoken/issues/94#issuecomment-1773748693
There are already two pure Python implementations of the tokenizer:
In the educational version: https://github.com/openai/tiktoken/blob/main/tiktoken/_educational.py
In this fork, courtesy of @kechan: https://github.com/kechan/tiktoken
As discussed here: https://github.com/openai/tiktoken/issues/36
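For a sense of how small the core algorithm is, here is a minimal sketch of greedy byte pair merging in pure Python. It is a toy written for illustration, not code copied from either implementation linked above; the `bpe_encode` name and the tiny `ranks` table are made up, and a real tokenizer would also need regex pre-splitting and the actual vocabulary files.

```python
def bpe_encode(data: bytes, ranks: dict[bytes, int]) -> list[int]:
    """Greedy byte pair merge: repeatedly join the adjacent pair with the lowest rank."""
    # Start from single bytes; every single byte must be present in `ranks`.
    parts = [bytes([b]) for b in data]
    while len(parts) > 1:
        best_idx, best_rank = None, None
        for i in range(len(parts) - 1):
            rank = ranks.get(parts[i] + parts[i + 1])
            if rank is not None and (best_rank is None or rank < best_rank):
                best_idx, best_rank = i, rank
        if best_idx is None:
            break  # no adjacent pair is mergeable any more
        parts[best_idx:best_idx + 2] = [parts[best_idx] + parts[best_idx + 1]]
    return [ranks[p] for p in parts]


# Hypothetical toy vocabulary, not a real tiktoken encoding.
toy_ranks = {b"a": 0, b"b": 1, b"c": 2, b"ab": 3, b"abc": 4}
tokens = bpe_encode(b"abcab", toy_ranks)
print(tokens)  # [4, 3]
```

This version is quadratic in the input length, which is the usual trade-off: slower than a compiled backend, but with nothing to build or ship besides Python source.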
Since everything is in place, the solution would be simple: if the pip installer doesn't find Rust, it should install the pure Python version of the tokenizer. Please consider it. Making Rust mandatory for using the OpenAI API is inconvenient and only makes the API accessible to fewer users and companies. It is in OpenAI's best interest to make its tools as portable as possible, and Python is the perfect language for this. Thanks!
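A related fallback could also happen at import time rather than at install time. The following is only a rough sketch of that idea, not how Tiktoken is actually structured: the helper name `first_importable` and the module name `missing_compiled_backend_xyz` are hypothetical, and the standard-library `json` module merely stands in for a pure Python backend so the snippet runs anywhere.

```python
import importlib


def first_importable(module_names):
    """Return (name, module) for the first module in the list that imports cleanly."""
    for name in module_names:
        try:
            return name, importlib.import_module(name)
        except ImportError:
            continue  # e.g. the compiled extension is not available on this platform
    raise ImportError("no backend available among: " + ", ".join(module_names))


# Hypothetical names: the first stands for a compiled (Rust) backend that is
# absent here, the second stands in for a pure Python fallback.
backend_name, backend = first_importable(("missing_compiled_backend_xyz", "json"))
print(backend_name)  # json
```

Listing the compiled backend first keeps the fast path as the default wherever it is installed, while restricted platforms still get a working tokenizer.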